In this post, we discuss a common evolution of server-side architecture that many growing companies face: the now-legendary transition from a monolithic application to a microservices architecture. Although decoupling is a sound software development concept, it comes with a number of risks and pain points. This write-up covers some of the issues we faced while scaling Grammarly’s server backend, along with the solutions and insights we gained in the process.

Phase one: Okay, it works

When a startup begins development of a product, there is almost never 100% certainty about what will work and what will not. The goal is to quickly build a minimum viable product and then iterate on the concept, adapting it to market needs. If things go well, the product grows bigger and bigger (usually very quickly). Shortages of resources, uncertainty, and time pressure reduce focus on sound system architecture and on the quality of code and tests (in fact, if you have any automated tests at this point, you are doing quite well).

However, at some point, the team realizes that the old architecture, respectable as it is for having brought the company to success, cannot satisfy the new requirements for scalability and stability. Quite the opposite: it is unpredictable and barely manageable, and developers are constantly working on hot fixes and outages. Most likely, by this time, the code base has grown several times over, but the initially vague requirements have become much clearer, because the team now understands its market.

Needless to say, all the original experiments are still somewhere in the code, and there are new developers on the team who do not know the context but need to write their own code on top of it. The patches do not improve the code; they just plug the holes to keep the system functional. It is time to turn the caterpillar into a butterfly, maybe even a whole swarm of them.

In the picture above, I have depicted a very high level architecture of our original product. Components freely communicated with each other and used a common database.

Phase two: Okay, it works badly

It is perfectly natural to start development with a product that does it all: business logic, payments, user profiles and settings, a support panel, and so on. A single project is easy to set up in a local development environment and easy to deploy, test, and monitor. But if you do not review it and take it to the gym regularly, it becomes heavy and tangled.

When that happens, it is time to make some serious decisions. One popular option is to divide the monolithic application into a set of loosely coupled services, each implementing its own logic and communicating with the others via an API.

At Grammarly, we reached this point some time ago, and we identified several reasons to migrate to a microservices (http://en.wikipedia.org/wiki/Microservices) architecture.

  1. Smaller code is easier to manage and understand.

    This one is simple enough, but one may object that you could just refactor the existing code into more components, units, or sub-projects. That is true; however, it will not solve the following problem.

  2. Different components have different needs.

    By their nature, some functions require more computational resources, while others need more memory, disk space, database access, or network bandwidth. Certain components may depend on third-party services completely irrelevant to the others (guess what happens when those services are down). Moving a component to an appropriate server environment allows proper scaling. It also provides another benefit, namely:

  3. Independent releases

    While some of your components are mission-critical (like registration and payment processing) and should be up and running at virtually all times, others are merely supplementary. If your application is monolithic, you cannot deploy its components separately: every deployment requires a full restart of the whole system.

  4. Data encapsulation

    Another significant benefit of decoupling the system is the ability to freely change a service's internal architecture and data model as long as its API remains intact. For example, if you want to switch to a different database backend or schema, you are good to go. You can do any crazy refactoring, move to another platform or language, and so on.

Challenges of decoupling

The first challenge to face when implementing microservices is how they will communicate. There are several options available.

  1. Use a shared database.

    In this case, the database becomes a bottleneck, and the approach defeats the idea of data encapsulation, so you will not be able to change this part easily. The benefit is persistent storage: once information has made its way to the database, it stays there until processed and survives a restart. It is also possible to build queues on top of it for batch processing.
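The persistence property described above can be sketched with a toy "events" table used as a crude hand-off point between services. This is a hypothetical illustration, not our schema; all table and function names are invented, and SQLite stands in for whatever shared database you actually run.

```python
import json
import sqlite3

def make_store(path=":memory:"):
    # In a real deployment, `path` would be the shared database; the
    # in-memory default is only for demonstration.
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        processed INTEGER NOT NULL DEFAULT 0)""")
    return db

def publish(db, event):
    # The producing service only writes a row; it never talks to
    # consumers directly.
    db.execute("INSERT INTO events (payload) VALUES (?)", (json.dumps(event),))
    db.commit()

def consume(db):
    # Fetch the oldest unprocessed event and mark it done. Until the
    # flag is set, the event survives consumer restarts.
    row = db.execute(
        "SELECT id, payload FROM events WHERE processed = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE events SET processed = 1 WHERE id = ?", (row[0],))
    db.commit()
    return json.loads(row[1])
```

The same pattern also shows the downside from the text: every producer and consumer must agree on this table's schema, so changing it later touches all of them.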

  2. Use an Internet protocol (e.g. HTTP).

    This works very well when you need synchronous data processing, and direct service-to-service requests decentralize the system. The disadvantage is the absence of persistence: when some service is offline, the others need to cache outgoing requests and replay them later.

    There is an option to use UDP for broadcasting, but in this case, you need to make sure you are ready to lose some data, or use TCP instead.
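The "cache outgoing requests to replay them later" idea can be sketched as a small client wrapper. This is an illustrative sketch only: `transport` stands in for any real HTTP call (it is assumed to raise `ConnectionError` when the peer is down), and the class name is invented.

```python
from collections import deque

class ReplayingClient:
    """Queues outgoing requests and replays them once the peer is back up."""

    def __init__(self, transport):
        # `transport` is any callable that delivers one request and
        # raises ConnectionError while the target service is offline.
        self.transport = transport
        self.pending = deque()

    def send(self, request):
        self.pending.append(request)
        self.flush()

    def flush(self):
        # Replay queued requests in order; stop at the first failure and
        # keep the remainder for the next attempt, preserving ordering.
        while self.pending:
            try:
                self.transport(self.pending[0])
            except ConnectionError:
                return False
            self.pending.popleft()
        return True
```

Note that this buys at-least-once delivery only while the caller process itself stays alive; if the caller restarts, the in-memory queue is lost, which is exactly the persistence gap the next option addresses.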

  3. Use a message-queuing platform.

    Imagine that account information has changed in the account management service, and you need to notify several other services. If you use HTTP requests, you have to notify each service individually (so you must know each of them “personally”) and tell it what changed. If there are lots of changes, this gets expensive, and in the worst case, you will cause an internal DDoS attack. If you use a shared database, you probably have to resort to triggers, so your database server will have a hard time as well.

    For asynchronous processing, a good choice is a message-queuing platform like Kafka or RabbitMQ. With such a platform, the services exchange messages in a format of your choice (JSON or binary). The great advantage of this approach is the ability to broadcast events: the originating service does not need to know its audience; it just sends, for example, the message “User account changed.” The services interested in such information subscribe to these messages and process them one by one.

    Message queues can be persistent, so information is not lost in case of a restart. The messaging platform also provides the means to monitor its health directly, and, if necessary, it can be scaled separately.
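The broadcast pattern above can be sketched with a toy in-memory broker. This is only a stand-in for a real platform like RabbitMQ or Kafka (no persistence, no network); the class and topic names are invented for the sketch. The key property it shows is that the publisher never enumerates its subscribers.

```python
import json
from collections import defaultdict

class Broker:
    """Toy topic-based fan-out, mimicking a message exchange."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher does not know its audience: every handler
        # subscribed to the topic receives its own copy of the message,
        # serialized as JSON the way a real broker would put it on the wire.
        data = json.dumps(message)
        for handler in self.subscribers[topic]:
            handler(json.loads(data))
```

With this shape, adding a new interested service is one `subscribe` call; the account management service's publishing code never changes.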

Another challenge is adjusting the development process. The people responsible for each service's development and evolution need to sync up frequently to align interfaces and releases and avoid surprises. A good practice is to outline a test and release plan that distributes the load evenly, so that several components do not get released at the same time.

With each team freed from caring about what happens behind the other teams' APIs, inter-team communication should become a kind of ritual to be carefully carried out. This is not a bad thing: it is sometimes good to pull developers out of the code and bring them together to share best practices and establish common ground across the teams. Too much diversity in tools and processes can also make the product harder to control.

Where to decompose?

A microservice should be small enough to be easily managed and big enough to contain the necessary functionality. No matter how fast the communication channel is, it will, of course, be slower than a direct function call.

When decomposing the application, note the differences and similarities between the components. If components have too much in common, are too small, or need to communicate quickly, separating them could slow the system down.

Common sense and the collective consciousness of the team should be able to propose the right architecture at the right time. It is therefore important to start decoupling only when there is enough information about what the product should do.

It is reasonable to break off one piece at a time to make sure the decoupling process does not hit users with unpredictable glitches. For example, outdated functionality can be refactored into a service and plugged in with an appropriate backup plan. Another approach is to implement new functionality entirely as a service and connect it to the old code.

Here is a diagram of the new architecture (for clarity, not all services are shown). Dashed lines indicate communication via message queues (MQEx are message exchanges); solid lines show synchronous request channels.

Side effects are guaranteed

There are some interesting positive consequences of decoupling.

  1. Better interfaces

    It is important to create and maintain a good API in each service for communication with the others. Supporting such an API will often bring more coherence to the internal architecture of the service itself.

  2. More shared libraries

    It will be necessary to extract commonly used code into shared libraries to reuse functionality across services and avoid spaghetti code. This should lead to more polished and better-documented code, but it requires additional effort to ensure that new releases of these common libraries are backwards-compatible, or at least do not break compatibility in surprising ways.

  3. Better team structure

    Sometimes technical changes lead to organizational ones. Having separate services allows a better split of the areas of responsibility across the engineering team and permits the teams to work in parallel. After all, Conway's Law usually holds :)

  4. Easier testing and debugging

    It may sound odd, but decoupling does not make the system harder to debug.

    First, the code of a micro-service is not supposed to be big.

    Second, because of the fixed API, it is easier to mock or simulate the missing components. An individual service's developer does not need to know or care what happens in other services or which platform they use; she just needs to plug in some simulator to do the job.
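The mocking idea can be sketched as follows. Assume, hypothetically, a billing service whose only contract is a `charge()` call; because the contract is fixed, a test double can stand in for the real service while the code under test stays unchanged. All names here are invented for illustration.

```python
class StubBilling:
    """Test double implementing the same charge() contract as the
    (hypothetical) real billing service."""

    def __init__(self):
        self.charges = []  # record calls so the test can inspect them

    def charge(self, user_id, amount):
        self.charges.append((user_id, amount))
        return {"status": "ok"}

def upgrade_account(billing, user_id, price=30):
    # Business logic under test: it depends only on the API contract,
    # not on which implementation sits behind it, so the stub and the
    # real service are interchangeable here.
    result = billing.charge(user_id, price)
    return result["status"] == "ok"
```

The same `upgrade_account` function runs unmodified against the real service in production and against `StubBilling` in tests, which is exactly what the fixed API buys you.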

    If the idea of simulation looks too complicated, another option is to use non-production servers (test ones, for example) for development purposes.

At a certain point, decoupling a monolithic architecture can be very beneficial to a mature application. There may be some turbulence at the beginning of this phase, but our experience was very positive. Good decoupling makes the system more stable, scalable, and flexible while ensuring that the development team stays productive and keeps having fun.