TL;DR

A well-known logistics company uses a monolithic application, developed over 20 years, to generate efficient routes for drivers delivering packages. As part of a previous modernization effort, this application was transformed into a set of microservices.

The system faced various challenges, including issues with reliability, performance, observability, and cost, as a result of a hasty migration from monolith to microservices.

In this post, I will discuss the reasons for the problems and explain how transitioning from a system of microservices to a modular monolith (modulith) architecture effectively solved them.

Overview

The Delivery Routes Generator monolith has been developed and refined over a span of 20+ years. Its main purpose is to generate over 75,000 routes daily, enabling drivers to efficiently deliver packages to consumers and businesses.

To enhance scalability and performance, the monolith was transformed into a system of six microservices built with Java and Spring Boot. To ensure coverage and reliability, three instances of the system of microservices are deployed in each of the three cloud regions.

The Stops Loader is responsible for receiving, parsing, and saving delivery stops from an external system, while the Stops Sequencer arranges the stops in a specific order. The Route Generator then applies business rules to the stop sequences for creating delivery routes, which are subsequently optimized by a third-party Route Optimizer. Finally, the optimized routes are stored within the Route Generator.

Plans and reports are produced by the Route Report and assessed by the Route Analyzer; both obtain the routes from the Route Generator.

The microservices communicate with each other through REST APIs, while batch processes synchronize data between them.
There are 18 REST endpoints distributed across the six microservices, each managed by one of three development teams.
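
For illustration, here is a minimal sketch of what one of these fine-grained endpoints might look like. Only the service names come from the system described above; the path, DTO shape, and class names are illustrative assumptions.

```java
// Hypothetical sketch of one of the fine-grained REST endpoints.
import java.util.List;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

record SequencedStop(String stopId, int position) {}   // assumed DTO shape

@RestController
class StopsSequencerController {

    // Returns the ordered stops for a delivery area; the Route Generator
    // has to call this endpoint over HTTP before it can build a route.
    @GetMapping("/areas/{areaId}/sequenced-stops")
    List<SequencedStop> sequencedStops(@PathVariable String areaId) {
        return List.of(new SequencedStop("stop-1", 1), new SequencedStop("stop-2", 2));
    }
}
```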

Problems

To gain a better understanding of the problems impacting the strangled monolith, we interviewed the Product Owners, DevOps Engineers, Architects, and Developers. Below are their paraphrased comments, grouped by theme:

Reliability

“Progressive latency issues resulting in cascading failures undermine reliability. Failures within batch processes lead to data inconsistencies, while the chatty microservices further compound these challenges”

Maintenance and Deployments

“Whenever we make changes to one microservice, it seems like another microservice always ends up breaking. This means we have to deploy all the microservices simultaneously, which can be a real headache. It’s become a complex process that requires constant coordination”

Debugging

“It can be a real pain to troubleshoot problems when debugging takes forever and is often super challenging. Plus, to make matters worse, we sometimes miss out on important log messages because of log sampling issues”

Observability and Scalability

“When we scale microservice instances, we run into noisy neighbor issues and a sudden surge in metrics”

Cloud Spend

“Why is our cloud expenditure so high? What resources are included in this cost? We are seeing considerable spending on data ingress and egress charges, and infrastructure expenses are consuming a large portion of our operating budget. On top of that, our maintenance budget has been depleted sooner than anticipated”

Analysis

The Delivery Routes Generator needs to maintain consistent data, remain highly available, and execute quickly. Unfortunately, the microservices communicate with each other in the same way the modules of the old monolith did.

The Route Generator acted as an orchestrator, directing or initiating actions on the other microservices.

The microservices were too small in scope because the bounded contexts were not properly defined.

The REST API chattiness stems from the module mismatch and the microservice-per-domain approach.

Simultaneous datastore read and write operations in the Route Generator under heavy load introduced latency into the system and made it a single point of failure.

The syncing of data between microservices contributed significantly to the data inconsistencies.
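
To make the chattiness and the single point of failure concrete, here is a minimal sketch of the orchestration pattern described above. The endpoint paths and DTOs below are illustrative assumptions, not the real interfaces.

```java
// Hypothetical sketch: per delivery area, the Route Generator chains several
// synchronous REST round trips before it can persist a route.
import java.util.List;
import org.springframework.web.client.RestTemplate;

record Stops(List<String> stopIds) {}
record StopSequence(List<String> orderedStopIds) {}
record OptimizedRoute(String routeId, List<String> orderedStopIds) {}

class RouteOrchestrationSketch {

    private final RestTemplate rest = new RestTemplate();

    // Under heavy load the accumulated latency of these hops cascades back
    // to every caller, and any failed hop stalls route generation entirely.
    OptimizedRoute generateRoute(String areaId) {
        Stops stops = rest.getForObject(
                "http://stops-loader/areas/{id}/stops", Stops.class, areaId);

        StopSequence sequence = rest.postForObject(
                "http://stops-sequencer/sequences", stops, StopSequence.class);

        return rest.postForObject(
                "http://route-optimizer/optimize", sequence, OptimizedRoute.class);
    }
}
```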

Solution: Move to Moduliths

We transformed the system of microservices into moduliths.

What did we do?

  • Created bounded contexts based on the use cases in the system of microservices, and incorporated aggregates to provide the functionality that supports the use-case behavior.
  • Organized the codebase into modules (Route Generator, Plan Generator and Report) based on the bounded contexts, and stored them in a monorepo.
  • Implemented the aggregates as vertical slices, following the Vertical Slice architecture (a package-layout sketch follows this list).
  • Incorporated API contracts at the aggregate level, with contract testing among aggregates and between modules (a contract-test sketch follows this list).
  • The Delivery Routes Generator now consists of two modules, Route Generator and Plan Generator + Report, deployed as containerized applications through simple CI/CD pipelines.
  • The Plan Generator can still create plans and reports from the routes generated by the Route Generator, even if the latter is offline, which makes the system more resilient and available.
  • The Route Generator and Plan Generator + Report are deployed and scaled independently, and each has its own persistence store.
  • Moved from fine-grained to coarse-grained API endpoints, which eliminated the metrics explosion (an endpoint sketch follows this list).
  • The decrease in deployable services and datastores solved the noisy-neighbor problem and reduced cloud spend by 40%.
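
To make this concrete, here is a minimal sketch of one vertical slice inside the Route Generator module. The package layout, class names, and invariants are illustrative assumptions rather than the actual codebase.

```java
// Hypothetical monorepo layout, one module per bounded context:
//
// monorepo/
//   route-generator/         <- route generation bounded context
//     generateroutes/        <- one vertical slice per use case
//   plan-generator-report/   <- plans and reports bounded context
package routegenerator.generateroutes;

import java.util.List;

// Aggregate root owned by this slice; the invariants about a route live here.
class DeliveryRoute {
    private final String routeId;
    private final List<String> stopIds;

    DeliveryRoute(String routeId, List<String> stopIds) {
        if (stopIds.isEmpty()) {
            throw new IllegalArgumentException("A route needs at least one stop");
        }
        this.routeId = routeId;
        this.stopIds = List.copyOf(stopIds);
    }

    String routeId() { return routeId; }
    List<String> stopIds() { return stopIds; }
}

// The slice exposes a single entry point; its REST endpoint and persistence code
// sit in the same package instead of being spread across horizontal layers.
class GenerateRouteHandler {
    DeliveryRoute handle(String routeId, List<String> sequencedStopIds) {
        return new DeliveryRoute(routeId, sequencedStopIds);
    }
}
```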
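
A contract test between modules can then be as simple as a plain JUnit test that pins down the behavior the Plan Generator + Report module expects from the Route Generator's aggregate-level API. This sketch reuses the hypothetical DeliveryRoute and GenerateRouteHandler types and is not the real contract suite.

```java
// Hypothetical contract test, living in the same package as the slice it verifies.
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.util.List;
import org.junit.jupiter.api.Test;

class GenerateRouteContractTest {

    private final GenerateRouteHandler handler = new GenerateRouteHandler();

    @Test
    void returnsARouteThatPreservesTheStopSequence() {
        DeliveryRoute route = handler.handle("route-42", List.of("stop-1", "stop-2"));
        assertEquals(List.of("stop-1", "stop-2"), route.stopIds());
    }

    @Test
    void rejectsRoutesWithoutStops() {
        assertThrows(IllegalArgumentException.class,
                () -> handler.handle("route-42", List.of()));
    }
}
```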
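
Finally, a hedged sketch of the move from fine-grained to coarse-grained endpoints; the paths and types are again illustrative assumptions.

```java
// Hypothetical coarse-grained replacement for several fine-grained endpoints.
import java.util.List;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

record RoutePlan(String routeId, List<String> stopIds, String optimizationStatus) {}

@RestController
class RoutePlanController {

    // Before: callers assembled a plan from several calls, e.g.
    //   GET /routes/{id}, GET /routes/{id}/stops, GET /routes/{id}/optimization
    // After: one coarse-grained endpoint returns the whole plan, cutting the
    // number of HTTP round trips (and per-endpoint metrics) per request.
    @GetMapping("/routes/{routeId}/plan")
    RoutePlan plan(@PathVariable String routeId) {
        return new RoutePlan(routeId, List.of("stop-1", "stop-2"), "OPTIMIZED");
    }
}
```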

After moving to moduliths, DORA metrics indicate a significant improvement in performance. Deployment frequency has gone up by 50%, cycle time has decreased by 75%, and mean time to restore has decreased by 80%.

Takeaways

  • The migration from monolith to modulith offers a simpler path, with a codebase that is easy to maintain and a straightforward CI/CD deployment.
  • Moduliths are great for trying out different ways to break down, fine-tune and deploy code. This way, you can figure out which modules are appropriate for moving to microservices.
  • Before diving into microservices, make sure to think it through. Otherwise, you might end up with a distributed monolith that performs even worse than the original monolith.