Architecture deep dive

Data Sharing Issues in a Microservice Architecture

Microservices can be a real pain to deal with; it's all warm and fuzzy until you reach the moment they need to share data…

Alexandre Olive · Published in ITNEXT · Oct 24, 2023

Desk with disorganized boxes and files everywhere, digital art (generated by DALL-E)

There you are, three months into your company's new project, where your team's mission is to split your application's monolith into shining new microservices while adding new functionalities. You have heard the words "scalability" and "agility" more times than you thought humanly possible coming from your boss.

Every functionality from that monolith is now a service with a separate database. The architects took the word "micro" literally; I guess they had never heard of Domain Driven Design.

Today seemed like a typical developer day; you had your weekly grooming session, where you discussed new tickets with your teammates. But you realized one of the tickets was different from the others. The product team has decided they want a list of data spanning multiple microservices, the kind of table list with so many columns that it won't fit most screens. And they want to be able to sort the list on every field.

Now you're thinking, is that even possible in our current microservice architecture without doing hundreds of calls to gather the data?

I have been dealing with microservices as a tech lead for some years now, and in today's article, I want to talk about one of the issues I have been facing more and more lately, which has been a pain in my day-to-day job: data sharing between microservices.

Microservices architecture is one of the most talked-about (and written-about) subjects in IT, so I won't rehash the usual debate about what is good or bad about microservices. I'll focus specifically on this issue, with an actual use case and then the possible solutions to mitigate it.

Obligatory disclaimers: I am not an architecture expert. If you have implemented better solutions or think my article needs correction, please leave a comment. I’ll be happy to read through them and learn new things.

What is the issue, doctor?

To properly understand the issue, we need a real-life example of a microservice architecture. At my current company, we create videos for products. So, we have a video management service and a product management service. Those two services have separate databases.

A video is linked to a product with an N-to-1 (video to product) relationship, so we store the product ID in the video table.
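As a sketch of that split, here are the two tables as two separate SQLite databases; the table and column names are illustrative, not the actual schema:

```python
import sqlite3

# One database per service: no cross-database JOIN is possible.
products_db = sqlite3.connect(":memory:")
products_db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")

videos_db = sqlite3.connect(":memory:")
videos_db.execute("""
    CREATE TABLE videos (
        id INTEGER PRIMARY KEY,
        title TEXT,
        product_id INTEGER  -- points into the *other* service's database
    )
""")
```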

I'm sorry for my database schema expert readers; I'm far from being one.

Now, let's say my lovely product team asked me to create a page for my back office where they want to see the list of videos with the video's title and product name next to each other. They also want to be able to sort videos by product name.

Since both services are separate, you can't join tables to retrieve data from videos and products in a single query. The only way to do it in the current architecture is to get all the videos first, then retrieve all the product information, and finally show the page.

A junior developer might think: I can retrieve a list of 20 videos first, then send the 20 product IDs in a single HTTP call to the product service to retrieve the product information, and finally regroup video and product information in the front end. It's only two HTTP calls, right?

Yes, it's only two calls to show the data, but how would you handle sorting on the product name? You can't sort in the front end because you only know the product names for the twenty videos you already limited yourself to, not for all the others.

If you have thousands of videos for thousands of products, sorting on the product name can only be done at the database level, and that's what this article issue is all about. You can't sort videos on product names at the database level because it's split into multiple microservices.
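To make this concrete, here is a minimal Python sketch of the two-call approach, with in-memory stand-ins for the two services (all names and shapes are hypothetical). It shows why sorting the fetched page is not the same as sorting the whole list:

```python
# Stub data standing in for the two services' responses.
VIDEOS = [{"id": i, "title": f"Video {i}", "product_id": i % 3} for i in range(100)]
PRODUCTS = {0: "Zebra print", 1: "Apple juice", 2: "Mug"}

def list_videos_page(page_size=20):
    # Call 1: the video service returns one page of videos.
    videos = VIDEOS[:page_size]
    # Call 2: batch-fetch the product names for the IDs on this page.
    wanted = {v["product_id"] for v in videos}
    names = {pid: PRODUCTS[pid] for pid in wanted}
    # Merge video and product information (in the front end or a gateway).
    return [{**v, "product_name": names[v["product_id"]]} for v in videos]

page = list_videos_page()
# Sorting *this page* by product name is easy...
page.sort(key=lambda v: v["product_name"])
# ...but it only reorders the 20 videos already fetched: a video whose product
# name sorts first globally may not be in this page at all.
```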

This sorting issue is just one of the many problems you can face with data sharing in microservices. Fun fact: this exact problem is what shattered my belief, a few years back, that microservices were perfect. That's why I chose it for this article. Like learning that Santa Claus is not real: that kind of revelation.

What's the treatment?

There are a couple of solutions to this issue; which one fits depends on how often your data is updated and how fresh you need it to be.

You might not want to hear it, but one of the solutions is to duplicate the data you need from the product service in the video database.

Adding the product name to the video table allows you to sort by product name without issues.
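As a sketch, once the product name is duplicated into the video table, sorting (and paginating) becomes a single query. SQLite again, with illustrative column names:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Denormalized: the product name is copied into the video service's own table.
db.execute("""
    CREATE TABLE videos (
        id INTEGER PRIMARY KEY,
        title TEXT,
        product_id INTEGER,
        product_name TEXT  -- duplicated from the product service
    )
""")
db.executemany(
    "INSERT INTO videos (title, product_id, product_name) VALUES (?, ?, ?)",
    [("Teaser", 1, "Mug"), ("Demo", 2, "Apple juice"), ("Unboxing", 1, "Mug")],
)
# The sort now happens at the database level, where it belongs.
rows = db.execute(
    "SELECT title, product_name FROM videos ORDER BY product_name LIMIT 20"
).fetchall()
```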

Now, the question is how you keep that data fresh. If I update the product name directly in the product service, how is that replicated in all the videos using that product?

Is being late okay for you?

Is it okay if the data stored in the video table is inconsistent with the new data in the product table for some time? That's what we call "eventual consistency": we agree that the data will become consistent within a short time, just not at the exact moment of the update.

If yes, you can use an event-driven architecture setup with a queue system, like Kafka or RabbitMQ, to communicate events between services.

Every time the product service receives an update, it sends a message in a queue dedicated to product updates with the new information.

Any service can subscribe to the product update queue and update its own data when a new message is received.
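Here is a minimal sketch of that publish/subscribe flow. A plain in-memory deque stands in for the Kafka or RabbitMQ topic, and all names are hypothetical:

```python
from collections import deque

# Stand-in for the broker's "product updates" topic.
product_updates = deque()

# The video service's local copy of product data.
video_service_products = {42: {"name": "Old mug"}}

def product_service_update(product_id, new_name):
    # The product service updates its own database (omitted), then publishes.
    product_updates.append({"product_id": product_id, "name": new_name})

def video_service_consume():
    # The video service's subscriber: apply each update to the local copy.
    while product_updates:
        event = product_updates.popleft()
        video_service_products[event["product_id"]]["name"] = event["name"]

product_service_update(42, "New mug")
# Until the consumer runs, the video service's copy is stale: that gap is
# exactly the "eventual" part of eventual consistency.
video_service_consume()
```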

Alright… this is how it goes in a perfect world, one where the fallacies of distributed computing actually hold true. But in the real world, it's clearly not that easy.

What happens if the update on the video service fails? Do you want to roll back the update in the product service? What if you have other services that use this data? Do you want to roll them back as well?


There could be a failure at any moment in the system; one of the services using the product data could fail its update, and then you would be stuck with inconsistent data.

There are documented patterns you may want to follow when dealing with transaction management, like the SAGA pattern, which explains in-depth how to implement multi-service transactions.

That's where things get really complicated. The network is unreliable, so a good idea is to implement exponential backoff retries if anything fails. This means that any failed attempt is repeated with more and more wait times between retries to ensure there is a real problem, not just a network or any temporary failure.
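A minimal sketch of exponential backoff in Python; the delays, attempt count, and the flaky operation are all illustrative:

```python
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1):
    """Retry `operation`, doubling the wait between attempts."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: a real problem, not a temporary blip
            time.sleep(base_delay * 2 ** attempt)

# Example: an operation that fails twice ("temporary network error"), then works.
calls = {"count": 0}
def flaky_update():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network failure")
    return "updated"

result = retry_with_backoff(flaky_update, base_delay=0.001)
```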

If it fails completely, you must prepare for a transaction rollback orchestration. Those rollback actions are called Compensating transactions.

The rollback orchestration for us means a separate queue for failed product updates. The failing service sends a message to that queue with the previous state of the product, and every service polls messages from it to roll back its data. Then you start thinking: what if this fails as well? It just never ends…
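A sketch of that compensation queue, again with an in-memory deque standing in for the broker and hypothetical service names:

```python
from collections import deque

# Stand-in for the "failed product updates" queue.
failed_updates = deque()

# Each service keeps its own copy of the product name.
services = {
    "video": {42: "New mug"},   # this one applied the update...
    "search": {42: "Old mug"},  # ...this one failed, triggering a rollback
}

def publish_compensation(product_id, previous_name):
    # The failing service publishes the *previous* state of the product.
    failed_updates.append({"product_id": product_id, "name": previous_name})

def consume_compensations():
    # Every service polls the failure queue and reverts its local copy.
    while failed_updates:
        event = failed_updates.popleft()
        for data in services.values():
            data[event["product_id"]] = event["name"]

publish_compensation(42, "Old mug")
consume_compensations()
```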

We decided to accept that a small number of updates will fail, and that the "contact customer service" button will be of great use. It should be there anyway, but please don't use it as your only safety net.

As we said before, the issue with this architecture is:

  • It adds a lot of development overhead and complexity.
  • Data is temporarily inconsistent, which could be an issue depending on your use case.
  • On rollback, users will think their update worked, then find the previous data without understanding why.

But there are many benefits:

  • It's highly available with low latency.
  • Separation of concerns.
  • You can add any number of services without impacting users.

For example, product names rarely change once added to the site. And even if they do, it's perfectly okay if the value in the video list is incorrect or if the sort is not using the correct value for some time.


Being late is just impossible.

You exit your grooming session happy with your eventual consistency solution. Still, your product team goes mad: it's unthinkable that the product name in the video list could be wrong, even for a few seconds! Are you out of your mind? What would our clients think? Does this look like a real scenario to you, too? Okay, maybe I'm exaggerating.

The video/product case is admittedly a poor example here; a better one would be a payment system, but I'll roll with it for the rest of this article anyway.

Okay, so the data cannot be inconsistent: every time an update happens on the product service, it has to be replicated to the video service synchronously.

Let's start with the beginner-friendly implementation; I'll call it one-phase commit (1PC), and you'll see why later.

The coordinator there could be a separate service or the product service itself. For us, it's a separate service: a GraphQL gateway.

The user will not get a validation that the product update worked until all the transactions to every service using product data are successful.

This means that the more services need the product information, the longer the wait time for the user will be.

What if any error occurs while updating the data in every service?

Every piece of product information updated so far needs to be rolled back synchronously: each updated service has to revert to its previous state.
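The 1PC flow described above can be sketched like this; the coordinator function, fake services, and error type are all hypothetical stand-ins:

```python
class ServiceError(Exception):
    """Raised when a downstream service refuses or fails an update."""

def one_phase_commit(services, product_id, new_name, old_name):
    # Apply the update to every service in turn; on any failure, revert the
    # services already updated (compensating transactions), synchronously.
    updated = []
    for svc in services:
        try:
            svc.update(product_id, new_name)
            updated.append(svc)
        except ServiceError:
            for done in updated:
                done.update(product_id, old_name)  # roll back
            return False  # only now does the user learn the update failed
    return True  # the user waited on every service before this confirmation

class FakeService:
    def __init__(self, fail=False):
        self.data = {42: "Old mug"}
        self.fail = fail
    def update(self, product_id, name):
        if self.fail:
            raise ServiceError("update refused")
        self.data[product_id] = name

video, search = FakeService(), FakeService(fail=True)
ok = one_phase_commit([video, search], 42, "New mug", "Old mug")
```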

A more complicated but safer pattern is two-phase commit (2PC), split into a prepare phase and a commit phase. Instead of directly running the updates like I showed you, the coordinator will:

  • First, ask all impacted services whether they can apply the update, without actually changing anything. That's the "prepare" phase.
  • If any service answers with an error, abort the update and warn the user. We, of course, implement retries before declaring the failure.
  • If every service answers yes, proceed with the update. That's the "commit" phase.
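A minimal sketch of those steps; the classes and method names are hypothetical, and a real 2PC implementation would also journal the prepared state to survive crashes:

```python
class FakeService:
    def __init__(self, can_update=True):
        self.can_update = can_update
        self.data = {42: "Old mug"}
        self.staged = None
    def prepare(self, product_id, name):
        # "Could you apply this update?" -- nothing is changed yet.
        if self.can_update:
            self.staged = (product_id, name)  # e.g. take locks, write a journal
        return self.can_update
    def commit(self, product_id, name):
        self.data[product_id] = name
        self.staged = None

class TwoPhaseCoordinator:
    def __init__(self, services):
        self.services = services
    def run_update(self, product_id, new_name):
        # Phase 1 (prepare): every service must agree before anything changes.
        if not all(s.prepare(product_id, new_name) for s in self.services):
            # Abort; a real coordinator would also tell the already-prepared
            # services to release their locks.
            return False
        # Phase 2 (commit): everyone said yes, apply the update for real.
        for s in self.services:
            s.commit(product_id, new_name)
        return True

good, bad = FakeService(), FakeService(can_update=False)
aborted_run = TwoPhaseCoordinator([good, bad]).run_update(42, "New mug")
# good.data is untouched: the prepare phase changed nothing.
```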

This adds another level of safety before running the update. It can still fail during the commit phase, though, and you still need to be prepared to roll back updated data.

There's also a Three-Phase commit (3PC) if you're interested.

The main advantage of this architecture is that data is always consistent everywhere.

However, the disadvantages are that updates take much longer and services are more coupled.

This is nightmare-inducing; I'm primarily working with non-sensitive data in my day-to-day job, so while we try to have a good microservice architecture, it's not a life-or-death situation if some data are inconsistent and need human correction.

The solutions I talked about are not one-size-fits-all. As I said throughout the article, failure cases remain that applications dealing with sensitive data can't just accept. I haven't had the luck (or bad luck) to work on one of those applications yet, so I still have much to learn.

Microservices are great for many reasons: scaling, separation of concerns, deployment speed, etc. But before creating those services, consider how your application will interact with them. If it's too late, you might want to merge services; that's a route we are considering.

I have a love-hate relationship with microservices. I think they're great, until I face these kinds of issues; then they drive me insane, like a toxic relationship… maybe monoliths were not that bad?

Thank you for reading this article until the end. If you liked it, please don't hesitate to follow me on X (Twitter) or add me on LinkedIn.

