Deferred tasks in a microservice architecture

Vlad Pavlenko
ITNEXT
Published in
10 min readFeb 13, 2021

--

Often in projects, there is a need to perform deferred tasks, such as sending email, push, and other specific tasks specific to the domain area of your application. The difficulties begin when the usual crontab is no longer enough, when batch processing is not suitable, and when each task unit has its own execution time or it is assigned dynamically.

To solve this problem, a solution called Trigger Hook was created. The schematic diagram of the operation is shown in Figure 1. The diagram shows what happens to tasks during their entire life cycle. Changing the color means changing the status of the task.

  • Red task — d task whose launch time is not yet soon
  • Yellow task — a task whose launch time is coming soon
  • Green task — a task whose launch time has come
  • Blue task — a task that was processed
  • Grey — not confirmed the status of the task in the database
  • Black — command to delete

Life cycle tasks:

  • When creating a task, it gets into the database (square block) (red and yellow).
  • Tasks are loaded into memory (triangular block) if their start time is coming soon (red->yellow). This structure is implemented in the form of a prioritized queue (heap).
  • When the task execution time comes, it is sent for execution (yellow->green). An intermediate buffer is used before processing to compensate for peak loads.
  • If the task is successfully submitted, it is deleted from the database (green->blue). An intermediate buffer is used before deletion, also to compensate for peak loads.

Next, I will try to describe in more detail some of the features and give arguments in favor of choosing this solution.

Simplicity of the API

The ID is accepted in the UUIDv4 format. If not passed, it will be generated independently. The ability to pass the task id from an external service will be useful when using an asynchronous channel. The start time is specified in the UNIX format.

Create:

Delete:

Handling launch events:

Durability

Task processing may fail, for example, if the connection to the message broker is lost. In this case, the task will not be confirmed, but a second attempt will be made to send it. The task is marked as processed only when the confirmation method is called. Stopping the application suddenly will not cause inconsistencies in the database.

In addition, taking into account the general market trend towards transferring applications to the cloud in the form of microservices, new application requirements are being formed. At least what was in the background earlier is now becoming more important. With this approach, containerized applications are temporary in nature. The Trigger Hook mechanism makes it possible to collapse a micro-service on one server and deploy it on another in a production environment without a soft stop.

If the application crashes, there is a possibility that some tasks may not be confirmed in the database. When you restart the application, these tasks will be sent for execution again. This behavior is a trade-off in favor of providing fault tolerance. When your application receives a message from Trigger Hook, it should only execute the task once, and ignore it when it receives it again. Such situations are common in event-oriented architectures and they should not disrupt the internal state and generate a large number of errors.

Accuracy and performance

To avoid a high frequency of requests to the database, a mechanism is provided for periodically preloading sets of tasks whose execution time is in the specified range. In other words, rare requests for sets of tasks are made instead of frequent requests for tasks one at a time. Such a scheme is well suited if, for example, several hundred thousand tasks are scheduled for one time. Once the tasks are loaded, they are sorted in order of priority. When the timer for the highest priority task expires, it is immediately sent to the client code for processing. This allows you to achieve high peak performance and process tasks with second-by-second accuracy.

Also, greater performance of sending tasks for execution is achieved due to a simple task storage scheme, indexing, and multithreaded database access.

The main indicators of the task processing speed were measured.

Application server:

  • AWS EC2 Ubuntu 20
  • t2.micro
  • 1 vCPUs 2.5 GHz
  • 1 GiB RAM

Database server:

  • AWS RDS MySQL 8.0
  • db.t3.micro
  • 2 vCPUs
  • 1 GiB RAM
  • Network: 2085 Mbps

Creating tasks test:

  • The duration of the test — 1m 11s
  • Average speed (tasks/sec) — 1396
  • Number of tasks — 100000

Deleting tasks test:

  • The duration of the test — 52s
  • Average speed (tasks/sec) — 1920
  • Number of tasks — 100000

Sending tasks (task status from red to blue) test:

  • The duration of the test — 498ms
  • Average speed (tasks/sec) — 200668
  • Number of tasks — 100000

Confirm tasks (the status of the task from the blue to the delete) test:

  • The duration of the test — 2s
  • Average speed (tasks/sec) — 49905
  • Number of tasks — 100000

Monitoring

To quickly check the correct operation, Trigger Hook provides the ability to connect a time-series database. At the initialization stage, it is possible to determine the frequency of measurements and select the metrics of interest. The full list of available metrics is available here.

It is also possible to connect the logging system via an adapter. Available:

  • fatal errors-leading to a complete shutdown of the application
  • errors that are worth paying attention to, but which do not lead to a stop
  • debag messages

Further in the example, you can see an example of connecting to InfluxDB+Grafana

Trigger Hook as part of a microservice architecture

Asynchronous interaction

When using a microservice architecture, asynchronous interaction is usually preferred. Trigger Hook fits well into a microservice, event-driven architecture. But in any case, incoming (create, delete) and outgoing (task start time event) channels can be either asynchronous or synchronous, depending on the requirements.

Below, Figure 2 shows one of the possible variants of the communication scheme via an asynchronous channel. A queue, such as RabbitMQ, can act as a message broker. This scheme eliminates the blocking of the called microservice by the caller, as in a synchronous request via, for example, HTTP. The broker accepts an unlimited number of tasks (conditionally unlimited), and the handler of these tasks takes over them as they are released. As soon as the create command is processed, an event is sent about the successful creation of the task. Also through the broker, the client service receives this event and reacts to it accordingly-changes the status of the entity using the deferred task. The entity can be, for example, a Push notification to mobile devices with an ad.

A significant disadvantage of this scheme is the complexity of the infrastructure serving such interaction. In fact, the introduction of the status of “waiting” for a response from other microservices is distributed transactions.

Figure 2 — Asynchronous channel communication diagram

Figure 3 shows the processes of creating an entity that has a deferred execution and Figure 4 shows the execution when the time comes.

Figure 3 — The process of creating an entity with deferred execution
Figure 4 — Completing the entity task

Sharing

The lack of the ability to transfer some payload when creating a task may disappoint some. But I assure you, this is not necessary. Trigger Hook contains enough functionality to build a task manager. Treat the Trigger Hook as an abstraction layer located at the infrastructure level. Complete information about the task, such as the type, status, execution time, number of execution attempts, payload, and so on, will be contained in the abstraction layer above Trigger Hook.

The top layer will have domain knowledge. In other words, the task manager will have a certain set of task types, a certain set of events related to certain types of tasks. For example, the request to the interface will sound like “create a deferred task for sending an email message” or “create a deferred task for charging a subscription to YouTube”, and the task manager itself will address the Trigger Hook with the request “create a deferred task”. When it’s time to start the task, Trigger Hook will create the “task completion time has arrived” event. This event will be intercepted by the task manager, which will process it by issuing, for example, the event “subscription fee debiting time has arrived”. Figures 5 and 6 show this process.

Figure 5 — Creating a task using an intermediate layer
Figure 6 — Event processing using an intermediate layer

The communication between the application components should be very weak. This also applies to microservices in general. In practice, one of the reasons for strengthening communication is the transfer of part of the responsibility of one service to another. Therefore, one of the most difficult tasks is to find the partition boundary of a (monolithic, for example) application into microservices. To do this successfully, you need to consider the domain. Now we need to answer the question: in which micro-service should we put the “task manager” layer?

Figure 7-Task Manager in one microservice with Trigger Hook

Figure 7 shows a diagram where the task manager is a separate, microservice containing domain knowledge about the types of tasks, events related to these tasks. As you can see from the diagram, it is supposed to share one microservice of the task manager for different client microservices. Each microservice has its own channel for receiving events. In RabbitMQ, such an event channel can be easily implemented as a direct schema.

Figure 8 — Task Manager as part of a client microservice

Figure 8 shows a different scheme, where the task manager is part of the client microservice and is used only for its internal needs. This scheme is suitable if there are no other microservices that use deferred tasks, or if each microservice has its own task manager with a Trigger Hook microservice.

Scaling up

Some applications are harder to scale than others. Everything is much easier if the application state is stored only in external storage with competitive access support, for example, the classic PHP + MySQL bundle. In this case, several instances of the PHP application are deployed on different servers, and Nginx balances the load between them, while the MySQL resource remains the same for all instances of PHP applications. If MySQL fails, then replicas can be added independently of the PHP application.

Things are a bit more complicated when the application stores its own state. It is more difficult to scale horizontally. Trigger Hook stores its state in RAM. It loads tasks that will start soon. Let’s say you have created a task that will take about 5 seconds to complete. This means that Trigger Hook has already preloaded it for execution. But you wanted to cancel this task. To do this, call the delete API method. It is important to call this method from the application instance that took the task for processing. This is the first difficulty.

Figure 9 — Diagram of the horizontal scaling

The second difficulty is that each Trigger Hook instance must have its own schema in the database. This is due to ensuring database consistency in case of failures. In general, from the point of view of performance, it makes no sense to use Trigger Hook instances for a single database, firstly, because Trigger Hook works in multi-threaded mode, and secondly, all other things being equal, the database is a bottleneck.

Figure 9 shows an example of load scaling. Each instance of Trigger Hook has its own database, on different servers (otherwise, it doesn’t make much sense). There is a load balancer in front of Trigger Hook instances. In addition to balancing, it writes a key-value pair to some hash map database, such as Redis:

task_id:instance_host

Trigger Hook Demo App

The application consists of five microservices. Everyone uses a Docker container. Everything runs on Kubernetes. The application can be easily deployed in minikube. Detailed instructions are described here.

Figure 10 — Simplified scheme of interaction of microservices

Message service — a service (Figure 11) that provides an API for creating email messages and assigning them to be sent at a certain time or canceled. Also allows you to view the full list of messages and their statuses.

Some features:

  • Located at the domain level.
  • It consists of a message manager and a task manager.
  • Written in PHP, Symfony 5 framework.
  • It works in two copies. The first one serves API requests using Nginx. The second one starts the daemon via supervisor to listen for events from the RabbitMQ queue. It has auxiliary instances for running migrations.
  • Uses the diagram from Figure 8 to manage tasks.
Figure 11-Message service

Message Dashboard — interface for the Message service (Figure 12).

Figure 12 — Demo application Interface

The Mailer service is located at the infrastructure level. Must directly do the mailing list. Not implemented, as it is not important in the demo.

Trigger service — an infrastructure-level service. Uses the GRPC channel to receive commands to create and delete tasks, and the AMQP channel to send the task execution time event (trigger).

Figure 13 — Trigger service

Monitoring — is also at the infrastructure level, as it shows technical metrics without reference to business events. Figure 14 shows what the panel looks like. Used by Grafana and InfluxDB. A full description of the metrics is available here.

Figure 14-Trigger Hook Technical Metrics

I hope the app and the article will be useful to you! Follow my github, follow the project, put stars) Thanks!

--

--

Hello everyone! I am a PHP / Golang developer, a fan of microservice architecture.