FaaS-like Spring microservices with autoscaling in Kubernetes

Oleksiy Pylypenko · Published in ITNEXT · Oct 17, 2021 · 7 min read


This tutorial shows how to create a basic application demonstrating an approach to writing simple “function-based microservices”. It explains how to create such microservices, deploy them, set up autoscaling, gather metrics, and enable tracing.

Such an approach can be quite powerful for large-scale event processing, while striking a good balance between ownership of the infrastructure and the complexity of deployments.

Project code:

https://github.com/oleksiyp/faas-like-services-example

Java functions

To run event-streaming microservices we will use the Spring Cloud Stream framework.

Spring Cloud Stream is based on the Spring Cloud Function framework, which provides a unified way to build business logic as Java “functions” and run it on serverless platforms, standalone, or on a PaaS.

Spring Cloud Stream extends Spring Cloud Function by adding the ability to bind functions to message brokers. Several brokers are supported out of the box, including RabbitMQ and Apache Kafka; this tutorial uses RabbitMQ.

Additionally, Spring Cloud Stream can be used as part of Spring Cloud Data Flow. But the purpose of this article is to show that it is easy enough to establish infrastructure in Kubernetes to run such microservices without a data-processing framework like Spring Cloud Data Flow, which adds significant deployment complexity and hides Kubernetes features behind its own deployment adaptor.

Technology choices

To build the demo solution, the following stack is used:

  • a Pop!_OS based Linux machine
  • k3s — an easy-to-set-up Kubernetes cluster (any other Kubernetes distribution can be used)
  • RabbitMQ Cluster Operator — an operator to deploy the RabbitMQ message broker
  • Spring Cloud Stream — a framework to bind functions to RabbitMQ
  • Spring Cloud Sleuth — a framework to trace messages
  • Spring Boot Actuator — a way to provide metrics and liveness & readiness information
  • KEDA — a Kubernetes autoscaler driven by a variety of ready-to-use metrics
  • Gradle — a Java build tool
  • Skaffold — a tool to perform fast local deployments to a Kubernetes cluster
  • Helmfile — a tool to customize and deploy Helm charts
  • Wavefront — a commercial monitoring tool to gather metrics and tracing information

Concept of a demo application

The demo application consists of four microservices:

  • streaming-producer — a microservice that emits a message once per second
  • streaming-processor1 — microservice that adds a delay of 2 seconds while slightly modifying the message
  • streaming-processor2 — microservice that adds a delay of 3 seconds while slightly modifying the message
  • streaming-consumer — consumes messages and dumps them to standard output, no delays applied.

The idea is that the processors are slower than the producer, each by a different rate. This makes a great test case for the autoscaling setup.

We will see that autoscaling automatically adjusts the number of processor instances handling the load, based on the processing rate versus the production rate.

Producer, processors, and consumer

The producer microservice code is the following:
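A minimal sketch of what this looks like, assuming a simple Msg payload class (class, bean, and field names here are illustrative, not the exact repository code):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class StreamingProducerApplication {
    public static void main(String[] args) {
        SpringApplication.run(StreamingProducerApplication.class, args);
    }

    private final AtomicLong counter = new AtomicLong();

    // Spring Cloud Stream polls this supplier on a fixed schedule:
    // one invocation per second by default, configurable via
    // spring.cloud.stream.poller.fixed-delay.
    @Bean
    public Supplier<Msg> produce() {
        return () -> new Msg("message-" + counter.incrementAndGet());
    }

    // Payload class; the framework serializes it to JSON.
    public static class Msg {
        private String name;

        public Msg() {}
        public Msg(String name) { this.name = name; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }
}
```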

Basically, this creates a microservice that produces a message whose name field contains an incremented counter. The message rate is configurable via Spring Cloud Stream properties and defaults to one message per second.

The processor #1 microservice code is the following:
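A minimal sketch, reusing the illustrative Msg payload class from the producer sketch:

```java
import java.util.function.Function;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class StreamingProcessor1Application {
    public static void main(String[] args) {
        SpringApplication.run(StreamingProcessor1Application.class, args);
    }

    // Prepends "processor1:" to the name and simulates slow processing
    // with a 2-second delay.
    @Bean
    public Function<Msg, Msg> process() {
        return msg -> {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return new Msg("processor1:" + msg.getName());
        };
    }
}
```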

It prepends the “processor1:” string to the “name” field and adds a delay of 2 seconds.

The code of the processor #2 microservice is the same as processor #1, but it uses a 3-second delay and prepends the string “processor2:” instead.

The consumer microservice code is the following:
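A minimal sketch (again with the illustrative Msg class):

```java
import java.util.function.Consumer;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class StreamingConsumerApplication {
    public static void main(String[] args) {
        SpringApplication.run(StreamingConsumerApplication.class, args);
    }

    // Dumps every received message to standard output.
    @Bean
    public Consumer<Msg> consume() {
        return msg -> System.out.println("Received: " + msg.getName());
    }
}
```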

The only thing it does is print every received message to standard output.

The Spring stack deals with all the complexities of binding these “Function”-, “Supplier”-, and “Consumer”-based lambdas to a message queue: data format conversion (JSON), retry logic, dead-letter queue handling, and more. The only thing required is to configure the microservices properly.
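For example, binding the processor’s function to RabbitMQ destinations takes only a few properties in application.yml (the destination and group names here are assumptions, not the repository’s exact values):

```yaml
spring:
  cloud:
    function:
      definition: process          # which function bean to bind
    stream:
      bindings:
        process-in-0:              # <function>-in-<index> naming convention
          destination: producer-out
          group: processor1        # consumer group -> a durable, shared queue
        process-out-0:
          destination: processor1-out
```

With a consumer group set, the RabbitMQ binder creates a durable queue named `<destination>.<group>` that all instances of the microservice share, which is exactly what the autoscaler needs.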

The demo application repository contains:

  • build.gradle — Gradle project files that build the Spring microservices
  • helmfile.yaml — the file used to deploy the whole application
  • skaffold.yaml — configuration for the Skaffold tool to deploy in a watch loop
  • helm/rabbitmq-cluster — a Helm chart to deploy a RabbitMQ cluster via the RabbitMQ Cluster Operator
  • helm/streaming-service — a Helm chart to deploy a Spring microservice

Autoscaling with KEDA

KEDA allows scaling microservice instances based on collected metrics.

For RabbitMQ, this Kubernetes definition looks like the following:
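A sketch of such a ScaledObject, assuming a Deployment named streaming-processor1 and a TriggerAuthentication that points at the RabbitMQ secret (names, queue, and thresholds are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: streaming-processor1
spec:
  scaleTargetRef:
    name: streaming-processor1     # the Deployment to scale
  minReplicaCount: 0               # allow scale-to-zero
  maxReplicaCount: 10
  cooldownPeriod: 60               # seconds of inactivity before scaling to zero
  triggers:
    - type: rabbitmq
      metadata:
        queueName: producer-out.processor1
        mode: QueueLength          # scale on the number of waiting messages
        value: "5"                 # the threshold: target messages per replica
      authenticationRef:
        name: rabbitmq-trigger-auth  # supplies the host URL with credentials
```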

This establishes a scaling object for the queue in RabbitMQ. Credentials and the cluster address are stored in an associated secret. KEDA fetches the queue size from RabbitMQ, divides it by the threshold (the value field), and from that deduces how many instances to create. For example, 12 messages in the queue with a threshold of 5 yields ceil(12/5) = 3 replicas.

Because the number of instances depends on the queue size, autoscaling adapts to the ratio of the input rate to the processing rate. If messages arrive faster than they are processed, the queue grows and more instances are spawned. If the input rate drops, the queue drains and the autoscaler removes instances.

This allows the processing rate to be adjusted independently for the two flows of different speeds. Also, if a queue stays empty for one minute, KEDA scales all instances down to zero.

Testing autoscaling

Let's experiment a bit with this setup.

We will change just one number: the number of producer replicas, the replicaCount value in helmfile.yaml.
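In helmfile.yaml the producer release might look roughly like this (the release name and chart path follow the repository layout listed above; the inline value is the knob we will turn):

```yaml
releases:
  - name: streaming-producer
    chart: ./helm/streaming-service
    values:
      - replicaCount: 1   # 0, 1, 2, ... - the only number we change
```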

First, let’s set it to 0 and run skaffold dev.

You will see that the initial deployment starts no producers and terminates the processors very quickly:

Let’s set replicaCount to 1:

At the start, messages will flow from the producer to the processors almost immediately, about 10 seconds after the producer has started. But because the processors consume more slowly than the producer produces, the queues will start to accumulate messages, and in the end the system will stabilize with a few more pods:

Here, because of the different rates, different numbers of pods were created: two processor1 pods running at 0.5 RPS each and three processor2 pods running at 0.333 RPS each, which allows the produced stream to be processed at a combined rate of 2 RPS.

You can see that queues are filled with some number of messages:

We could drain the queues by changing the ScaledObject to use the message rate together with the queue length. Queue length is a very flexible metric, but it makes queue sizes grow linearly with the message rate. Configuring the message rate as a scaling trigger requires more complex tuning of rate thresholds for each microservice, as sketched below.
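Such a trigger entry in the ScaledObject’s triggers list might look like this (KEDA’s RabbitMQ scaler supports a MessageRate mode that reads publish rates from the management HTTP API; the threshold here is illustrative):

```yaml
    - type: rabbitmq
      metadata:
        queueName: producer-out.processor1
        mode: MessageRate   # requires an http(s) host pointing at the management API
        value: "2"          # target published messages per second per replica
      authenticationRef:
        name: rabbitmq-trigger-auth
```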

Call rate when switching from replica count 0 to replica count 1

Let’s add one more producer replica. After some time, the pods will scale to a stable state:

You can see that rates are adjusted appropriately:

Replica count transition from 1 to 2

Let’s add one more replica of the producer:

replica count transition from 2 to 3

Now let’s scale down the producer:

scaling replicas from 2 to 1 to 0

In the end, you can see that the producer and both processors are torn down:

This means no resources are allocated while there are no input messages.

Source code

Key files can be found in the project repository: https://github.com/oleksiyp/faas-like-services-example

Conclusion

There might be many reasons to create such an event-driven system based on a message broker, functions, and an autoscaler.

One is that, operating at the PaaS level, you have great flexibility in building complex systems. You can enhance the system with a multitude of ready-to-use solutions that fit your needs and integrate them tightly.

RabbitMQ, KEDA, and Spring Cloud Stream are battle-tested products with a solid reputation. This makes them a great match for the core components of the system.

Combined with the rich possibilities of Helm deployments in Kubernetes, this can result in a very agile and flexible FaaS-like system.
