Load balancing strategies in Kubernetes

L4 round robin, L7 round robin, ring hash, and more

Richard Li
ITNEXT


Load balancing is the process of efficiently distributing network traffic among multiple backend services, and is a critical strategy for maximizing scalability and availability. In Kubernetes, there are a variety of choices for load balancing external traffic to pods, each with different tradeoffs.

L4 Round Robin Load Balancing with kube-proxy

In a typical Kubernetes cluster, requests that are sent to a Kubernetes Service are routed by a component named kube-proxy. Somewhat confusingly, kube-proxy isn’t a proxy in the classic sense, but a process that implements a virtual IP for a Service via iptables rules. This architecture adds complexity to routing: each request incurs a small amount of latency, and that overhead grows as the number of services grows. Moreover, kube-proxy routes at L4 (TCP), which doesn’t necessarily fit well with today’s application-centric protocols. For example, imagine two gRPC clients connecting to your backend pods. With L4 load balancing, each client’s connection (not each request) is assigned round-robin to a backend pod, and every request multiplexed over that long-lived connection goes to the same pod. This is true even if one client is sending 1 request per minute while the other is sending 100 requests per second.
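For reference, kube-proxy’s round robin requires nothing beyond an ordinary Service definition. A minimal sketch is below; the names (`my-service`, `app: my-backend`) are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service          # hypothetical name
spec:
  selector:
    app: my-backend         # pods with this label become endpoints
  ports:
    - port: 80              # virtual IP port implemented by kube-proxy
      targetPort: 8080      # port the pods actually listen on
```

kube-proxy watches Services like this one and programs iptables rules that distribute new connections across the matching pods.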

So why use kube-proxy at all? In one word: simplicity. The entire process of load balancing is delegated to Kubernetes and it’s the default strategy. Thus, whether you’re sending a request via Ambassador or via another service, you’re going through the same load balancing mechanism.

L7 Round Robin Load Balancing

What if you’re using a multiplexed, keep-alive protocol like gRPC or HTTP/2 and need a fairer round robin algorithm? You can use an API Gateway for Kubernetes such as Ambassador, which bypasses kube-proxy altogether and routes traffic directly to Kubernetes pods. Ambassador is built on Envoy Proxy, an L7 proxy, so each individual gRPC request is load balanced between available pods.

In this approach, your load balancer uses the Kubernetes Endpoints API to track the availability of pods. When a request for a particular Kubernetes service is sent to your load balancer, the load balancer round robins the request between pods that map to the given service.
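In Ambassador, this L7 behavior is configured per route via a Mapping resource. The sketch below assumes hypothetical prefix and service names; `round_robin` makes the default policy explicit:

```yaml
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: backend-mapping     # hypothetical name
spec:
  prefix: /backend/         # requests under this path...
  service: my-backend       # ...are routed to this Kubernetes service
  load_balancer:
    policy: round_robin     # per-request L7 round robin across pods
```

Because Envoy tracks pod endpoints directly, each HTTP/2 or gRPC request on a shared connection can land on a different pod.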

Ring Hash

Instead of rotating requests between different pods, the ring hash load balancing strategy uses a hashing algorithm to send all requests from a given client to the same pod. The ring hash approach is used for both “sticky sessions” (where a cookie is set to ensure that all requests from a client arrive at the same pod) and for “session affinity” (which relies on client IP or some other piece of client state).
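In Ambassador, for example, ring hash is selected per Mapping, with the hash key taken from a cookie (sticky sessions) or from the source IP (session affinity). The resource below is a sketch; the route and cookie names are hypothetical:

```yaml
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: cart-mapping        # hypothetical name
spec:
  prefix: /cart/
  service: cart-service     # hypothetical stateful backend
  load_balancer:
    policy: ring_hash
    cookie:
      name: session-id      # all requests carrying the same cookie
                            # value hash to the same pod
```

Replacing the `cookie` block with `source_ip: true` switches from cookie-based stickiness to client-IP session affinity.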

The hashing approach is useful for services that maintain per-client state (e.g., a shopping cart). By routing the same client to the same pod, the state for a given client does not need to be synchronized across pods. Moreover, if you’re caching client data on a given pod, the probability of cache hits also increases. The tradeoff with ring hash is that it can be more challenging to evenly distribute load between different backend servers, since client workloads may not be equal. In addition, the computation cost of the hash adds some latency to requests, particularly at scale.
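To make the mechanics concrete, here is a minimal, self-contained sketch of a hash ring in Python (an illustration, not Envoy’s actual implementation): each pod is hashed onto many virtual points of a ring, and a client key is routed to the first point clockwise from its own hash, so the same client consistently reaches the same pod and removing a pod only remaps the keys that hashed to it.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a key to a stable integer position on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring. Each pod contributes several
    virtual points; more points give a more even load split."""

    def __init__(self, pods, vnodes=100):
        self.ring = sorted(
            (_hash(f"{pod}-{i}"), pod)
            for pod in pods
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def pod_for(self, client_key: str) -> str:
        # First ring point clockwise from the client's hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect(self.points, _hash(client_key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["pod-a", "pod-b", "pod-c"])
# The same client key always maps to the same pod:
assert ring.pod_for("client-42") == ring.pod_for("client-42")
```

The virtual-node trick is what keeps load roughly even despite the hashing: with only one point per pod, a single unlucky gap on the ring could absorb most of the traffic.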

Maglev

Like ring hash, Maglev is a consistent hashing algorithm. Originally developed by Google, Maglev was designed to be faster than ring hash on hash table lookups and to minimize its memory footprint. (The ring hash algorithm generates fairly large lookup tables that do not fit in the CPU cache.) For microservices, Maglev has one fairly expensive tradeoff: regenerating the lookup table when a node fails is relatively costly. Given the ephemeral nature of Kubernetes pods, this cost may make Maglev a poor fit. For more details on the tradeoffs of different consistent hashing algorithms, this article covers consistent hashing for load balancing in detail, along with some benchmarks.
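In Ambassador, switching from ring hash to Maglev is a one-line policy change on the Mapping. The names below are hypothetical; here the hash key is taken from a request header:

```yaml
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: backend-maglev      # hypothetical name
spec:
  prefix: /backend/
  service: my-backend
  load_balancer:
    policy: maglev
    header: x-user-id       # hash on this header's value
```

The routing semantics match ring hash (the same `x-user-id` value always reaches the same pod); only the underlying lookup-table construction differs.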

Learning More

The networking implementation within Kubernetes is more complex than it might first appear, and also somewhat more limited than many engineers realize. Matt Klein has put together a very informative blog post, “Introduction to modern network load balancing and proxying”, which provides a great foundation for understanding the key concepts. There is also a series of posts explaining why organizations such as Bugsnag, Geckoboard, and Twilio have chosen L7-aware proxies to load balance ingress traffic.

Ambassador API Gateway

Built on Envoy Proxy, Ambassador is an open source API Gateway that supports all of the load balancing methods discussed above on Kubernetes. Visit https://www.getambassador.io or join our Slack channel for more information.


CEO, Amorphous Data. Formerly: Ambassador Labs, Duo Security, Rapid7, Red Hat.