Autoscaling apps on Kubernetes with the Horizontal Pod Autoscaler
This article gives a high-level overview of how the Horizontal Pod Autoscaler (HPA) in Kubernetes works and how to use it.
A previous version of this article has been published on learnk8s.io.
Contents
- Introduction
- Different types of autoscaling in Kubernetes
- What is the Horizontal Pod Autoscaler?
- How is the Horizontal Pod Autoscaler configured?
- How are application metrics obtained?
- Putting everything together
Introduction
Deploying a stateless app with a statically configured number of replicas is not optimal.
Traffic patterns can change quickly, and the app should be able to adapt to them:
- When the rate of requests increases, the app should scale up (i.e. increase the number of replicas) to stay responsive.
- When the rate of requests decreases, the app should scale down (i.e. decrease the number of replicas) to avoid wasting resources.
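The scale-up and scale-down behaviour described above can be illustrated with a small calculation. The following is a minimal sketch, assuming the metric is the average request rate per replica. The function and parameter names are hypothetical. The proportional rule `desired = ceil(current × currentMetric / target)` and the default tolerance of 0.1 follow the algorithm described in the Kubernetes HPA documentation. `min_replicas` and `max_replicas` stand in for the HPA's `minReplicas` and `maxReplicas` bounds. The sketch deliberately ignores details the real controller handles, such as stabilization windows and pod readiness.

```python
import math


def desired_replicas(current_replicas: int,
                     current_rps_per_replica: float,
                     target_rps_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Hypothetical helper: pick a replica count that moves the average
    per-replica request rate back towards the target."""
    if target_rps_per_replica <= 0:
        raise ValueError("target_rps_per_replica must be positive")

    # How far the observed load is from the target (1.0 means "on target").
    ratio = current_rps_per_replica / target_rps_per_replica

    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current count (within bounds).
        desired = current_replicas
    else:
        # Scale proportionally, rounding up so the target is not exceeded.
        desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured minimum and maximum number of replicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 150, 100))  # load above target: scale up to 6
    print(desired_replicas(4, 50, 100))   # load below target: scale down to 2
    print(desired_replicas(4, 105, 100))  # within tolerance: stay at 4
```

Rounding up errs on the side of responsiveness: when the load does not divide evenly, the app gets one replica too many rather than one too few.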
In the context of horizontal scaling, scaling up is also called “scaling out”, and scaling down is also called “scaling in”. This contrasts with vertical scaling (see below), which uses only the terms “scaling up” and “scaling down”.