
Autoscaling apps on Kubernetes with the Horizontal Pod Autoscaler

Published in ITNEXT

10 min read · Jun 10, 2020

This article gives a high-level overview of how the Horizontal Pod Autoscaler (HPA) in Kubernetes works and how to use it.

A previous version of this article was published on learnk8s.io.

Introduction
Different types of autoscaling in Kubernetes
What is the Horizontal Pod Autoscaler?
How is the Horizontal Pod Autoscaler configured?
How are application metrics obtained?
Putting everything together

Introduction

Deploying a stateless app with a statically configured number of replicas is not optimal.

Traffic patterns can change quickly, and the app should be able to adapt to them:

When the rate of requests increases, the app should scale up (i.e. increase the number of replicas) to stay responsive.
When the rate of requests decreases, the app should scale down (i.e. decrease the number of replicas) to avoid wasting resources.

In the context of horizontal scaling, scaling up is also called “scaling out”, and scaling down is also called “scaling in”. This contrasts with vertical scaling (see below), which uses only the terms “scaling up” and “scaling down”.
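
The scale-up and scale-down behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Horizontal Pod Autoscaler's actual implementation: the function name `desired_replicas`, the per-replica target rate, and the replica bounds are all assumptions made for this example.

```python
import math


def desired_replicas(
    request_rate: float,
    target_rate_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return how many replicas are needed for the given request rate.

    Hypothetical helper: assumes every replica can comfortably serve
    `target_rate_per_replica` requests per second.
    """
    if target_rate_per_replica <= 0:
        raise ValueError("target_rate_per_replica must be positive")

    # Round up so that no replica has to exceed its target rate.
    needed = math.ceil(request_rate / target_rate_per_replica)

    # Stay within the configured lower and upper bounds.
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    # Traffic increases: scale up (out).
    print(desired_replicas(250, 100))
    # Traffic decreases: scale down (in).
    print(desired_replicas(50, 100))
```

With a target of 100 requests per second per replica, a load of 250 requests per second calls for 3 replicas, while a load of 50 requests per second falls back to the minimum of 1. Bounding the result keeps a traffic spike from requesting an unbounded number of replicas.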


Written by Daniel Weibel
