Kubernetes Resource Use and Management in Production
Requests, Limits, Overcommitment, Slack/Waste, Throttling
Before we take a Kubernetes application into production we should understand K8s resource management. At its core this means understanding how the Kubernetes scheduler handles resource requests and limits; once that is clear, everything else falls into place. So let’s dive into it!
After reading this you will:
- understand Requests and Limits
- know how the k8s Scheduler works with resources
- have ideas on how to improve cluster usage and stability
TL;DR
- set memory requests=limits
- set no CPU limits, or disable CPU limit enforcement in the kubelet (see the sketch after this list)
- usage should be below requests
- use scaling with HPA / VPA
- monitor/alert on pod resource usage
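As a rough sketch (values are illustrative, not taken from this article), these recommendations translate into a container spec roughly like this:

```yaml
# Illustrative sketch of the TL;DR: memory requests == limits, CPU request set, no CPU limit.
resources:
  requests:
    cpu: "250m"      # reserved CPU share, used by the scheduler
    memory: "512Mi"  # reserved memory
  limits:
    memory: "512Mi"  # same as the request, so memory is never overcommitted
    # intentionally no cpu limit, so spare CPU can be used instead of throttling
```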
Pod Resources
In k8s a pod can have one or more containers, which are usually run by Docker.

A pod can be seen as a wrapper around containers that work closely together and therefore should run on the same machine (node). This means that the pod’s total resources are the sum of the resources of all its containers.
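For illustration (a hypothetical pod, not from this article), a two-container pod whose total request is the sum of its containers’ requests:

```yaml
# Hypothetical two-container pod: the scheduler sees a total request of
# 100m + 200m = 300m CPU and 128Mi + 256Mi = 384Mi memory.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar          # illustrative name
spec:
  containers:
    - name: app
      image: example/app:1.0      # placeholder image
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
    - name: sidecar
      image: example/sidecar:1.0  # placeholder image
      resources:
        requests:
          cpu: "200m"
          memory: "256Mi"
```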
Resource Requests and Limits
Resource requests and limits are specified per container, for example:
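A minimal sketch of such a spec (placeholder values; as noted in the TL;DR, you may want to omit the CPU limit in practice):

```yaml
resources:
  requests:          # reserved for the container and used for scheduling decisions
    cpu: "250m"
    memory: "256Mi"
  limits:            # the container may use up to this much beyond its request
    cpu: "500m"
    memory: "512Mi"
```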

Requests are guaranteed, reserved resources: the scheduler will not place other pods against capacity that is already requested.
Limits are an allowance to use more resources than requested. If a container hits its CPU limit it is throttled; if it exceeds its memory limit it is OOM-killed.
Scheduler
The k8s scheduler is responsible for deciding which pod can run on which node. It does so by looking at various configuration…