AKS Performance: Resource Requests and Limits

Chase · Published in ITNEXT · Nov 24, 2020

How much do I need vs. how much can I use?

Image from http://blog.kubecost.com/

One of the most important things Kubernetes can do for you is help ensure workloads run on nodes capable of handling their resource requirements. Setting proper requests and limits helps prevent you from oversaturating your nodes, which would inevitably result in poor performance.

Resource Requests

Definition of the amount of resources K8s will guarantee to a pod hosted on a node

Resource Requests define the amount of resources that K8s will guarantee to a pod. To some degree you can think of them as the minimum requirements for a pod to be scheduled. Let’s look at an example of what this could look like in a YAML manifest:

apiVersion: v1
kind: Pod
metadata:
  name: super-app-of-awesome
spec:
  containers:
  - name: appofawesome
    image: ubuntu
    resources:
      requests:
        memory: "256Mi"
        cpu: "1000m"
      limits:
        memory: "512Mi"
        cpu: "1500m"

CPU units here are set in millicpu, so 1000m equals 1 vCPU/core, 500m is half a core, 250m is a quarter of a core, and so on.

We can see above that under the container’s spec we have set the resources tag and defined our requests, asking for a minimum of 256Mi of memory and 1 CPU core. When deployed, this ensures the scheduler finds a home capable of hosting the pod with those resource requests in mind. What happens, though, if K8s cannot find a suitable host? The pod will not be scheduled anywhere and will show with a status of Pending.
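If you hit that, kubectl describe will usually tell you why the scheduler gave up. A quick sketch against our hypothetical pod (the exact event wording varies by Kubernetes version):

kubectl get pod super-app-of-awesome
# NAME                   READY   STATUS    RESTARTS   AGE
# super-app-of-awesome   0/1     Pending   0          2m

kubectl describe pod super-app-of-awesome
# Look for a FailedScheduling event at the bottom, along the lines of:
#   Warning  FailedScheduling  0/3 nodes are available: 3 Insufficient cpu.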

Different applications will have different requirements, though, so how do we determine what our applications need to run successfully? The short answer is to use historical usage patterns to define the requirements. This essentially means running the pod and recording its resource consumption, whether manually or with a monitoring tool like Prometheus or Datadog. The reason historical data is used for pattern analysis is to find the usage percentiles and allocate the requests accordingly.
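For a quick point-in-time reading, metrics-server (which AKS includes out of the box) lets you spot-check consumption with kubectl top. A minimal sketch; the numbers are purely illustrative:

kubectl top pod super-app-of-awesome --containers
# POD                    NAME           CPU(cores)   MEMORY(bytes)
# super-app-of-awesome   appofawesome   112m         98Mi

That’s handy for spot checks, but for setting requests you really want the historical view.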

For example, if our application uses 1.5 cores and 128MB of memory on average but once a day spikes to 2 cores and 256MB of memory for 15 minutes, we have a good solid understanding that we NEED 1.5 cores and 128MB of memory out of the gate to be successful.
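If you are scraping cAdvisor metrics with Prometheus, queries along these lines can surface those percentiles. This is a sketch, not a drop-in: the label name and lookback window depend on your setup (older clusters expose container_name instead of container):

# 95th percentile CPU usage, in cores, over the last week
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{container="appofawesome"}[5m])[7d:5m])

# 95th percentile of the memory working set over the last week
quantile_over_time(0.95,
  container_memory_working_set_bytes{container="appofawesome"}[7d])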

In a future article I will show step by step how we can manually determine what requests should be set at for a workload I’ll design off the top of my head… just because I can, and it might be useful to see it done manually before moving to more automated solutions. I might also show in that same article how to do it with Prometheus and/or Datadog.

Before we move on to talking about limits I want to mention a few more things. The requests and limits you define also determine the pod’s K8s QoS class. In our example the limits are higher than the requests, which makes the pod Burstable; set the limits equal to the requests and it becomes Guaranteed. This is important because if for whatever reason the system needs to reap a pod to regain resources for critical functionality, the Guaranteed class is considered the most important and is not going to be killed before less important pods. You can read more about QoS classes in the K8s docs, but basically if you don’t set any requests or limits your QoS is BestEffort and K8s is going to murderball that pod and its containers first if it needs to.

I should note here that you can set the resource requests differently for each container inside your pod, and that can actually cause your QoS class to change, so be sure to check the docs to ensure you are setting what you expect.
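As a sketch, here is a Guaranteed variant of our hypothetical pod, plus a quick way to verify which class K8s actually assigned:

apiVersion: v1
kind: Pod
metadata:
  name: super-app-of-awesome
spec:
  containers:
  - name: appofawesome
    image: ubuntu
    resources:
      requests:
        memory: "512Mi"
        cpu: "1500m"
      limits:
        memory: "512Mi"   # equal to the request
        cpu: "1500m"      # equal to the request

kubectl get pod super-app-of-awesome -o jsonpath='{.status.qosClass}'
# Guaranteed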

Limits

Don’t go over this limit…or else!

I am sure you can guess what this is for just based on the name. Limits allow you to cap the amount of CPU/memory a pod can use. Some would argue this is the most important thing to set on pretty much every single pod you have, unless you want to allow your pods to claim all of the resources a node has available.

Yup, you read that correctly. If you do not set limits, a pod can potentially consume all the CPU and memory a node has available and starve everything else running on it. Setting limits allows you to fine-tune and control the upper… limits… that your pod is capable of consuming. Let’s look at the previous manifest:

apiVersion: v1
kind: Pod
metadata:
  name: super-app-of-awesome
spec:
  containers:
  - name: appofawesome
    image: ubuntu
    resources:
      requests:
        memory: "256Mi"
        cpu: "1000m"
      limits:
        memory: "512Mi"
        cpu: "1500m"

You can see here that I have set the limit for memory at 512Mi and the CPU at 1500m, which again for CPU would be 1.5 vCores. This is going to prevent the pod from consuming more than what we define. But how is this done, and what happens if the pod tries to go over its limit? Well, in AKS the container runtime uses cgroups, and the limits are monitored and enforced by the kernel. Violations of CPU and memory limits are handled quite differently, though.

CPU limits are controlled using the Completely Fair Scheduler (CFS) on most Linux container platforms, AKS included since it uses Docker. Basically, the CFS bandwidth controller caps CPU time using quota and period settings: when the quota is exhausted within a period, the container is throttled until the next period begins. It’s interesting; check out the kernel’s CFS bandwidth control documentation for more info. What this means, though, is that your performance could be massively impacted, so you want to avoid throttling if you value performance. CPU throttling usually points to incorrect limits, but don’t rule out a runaway process. Always check to be sure :)
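To make that concrete, here is roughly how our 1500m limit maps onto the cgroup (v1) settings the kubelet writes, plus a hedged PromQL sketch (same cAdvisor-metrics assumptions as above) for spotting throttling:

# With the default 100ms CFS period, a 1500m limit becomes:
#   cpu.cfs_period_us = 100000   (the 100ms accounting window)
#   cpu.cfs_quota_us  = 150000   (150ms of CPU time per window = 1.5 cores)

# Fraction of periods in which the container was throttled:
rate(container_cpu_cfs_throttled_periods_total{container="appofawesome"}[5m])
  /
rate(container_cpu_cfs_periods_total{container="appofawesome"}[5m])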

Memory, on the other hand, isn’t quite so generous. K8s and the kernel are not going to simply throttle the pod’s ability to request and consume memory. They’re going to kill it. There is a bit more to it than that, but it’s not terribly important to know right now. Just understand that a pod attempting to use more than its memory limit is likely going to be terminated/OOMKilled.
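When that happens, the evidence shows up in the pod’s container status. A quick sketch for our hypothetical pod:

kubectl describe pod super-app-of-awesome
# Under the container you would see something like:
#   Last State:  Terminated
#     Reason:    OOMKilled
#     Exit Code: 137

# Or pull the reason out directly:
kubectl get pod super-app-of-awesome \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# OOMKilled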

Summary

Hopefully this helps you understand the importance of setting up your pods with the appropriate resource requests and limits to ensure proper performance and stability. Using requests guarantees a minimum amount of resources to the pod. Using limits helps prevent the pod from consuming too many resources on the node.

In future articles I will show you how to find what your requests and limits can/should be set at, as well as how to identify and deal with pods that are violating their limits.

That said, the next article, whose link I will add once published, will discuss Resource Quotas and their importance for helping to control and limit resource usage across clusters used by multiple teams.
