Kubernetes: monitoring with Prometheus — exporters, a Service Discovery, and its roles

Arseny Zinchenko (setevoy)
Published in ITNEXT
Apr 26, 2020


The next task with our Kubernetes cluster is to set up its monitoring with Prometheus.

This task is complicated by the fact that there is a whole bunch of resources to be monitored:

  • from the infrastructure side — EC2 WorkerNode instances, their CPU, memory, network, disks, etc
  • key services of Kubernetes itself — its API server stats, etcd, scheduler
  • deployments, pods and containers state
  • and metrics from applications running in the cluster

To monitor all of these, the tools described in this post will be used.

What will we do in this post?

  • will spin up a Prometheus server
  • will spin up a Redis server and the redis_exporter to grab the Redis server’s metrics
  • will add a ClusterRole for Prometheus
  • will configure Prometheus Kubernetes Service Discovery to collect metrics
  • will take a look at the Prometheus Kubernetes Service Discovery roles
  • will add more exporters:
  • node-exporter
  • kube-state-metrics
  • cAdvisor metrics
  • and the metrics-server

How will this all work together?

The Prometheus Federation will be used:

  • we already have a Prometheus-Grafana stack in the project, used to monitor already existing resources — this will be our “central” Prometheus server, which will PULL metrics from the other Prometheus server in the Kubernetes cluster (all our AWS VPC networks are interconnected via VPC peering, and metrics will go over private subnets)
  • in the EKS cluster, we will spin up an additional Prometheus server which will PULL metrics from the cluster and its exporters and later hand them over to the “central” Prometheus
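A minimal sketch of what the federation job on the “central” Prometheus side could look like — the internal LoadBalancer DNS name below is a placeholder, and the match[] selector should be narrowed down to the series you really need:

scrape_configs:
- job_name: 'eks-dev-federation'
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
    # pull everything for now; narrow this selector in production
    - '{job=~".+"}'
  static_configs:
  # placeholder: the DNS name of the internal LoadBalancer in front of the in-cluster Prometheus
  - targets: ['internal-eks-prometheus.example.internal:80']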

helm install

In 99% of the guides I found while investigating Kubernetes monitoring with Prometheus, the whole installation and configuration was reduced to a single “helm install” command.

Helm is, of course, great — a templating engine and a package manager in one tool, but the problem is that it does a lot of things under the hood, and right now I want to understand what exactly is going on there.

In fact, this is the next part after the AWS Elastic Kubernetes Service: a cluster creation automation, part 1 — CloudFormation (in English) and AWS: Elastic Kubernetes Service — cluster creation automation, part 2 — Ansible, eksctl (still in Russian) posts, so there will be some Ansible in this post.

Versions used here:

  • Kubernetes: (AWS EKS): v1.15.11
  • Prometheus: 2.17.1
  • kubectl: v1.18.0


kubectl context

First, configure access to your cluster — add a new context for your kubectl:

$ aws --region eu-west-2 eks update-kubeconfig --name bttrm-eks-dev-2
Added new context arn:aws:eks:eu-west-2:534***385:cluster/bttrm-eks-dev-2 to /home/admin/.kube/config

Or switch to an already existing one, if any:

$ kubectl config get-contexts

arn:aws:eks:eu-west-2:534****385:cluster/bttrm-eks-dev-1

$ kubectl config use-context arn:aws:eks:eu-west-2:534***385:cluster/bttrm-eks-dev-2
Switched to context "arn:aws:eks:eu-west-2:534***385:cluster/bttrm-eks-dev-2".

ConfigMap Reloader

Because we will make a lot of changes to the ConfigMap of our future Prometheus, it’s worth adding the Reloader now, so pods will apply those changes immediately without our intervention.
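The Reloader watches Deployments for ConfigMap and Secret changes and performs a rolling restart for those that opt in via an annotation — we will add it to our Prometheus Deployment later, and it looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  annotations:
    # ask Reloader to restart this Deployment's pods when a ConfigMap/Secret used by it changes
    reloader.stakater.com/auto: "true"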

Create a directory:

$ mkdir -p roles/reloader/tasks

And only one task there — the Reloader installation. We will use kubectl and call it via the Ansible command module.

In the roles/reloader/tasks/main.yml add the following:

- name: "Install Reloader"
command: "kubectl apply -f https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml"

That’s enough for now — I just had no time to dig into the Ansible k8s module and its issues with the python-requests import.

Add this role to a playbook:

- hosts:
  - all
  become: true
  roles:
  - role: cloudformation
    tags: infra
  - role: eksctl
    tags: eks
  - role: reloader
    tags: reloader

Run:

$ ansible-playbook eks-cluster.yml --tags reloader

And check:

$ kubectl get po
NAME READY STATUS RESTARTS AGE
reloader-reloader-55448df76c-9l9j7 1/1 Running 0 3m20s

Start Prometheus server in Kubernetes

First, let’s start the Prometheus itself in the cluster.

Its config file will be saved as a ConfigMap object.

The cluster is already created, and in the future everything will be managed by Ansible, so add the directory structure for it now:

$ mkdir -p roles/monitoring/{tasks,templates}

A Namespace

All the resources related to the monitoring will be kept in a dedicated Kubernetes namespace.

In the roles/monitoring/templates/ directory add a config file, call it prometheus-ns.yml.j2 for example:

---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring

Add a new file roles/monitoring/tasks/main.yml to create the namespace:

- name: "Create the Monitoring Namespace"
command: "kubectl apply -f roles/monitoring/templates/prometheus-ns.yml.j2"

Add the role to the playbook:

- hosts:
  - all
  become: true
  roles:
  - role: cloudformation
    tags: infra
  - role: eksctl
    tags: eks
  - role: reloader
    tags: reloader
  - role: monitoring
    tags: monitoring

Run to test:

$ ansible-playbook eks-cluster.yml --tags monitoring

Check:

$ kubectl get ns
NAME STATUS AGE
default Active 24m
kube-node-lease Active 25m
kube-public Active 25m
kube-system Active 25m
monitoring Active 32s

The prometheus.yml ConfigMap

As already mentioned, the Prometheus config data will be kept in a ConfigMap.

Create a new file called roles/monitoring/templates/prometheus-configmap.yml.j2 - this is a minimal configuration:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      external_labels:
        monitor: 'eks-dev-monitor'
    scrape_configs:
    - job_name: 'prometheus'
      scrape_interval: 5s
      static_configs:
      - targets: ['localhost:9090']

Add it to the roles/monitoring/tasks/main.yml:

- name: "Create the Monitoring Namespace"
command: "kubectl apply -f roles/monitoring/templates/prometheus-ns.yml.j2"

- name: "Create prometheus.yml ConfigMap"
command: "kubectl apply -f roles/monitoring/templates/prometheus-configmap.yml.j2"

Now we can check it one more time — apply it and check the ConfigMap content:

$ kubectl -n monitoring get configmap prometheus-config -o yaml
apiVersion: v1
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      external_labels:
        monitor: 'eks-dev-monitor'
    scrape_configs:
    - job_name: 'prometheus'
      scrape_interval: 5s
      static_configs:
      - targets: ['localhost:9090']
kind: ConfigMap

Prometheus Deployment and a LoadBalancer Service

Now we can start the Prometheus.

Create its deployment file — roles/monitoring/templates/prometheus-deployment.yml.j2:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  labels:
    app: prometheus
  namespace: monitoring
  annotations:
    reloader.stakater.com/auto: "true"
    # service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus-server
        image: prom/prometheus
        volumeMounts:
        - name: prometheus-config-volume
          mountPath: /etc/prometheus/prometheus.yml
          subPath: prometheus.yml
        ports:
        - containerPort: 9090
      volumes:
      - name: prometheus-config-volume
        configMap:
          name: prometheus-config
---
kind: Service
apiVersion: v1
metadata:
  name: prometheus-server-alb
  namespace: monitoring
spec:
  selector:
    app: prometheus
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9090
  type: LoadBalancer

Here:

  1. create two pods with Prometheus
  2. attach the prometheus-config ConfigMap
  3. create a Service of the LoadBalancer type to access the Prometheus server on port 9090 — this will create an AWS LoadBalancer of the Classic type (not an Application LB) with a Listener on port 80

Annotations here:

  • reloader.stakater.com/auto: "true" — used by the Reloader service
  • service.beta.kubernetes.io/aws-load-balancer-internal: "true" — can stay commented for now; later an internal LB will be used to configure VPC peering and Prometheus federation, but at this moment let’s use an Internet-facing one (see the sketch below)
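A sketch of how the internal LoadBalancer could be enabled later — note that AWS reads this annotation from the Service object’s metadata:

kind: Service
apiVersion: v1
metadata:
  name: prometheus-server-alb
  namespace: monitoring
  annotations:
    # with this annotation AWS creates an internal (not Internet-facing) Classic LB
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  selector:
    app: prometheus
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9090
  type: LoadBalancer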

Add to the tasks:

...
- name: "Deploy Prometheus server and its LoadBalancer"
command: "kubectl apply -f roles/monitoring/templates/prometheus-deployment.yml.j2"

Run:

$ ansible-playbook eks-cluster.yml --tags monitoring

TASK [monitoring : Deploy Prometheus server and its LoadBalancer] ****
changed: [localhost]

Check pods:

$ kubectl -n monitoring get pod
NAME READY STATUS RESTARTS AGE
prometheus-server-85989544df-pgb8c 1/1 Running 0 38s
prometheus-server-85989544df-zbrsx 1/1 Running 0 38s

And the LoadBalancer Service:

$ kubectl -n monitoring get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-server-alb LoadBalancer 172.20.160.199 ac690710a9747460abc19cd999812af8-1463800400.eu-west-2.elb.amazonaws.com 80:30190/TCP 42s

Or in the AWS dashboard:

Check it — open in a browser:

If there are any issues with the LoadBalancer, port-forwarding can be used here:

$ kubectl -n monitoring port-forward prometheus-server-85989544df-pgb8c 9090:9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090

In this way, Prometheus will be accessible with the http://localhost:9090 URL.
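Alternatively, you can port-forward to the Service instead of a particular pod, so there is no need to look up the pod’s name first:

$ kubectl -n monitoring port-forward svc/prometheus-server-alb 9090:80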

Monitoring configuration

Okay — we’ve started the Prometheus, now we can collect some metrics.

Let’s begin with the simplest task — spin up some service, an exporter for it, and configure Prometheus to collect its metrics.

Redis && redis_exporter

For this, we can use the Redis server and the redis_exporter.

Create a new deployment file roles/monitoring/templates/tests/redis-server-and-exporter-deployment.yml.j2:

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121
---
kind: Service
apiVersion: v1
metadata:
  name: redis
spec:
  selector:
    app: redis
  ports:
  - name: redis
    protocol: TCP
    port: 6379
    targetPort: 6379
  - name: prom
    protocol: TCP
    port: 9121
    targetPort: 9121

Annotations here:

  • prometheus.io/scrape — used for filters in pods and services, see the Roles part of this post
  • prometheus.io/port — a non-default port can be specified here
  • prometheus.io/path — and an exporter’s metrics path can be changed here from the default /metrics

See Per-pod Prometheus Annotations.
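For example, for an exporter listening on a non-default port and path, all three annotations on a pod template could look like this (the values are purely illustrative, and they only take effect when the Prometheus relabel_configs actually read them — the pod role section below shows this for the scrape annotation):

  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        # illustrative values for a hypothetical exporter
        prometheus.io/port: "9121"
        prometheus.io/path: "/custom-metrics"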

Pay attention that we didn’t set a namespace here — the Redis service and its exporter will be created in the default namespace, and we will soon see what happens because of this.

Deploy it manually — it’s a testing task, no need to add it to Ansible:

$ kubectl apply -f roles/monitoring/templates/tests/redis-server-and-exporter-deployment.yml.j2
deployment.extensions/redis created
service/redis created

Check it — it has to have two containers running.

In the default namespace find a pod:

$ kubectl get pod
NAME READY STATUS RESTARTS AGE
redis-698cd557d5-xmncv 2/2 Running 0 10s
reloader-reloader-55448df76c-9l9j7 1/1 Running 0 23m

And containers inside:

$ kubectl get pod redis-698cd557d5-xmncv -o jsonpath='{.spec.containers[*].name}'
redis redis-exporter

Okay.

Now, add metrics collection from this exporter — update the prometheus-configmap.yml.j2 ConfigMap - add a new target, redis:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |

    global:
      scrape_interval: 15s
      external_labels:
        monitor: 'eks-dev-monitor'

    scrape_configs:

    - job_name: 'prometheus'
      scrape_interval: 5s
      static_configs:
      - targets: ['localhost:9090']

    - job_name: 'redis'
      static_configs:
      - targets: ['redis:9121']

Deploy it, and check the Targets of the Prometheus:

Nice — a new Target appeared here.

But why did it fail with the “Get http://redis:9121/metrics: dial tcp: lookup redis on 172.20.0.10:53: no such host” error?

Prometheus ClusterRole, ServiceAccount, and ClusterRoleBinding

So, as we remember, we started our Prometheus in the monitoring namespace:

$ kubectl get ns monitoring
NAME STATUS AGE
monitoring Active 25m

While in the Redis deployment we didn’t set a namespace and, accordingly, its pod was created in the default namespace:

$ kubectl -n default get pod
NAME READY STATUS RESTARTS AGE
redis-698cd557d5-xmncv 2/2 Running 0 12m

Or this way:

$ kubectl get pod redis-698cd557d5-xmncv -o jsonpath='{.metadata.namespace}'
default

To allow Prometheus to access all namespaces in the cluster, add a ClusterRole, a ServiceAccount, and a ClusterRoleBinding — see the Kubernetes: part 5 — RBAC authorization with a Role and RoleBinding example post for more details.

Also, this ServiceAccount will be used for Prometheus Kubernetes Service Discovery.

Add a roles/monitoring/templates/prometheus-rbac.yml.j2 file:

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - services
  - endpoints
  - pods
  - nodes
  - nodes/proxy
  - nodes/metrics
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring

Add its execution after creating the ConfigMap and before deployment of the Prometheus server:

- name: "Create the Monitoring Namespace"
command: "kubectl apply -f roles/monitoring/templates/prometheus-ns.yml.j2"

- name: "Create prometheus.yml ConfigMap"
command: "kubectl apply -f roles/monitoring/templates/prometheus-configmap.yml.j2"

- name: "Create Prometheus ClusterRole"
command: "kubectl apply -f roles/monitoring/templates/prometheus-rbac.yml.j2"

- name: "Deploy Prometheus server and its LoadBalancer"
command: "kubectl apply -f roles/monitoring/templates/prometheus-deployment.yml.j2"

Update the prometheus-deployment.yml - in its spec add the serviceAccountName:

...
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus-server
        image: prom/prometheus
...

Besides this, to reach pods in another namespace we need to use an FQDN with the namespace specified; in this case, the address to access the Redis exporter will be redis.default.svc.cluster.local — see DNS for Services and Pods.

Update the ConfigMap — change the Redis address:

...
    scrape_configs:

    - job_name: 'prometheus'
      scrape_interval: 5s
      static_configs:
      - targets: ['localhost:9090']

    - job_name: 'redis'
      static_configs:
      - targets: ['redis.default.svc.cluster.local:9121']

Deploy with Ansible to update everything:

$ ansible-playbook eks-cluster.yml --tags monitoring

Check the Targets now:

And metrics:

Prometheus Kubernetes Service Discovery

static_configs in Prometheus is a good thing, but what if you have to collect metrics from hundreds of such services?

The solution is to use the kubernetes_sd_config feature.

kubernetes_sd_config roles

Kubernetes SD in Prometheus has a collection of so-called “roles”, which define what objects will be discovered as targets and how their metrics will be collected.

Each such role has its own set of labels, see the documentation:

  • node: will create one target per cluster WorkerNode, collecting the kubelet’s metrics
  • service: will find and return each Service and its Port
  • pod: will find all pods and return their containers as targets to grab metrics from
  • endpoints: will create targets from each Endpoint of each Service found in the cluster
  • ingress: will create targets for each Path of each Ingress

The difference is only in the labels returned and in which address will be used for each such target.

Config examples (more at the end of this post):
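For example, a minimal job with the service role, filtered by the same prometheus.io/scrape annotation convention, could look like this (a sketch, not used further in this post):

    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      # keep only Services annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # copy the Service's Kubernetes labels into the resulting Prometheus labels
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)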

Also, to connect to the cluster’s API server with SSL/TLS encryption, we need to specify the Certificate Authority certificate of the server to validate it, see Accessing the API from a Pod.

And for authorization on the API server, we will use a token from the bearer_token_file, which is mounted via the serviceAccountName: prometheus we’ve set in the Deployment above.

node role

Let’s see what we will get with each such role.

Begin with the node role — add it to the scrape_configs; you can copy-paste it from the example:

...
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      kubernetes_sd_configs:
      - role: node

Without relabeling for now — just run it and check the Targets.

Check the Status > Service Discovery page for the targets and labels discovered.

And the kubelet_* metrics:

Now add some relabeling — see the relabel_config documentation and Life of a Label.

What do they suggest there?

...
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
...
  1. collect labels like __meta_kubernetes_node_label_alpha_eksctl_io_cluster_name, __meta_kubernetes_node_label_alpha_eksctl_io_nodegroup_name, etc, matched with (.+) — this will produce labels like alpha_eksctl_io_cluster_name, alpha_eksctl_io_nodegroup_name, etc
  2. update the __address__ label — set it to the kubernetes.default.svc:443 value to create an address to call targets on
  3. get the value of __meta_kubernetes_node_name and update the __metrics_path__ label — set it to /api/v1/nodes/<__meta_kubernetes_node_name>/proxy/metrics

As a result, Prometheus will construct a request to kubernetes.default.svc:443/api/v1/nodes/ip-10-1-57-13.eu-west-2.compute.internal/proxy/metrics — and will grab metrics from this WorkerNode.
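You can verify such a path manually via the API server — assuming your kubectl user is allowed to access the nodes/proxy resource:

$ kubectl get --raw "/api/v1/nodes/$(kubectl get node -o jsonpath='{.items[0].metadata.name}')/proxy/metrics" | head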

Update, check:

Nice!

pod role

Now, let’s see the pod role example from the same resources:

...
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

Check:

All the pods were found — but why so many?

Let’s add the prometheus.io/scrape: "true" annotation check:

...
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
...

Which is already added to our Redis, for example:

...
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
...

And the result:

http://10.1.44.135:6379/metrics is the Redis server itself, which doesn’t serve any metrics.

Do you want to remove it? Add one more filter:

...
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: keep
        regex: .*-exporter
...

I.e. we’ll keep only targets whose __meta_kubernetes_pod_container_name label ends with the “-exporter” string.

Check:

Okay — we’ve seen how the roles in the Prometheus Kubernetes Service Discovery work.

What do we have left here?

  • node-exporter
  • kube-state-metrics
  • cAdvisor
  • metrics-server

node-exporter metrics

Add the node_exporter to collect metrics from EC2 instances.

Because a pod with the exporter needs to be placed on each WorkerNode, use the DaemonSet type here.

Create a roles/monitoring/templates/prometheus-node-exporter.yml.j2 file — this will spin up a pod on each WorkerNode in the monitoring namespace, and will add a Service so Prometheus can grab metrics from its endpoints:

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  labels:
    name: node-exporter
  namespace: monitoring
spec:
  template:
    metadata:
      labels:
        name: node-exporter
        app: node-exporter
      annotations:
        prometheus.io/scrape: "true"
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
          protocol: TCP
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '"^/(sys|proc|dev|host|etc)($|/)"'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /

---
kind: Service
apiVersion: v1
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    app: node-exporter
  ports:
  - name: node-exporter
    protocol: TCP
    port: 9100
    targetPort: 9100

Add its execution to the roles/monitoring/tasks/main.yml file:

...
- name: "Deploy node-exporter to WorkerNodes"
command: "kubectl apply -f roles/monitoring/templates/prometheus-node-exporter.yml.j2"

And let’s think about how we can collect metrics now.

The first question is — which role to use here? We need to specify the 9100 port, so we can’t use the node role — it has no Port value:

$ kubectl -n monitoring get node
NAME STATUS ROLES AGE VERSION
ip-10-1-47-175.eu-west-2.compute.internal Ready <none> 3h36m v1.15.10-eks-bac369
ip-10-1-57-13.eu-west-2.compute.internal Ready <none> 3h37m v1.15.10-eks-bac369

What about the Service role?

The address will be set to the Kubernetes DNS name of the service and respective service port

Let’s see:

$ kubectl -n monitoring get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
node-exporter ClusterIP 172.20.242.99 <none> 9100/TCP 37m

Okay, what about the labels of the service role? All good, but it has no labels for the pods on the WorkerNodes — and we need to collect metrics from each node_exporter pod on each WorkerNode.

Let’s go further — the endpoints role:

$ kubectl -n monitoring get endpoints
NAME ENDPOINTS AGE
node-exporter 10.1.47.175:9100,10.1.57.13:9100 44m
prometheus-server-alb 10.1.45.231:9090,10.1.53.46:9090 3h24m

10.1.47.175:9100,10.1.57.13:9100 — aha, here they are!

So — we can use the endpoints role which also has the __meta_kubernetes_endpoint_node_name label.

Try it:

...
    - job_name: 'node-exporter'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: 'node-exporter'
        action: keep
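Optionally, the __meta_kubernetes_endpoint_node_name label mentioned above can be copied into a regular label, so every node_exporter time series carries the name of its WorkerNode — a small addition to the same relabel_configs (the "node" target label name here is arbitrary):

      # tag each target with the WorkerNode its endpoint is bound to
      - source_labels: [__meta_kubernetes_endpoint_node_name]
        action: replace
        target_label: node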

Check targets:

And metrics:

See query examples for the node_exporter in the Grafana: creating a dashboard post (in Russian).

kube-state-metrics

To collect metrics about Kubernetes resources we can use the kube-state-metrics.

Add its installation to the roles/monitoring/tasks/main.yml:

...
- git:
    repo: 'https://github.com/kubernetes/kube-state-metrics.git'
    dest: /tmp/kube-state-metrics

- name: "Install kube-state-metrics"
  command: "kubectl apply -f /tmp/kube-state-metrics/examples/standard/"

The deployment itself can be observed in the https://github.com/kubernetes/kube-state-metrics/blob/master/examples/standard/deployment.yaml file.

We can skip Service Discovery here as we will have only one kube-state-metrics service, so use the static_configs:

...
    - job_name: 'kube-state-metrics'
      static_configs:
      - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
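The standard manifests create the kube-state-metrics Service in the kube-system namespace with the metrics port 8080 — that’s where the FQDN above comes from; a quick check:

$ kubectl -n kube-system get svc kube-state-metrics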

Check targets:

And metrics, for example — kube_deployment_status_replicas_available:

cAdvisor

cAdvisor is well known — probably the most widely used system for collecting data about containers.

It’s already integrated into Kubernetes (built into the kubelet), so there is no need for a dedicated exporter — just grab its metrics. An example can be found in the same https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L102 file.

Update the roles/monitoring/templates/prometheus-configmap.yml.j2:

...
    - job_name: 'cAdvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node

      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

Deploy, check:

We have two Kubernetes WorkerNodes in our cluster, and we can see metrics from both — great:

metrics-server

There are exporters for it too, but I don’t need them now — just install it so Kubernetes Limits work and kubectl top can be used.

Its installation used to be a bit more complicated, see the Kubernetes: running metrics-server in AWS EKS for a Kubernetes Pod AutoScaler post, but now on EKS it works out of the box.

Update the roles/monitoring/tasks/main.yml:

...
- git:
    repo: "https://github.com/kubernetes-sigs/metrics-server.git"
    dest: "/tmp/metrics-server"

- name: "Install metrics-server"
  command: "kubectl apply -f /tmp/metrics-server/deploy/kubernetes/"
...

Deploy, check pods in the kube-system namespace:

$ kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
aws-node-s7pvq 1/1 Running 0 4h42m

kube-proxy-v9lmh 1/1 Running 0 4h42m
kube-state-metrics-6c4d4dd64-78bpb 1/1 Running 0 31m
metrics-server-7668599459-nt4pf 1/1 Running 0 44s

The metrics-server pod is here - good.

And try kubectl top node in a couple of minutes:

$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-10-1-47-175.eu-west-2.compute.internal 47m 2% 536Mi 14%
ip-10-1-57-13.eu-west-2.compute.internal 58m 2% 581Mi 15%

And for pods:

$ kubectl top pod
NAME CPU(cores) MEMORY(bytes)
redis-6d9cf9d8cb-dfnn6 2m 5Mi
reloader-reloader-55448df76c-wsrfv 1m 7Mi

That’s all, in general.

