Monitoring on Kubernetes: Custom Metrics and Autoscaling

Published in ITNEXT, Feb 5, 2018

Following an earlier post about metric collection agents, the next logical step is to describe what data can be collected and how. In general, there are two categories of telemetry: metric snapshots and events. Metric snapshots are scraped periodically, while events can occur at any time.

With the adoption of Prometheus for monitoring on Kubernetes, metric collection and reliability engineering are gaining more attention. It is important to remember that Prometheus is a pull-based system, which suits cluster monitoring and Kubernetes well. It supports federation and can be used for complex multi-cluster topologies. The alternative to pull (or scrape) is the push model, where data is sent to a time series database. Using the push model with Prometheus has disadvantages and increases complexity, as described in the product documentation.
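
To make the pull model concrete: a scrape target only has to serve plain text lines that Prometheus fetches on its own schedule. The sketch below renders one sample in the Prometheus text exposition format. The function names and the sample metric are illustrative and are not taken from Sonar or from any Prometheus client library.

```python
def _escape(value):
    """Escape backslash, double quote and newline inside a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, value, labels=None, timestamp_ms=None):
    """Render one sample as a line of the Prometheus text exposition format."""
    label_str = ""
    if labels:
        pairs = ",".join(
            '{}="{}"'.format(key, _escape(val)) for key, val in sorted(labels.items())
        )
        label_str = "{" + pairs + "}"
    line = "{}{} {}".format(name, label_str, value)
    if timestamp_ms is not None:
        # The optional timestamp is expressed in milliseconds since the epoch.
        line += " {}".format(timestamp_ms)
    return line


if __name__ == "__main__":
    print(render_metric("orders_pending_total", 3, {"type": "orders"}))
```

With push, the same three pieces of information (name, labels, value) are sent by the agent to the database instead of waiting to be fetched.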

There is no winner in the contest between the push and pull models; each has pros and cons. However, it is important to consider broader scenarios of metric collection, not just those on Kubernetes. For example, monitoring hybrid environments, which may not run Kubernetes, would require deploying Prometheus inside a DMZ to reach scrape targets. While some enterprises may be willing to do that, others would prefer to push metrics outside the security boundary to a time series database.

One of the key aspects of telemetry collection for any type of metric is time resolution. While using Prometheus to scrape metrics from targets every 5 to 10 seconds may be sufficient for some types of metrics, it may not be acceptable for others. How frequently a given metric changes is an important factor in choosing the sampling rate. Some may be familiar with this theorem. While observing changes through scraping is fine for cluster monitoring, it may not be enough for an in-depth analysis of all the metrics that describe the performance of your application. Therefore, collecting different types of metrics at the desired time resolution needs a hybrid approach that uses both push and pull.
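
A small sketch of why scrape interval matters, under made-up numbers: the hypothetical metric below spikes for five seconds, which a 10-second scrape never observes while a 1-second sampler does. The function names and the burst timing are assumptions chosen only for illustration.

```python
def scrape(signal, interval_s, duration_s):
    """Sample signal(t) every interval_s seconds, like a periodic scrape."""
    return [signal(t) for t in range(0, duration_s, interval_s)]


def queue_depth(t):
    """Hypothetical metric: a burst of 50 pending items between t=12s and t=17s."""
    return 50 if 12 <= t < 17 else 0


# Scraping every 10 s samples t = 0, 10, 20, ... and misses the burst entirely.
coarse = scrape(queue_depth, 10, 60)
# Sampling every second sees the burst in five consecutive samples.
fine = scrape(queue_depth, 1, 60)

if __name__ == "__main__":
    print("10s scrape max:", max(coarse))
    print("1s scrape max:", max(fine))
```

This is the gap that pushing events or high-resolution samples is meant to close.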

This challenge is not new. In 2015 Netflix introduced Vector. The purpose of Vector was to give reliability engineers the ability to analyze a set of metrics collected at the host level.

When the Sonar monitoring agent was built to unify metric collection (with a main focus on Windows containers), it included support for exposing collected metrics to Prometheus scraping (pull) and for writing them to the Akumuli time series database (push).

Akumuli, whose name means “accumulate” in Esperanto, is a time series database with very low resource consumption and the ability to ingest a large number of events. This makes it a good option for cluster-wide application or host-level metrics that need a time resolution not achievable with periodic scraping. In addition, Akumuli is simple: it pre-allocates volumes and keeps only the most recent data, so it does not run out of space. The structure of a metric in Akumuli is the same as in Prometheus: a numeric value, a timestamp, and labels.
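
The shared structure (value, timestamp, labels) can be sketched as a small record plus a serializer for the push path. The wire format below reflects my understanding of Akumuli's RESP-style text protocol (series name, then timestamp, then value, each terminated by CRLF); treat it as an assumption and check it against the Akumuli documentation before relying on it. The `Sample` type and function names are hypothetical.

```python
from collections import namedtuple

# One data point: metric name, labels (tags), timestamp in nanoseconds, value.
Sample = namedtuple("Sample", ["metric", "labels", "timestamp_ns", "value"])


def series_name(sample):
    """Build 'metric tag1=val1 tag2=val2' with tags in a stable order."""
    tags = " ".join(
        "{}={}".format(key, val) for key, val in sorted(sample.labels.items())
    )
    return "{} {}".format(sample.metric, tags)


def to_wire(sample):
    """Serialize a sample for a TCP write (assumed RESP-style framing)."""
    return "+{}\r\n:{}\r\n+{}\r\n".format(
        series_name(sample), sample.timestamp_ns, sample.value
    )


if __name__ == "__main__":
    point = Sample("orders_pending_total", {"type": "orders"}, 1517856825000000000, 1)
    print(repr(to_wire(point)))
```

The same `Sample` could be rendered for Prometheus scraping instead, which is why one agent can serve both push and pull.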

Time series databases should be simple and fast. For that reason, real-time analysis (continuous queries, etc.) of time series in Akumuli is performed by Sonar. The results of every query that Sonar executes periodically can be written to Akumuli, exposed to Prometheus, or sent to other destinations. The configuration of Sonar continuous queries can be changed at runtime without any impact on the time series database.

This offers a good separation between the time series database, the metric collection agents, and the real-time processing jobs that gather and analyze the collected data. Thus, Sonar can be used as both a monitoring agent and an analysis engine with the Akumuli time series database.

Using Akumuli and Sonar together offers the following:

Gather metrics that require high time resolution, including signals (IoT), events, and even snapshots, at different levels: application and host.
Lower resource consumption with either push or pull. Sonar supports sending metrics to Akumuli via TCP or UDP.
Analyze metrics with high time resolution at the application or host level for early anomaly detection and alerting.
Reduce metric volume by downsampling and exposing only the results at the cluster level, using Prometheus when needed.
Configure labels flexibly and re-label metrics during collection and analysis (downsampling, etc.).

In other words, using Akumuli as the time series database together with Sonar as the monitoring agent and processing engine can simplify metric collection, sampling, re-labeling, and exposing collected time series to other destinations, including Prometheus. This opens many new scenarios, including ones for Kubernetes:

IoT-type metrics can be collected, merged, and downsampled in environments that require low latency when ingesting a large number of high-frequency events. Manufacturing and robotics are just two examples.
Scaling modernized Windows applications on Kubernetes. This can be done right now by deploying the Sonar agent in a Windows container to expose custom metrics with or without downsampling. Metrics can be gathered from performance counters, WMI, SQL Server, and MySQL, and collection can be extended to other sources.
Supporting federation: the processing engine allows downsampling metrics and sending them to another instance of the time series database or exposing them to Prometheus. This is determined by configuration that you can change at any time. Thus, federation from one instance of Akumuli to another is now possible.
Easy backup and restore of time series data. Akumuli uses very few files as volumes compared with Prometheus.
Low cost of change at runtime, with flexible selection of metric labels and preservation of the original time stamps of the series. Similar to Prometheus, Sonar supports re-labeling the original time series in continuous queries.
Using anomaly detection on the original time series at the application or host level to predict or determine the root cause of a problem. Not all of these metrics may be suitable for exposing to Prometheus.
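
As an illustration of the re-labeling mentioned in the list above, here is a minimal sketch that follows Prometheus relabeling semantics: a fully anchored regex is matched against a source label, and a target label is set from a replacement template with `$1`-style references. It is not Sonar's implementation, which the article does not describe; the `relabel` function and its parameters are hypothetical.

```python
import re


def relabel(labels, source, regex, target, replacement):
    """Return a copy of labels with `target` set from a regex match on `source`.

    The regex must match the whole source value; if it does not, the labels
    are returned unchanged.
    """
    value = labels.get(source, "")
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)
    # Translate $1-style references into Python's \g<1> template syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target] = match.expand(template)
    return result


if __name__ == "__main__":
    print(relabel({"source": "type=orders"}, "source", r"type=(.+)", "metric", "$1"))
```

Applying such a rule during collection or in a continuous query changes only the labels; the values and time stamps of the series stay as they were.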

Consider a simple example: order processing. Each order with an authorized payment needs a shipment to be created. The back-end shipping service has to be scaled on demand so that it does not consume resources when there are few orders. The information about orders pending shipping is stored in the database and can be used to scale this pod. Thus, the goal is to find a way to scale a pod based on the orders data in the database. Below are the steps to accomplish this:

Prerequisites

To expose custom metrics from Prometheus to the Horizontal Pod Autoscaler (HPA) on Kubernetes, follow the instructions described in this repository on GitHub. After deploying the metrics server, the custom metrics API, and Prometheus, and running the scaling example, the steps below show how to expose a custom order-processing metric to HPA with downsampling.

Step 1: Create MySQL database and Orders table

Provisioning MySQL on Kubernetes with Helm can be accomplished using the following command:

helm install stable/mysql --name mysql01  --set mysqlRootPassword=Pass@word1

Next, create the shipping database and the orders table:

mysql -u root -p
mysql> source create-tables.mysql

The script is very simple:

CREATE DATABASE IF NOT EXISTS `orders_shipping`;
USE `orders_shipping`;
DROP TABLE IF EXISTS `orders`;
CREATE TABLE orders (
  `id` MEDIUMINT NOT NULL AUTO_INCREMENT,
  `tid` VARCHAR(32) NOT NULL,
  `state` VARCHAR(32) NOT NULL,
  `createdAt` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY(`id`),
  UNIQUE KEY(`tid`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
USE `orders_shipping`;
INSERT INTO orders(tid,state) VALUES('12345','Pending');

At this point, new orders can be emulated with a simple INSERT statement; this data is used later for HPA. Because tid has a unique key, each emulated order needs its own tid value:

INSERT INTO orders(tid,state) VALUES('12346','Pending');
Query OK, 1 row affected (0.05 sec)

Step 2: Deploy Akumuli time series database to k8s

This example uses the Akumuli time series database in addition to Prometheus (see prerequisites). Clone the charts repository available on GitHub. Assuming you cloned it to the local path ~/github.com/infragravity/charts, Akumuli can be deployed using the command below:

helm install ~/github.com/infragravity/charts/stable/akumuli --name=aku --set image.repository=akumuli/akumuli,image.tag=skylake

The number of volumes and their size for Akumuli storage can be set on the command line or in values.yml, which is included in this chart.

Step 3: Configure and Deploy Sonar monitoring agent to k8s

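First, let’s choose a query that polls for orders that have stayed in the pending state for longer than a given interval. TIMESTAMPDIFF returns whole minutes, so the condition below matches orders that have been pending for at least two full minutes: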

select count(*) as pending_orders_total from orders_shipping.orders where state='Pending' and timestampdiff(MINUTE,createdAt,NOW())>1;
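
To try the logic of this query without a MySQL instance, the sketch below reproduces it against an in-memory SQLite table. SQLite is only a stand-in: the table uses integer epoch seconds in a `created_at` column, and integer division by 60 plays the role of `timestampdiff(MINUTE, ...)`. The function name, column name, and sample rows are assumptions made for illustration.

```python
import sqlite3


def pending_orders_total(conn, now_epoch):
    """Count orders that are Pending and whose age in whole minutes exceeds 1."""
    row = conn.execute(
        "SELECT COUNT(*) FROM orders "
        "WHERE state = 'Pending' AND (? - created_at) / 60 > 1",
        (now_epoch,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, tid TEXT UNIQUE NOT NULL, "
    "state TEXT NOT NULL, created_at INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (tid, state, created_at) VALUES (?, ?, ?)",
    [
        ("12345", "Pending", 1000),  # 200 s old at now=1200: 3 whole minutes, counted
        ("12346", "Pending", 1110),  # 90 s old: 1 whole minute, not counted
        ("12347", "Pending", 1190),  # 10 s old: not counted
        ("12348", "Shipped", 1000),  # old enough, but not pending
    ],
)

if __name__ == "__main__":
    print(pending_orders_total(conn, 1200))
```

The age filter means a freshly inserted order does not influence the metric immediately, which is worth keeping in mind when emulating load later.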

Using this query, the Sonar agent can be configured to use the MySQL input adapter, as shown in the orders.config file included in this Helm chart:

<?xml version="1.0"?>
<configuration>
  <configSections>
    <section name="Sonar" type="Infragravity.Sonar.SonarConfigurationSection, Sonar"/>
  </configSections>
  <connectionStrings>
    <add name="akumulidb" providerName="akumuli" connectionString="Data Source = tcp://aku-akumuli:8282;Initial Catalog=main;User Id =; Password =; Application Name = default;Max Pool Size=200;Packet Size=2048;Connection Timeout=100"/>
    <add name="mysqldb" providerName="mysql" connectionString="Server=mysql01-mysql;Database=orders_shipping; User Id=root; Password={$mysql_password};Encrypt=false;" />
  </connectionStrings>
  <Sonar>
    <Runtime scrapeIntervalSeconds="5" skipSSLCheck="true" threads="1"/>
    <InputAdapters>
        <add provider="mysql" type="Samples.Sonar.Adapters.MySql.MySqlAdapterFactory" path="Samples.Sonar.Adapters.MySql.dll" />
    </InputAdapters>
    <Schedules> 
        <add name="m01" query="orders_pending" input="mysqldb" intervalSeconds="10" output="akumulidb" />
    </Schedules>
    <Servers>
    </Servers>   
    <Queries>
        <add name="orders_pending" type="sql"
        filter="SELECT count(*) as total from orders_shipping.orders where state='Pending' and timestampdiff(MINUTE,createdAt,NOW())>1;">
                <Tags>                   
                    <add name="type" value="orders" readonly="true" />
                </Tags>
                <Instances>              
                </Instances>
                <Values>
                </Values>
        </add>                
    </Queries>
  </Sonar>
 </configuration>

Next, use the Helm chart to deploy the Sonar agent with the custom MySQL database adapter:

helm install ~/github.com/infragravity/charts/stable/sonar --name orders-agent --set image.repo=infragravity/sample-mysql,image.tag=latest,config.name=samples/custom-metrics/orders.config,config.log_level=Debug

Finally, check the pod log to verify that the agent is able to query data from the database and send the result to the Akumuli time series database.

Step 4: Configure and Deploy Sonar runtime to k8s

Now that we have data collected in Akumuli, the next step is to deploy a continuous query for downsampling and exposing data to Prometheus. After that, the custom metrics API service will be able to access the order metrics needed for autoscaling. To accomplish this, use the configuration file for the continuous query, orders-cq.config:

<?xml version="1.0"?>
<configuration>
  <configSections>
    <section name="Sonar" type="Infragravity.Sonar.SonarConfigurationSection, Sonar"/>
  </configSections>
  <connectionStrings>
    <add name="input-akumuli-http" providerName="akumuli-http-receive" connectionString="Server=http://aku-akumuli/api/query; Connect Timeout=5;" />
  </connectionStrings>
  <Sonar>
    <Runtime scrapeIntervalSeconds="5" skipSSLCheck="true" threads="1"/>
    <InputAdapters>
        <add provider="akumuli-http-receive" type="Infragravity.Sonar.Adapters.Akumuli.Http.InputAdapterFactory,Infragravity.Sonar.Adapters.Akumuli.Http" />
    </InputAdapters>
    <Schedules> 
        <add name="a01" query="aku-test" input="input-akumuli-http" intervalSeconds="20" />
    </Schedules>
    <Servers>
    </Servers>   
    <Queries>
        <add name="aku-test" type="raw" timestamp="ts"
        filter="{ 'group-aggregate':
                    {
                    'metric': 'orders-pending-total',
                    'step': '20s',
                    'func': ['mean','min','max']
                    },
                    'range': { 'from': 'timeshift(20s)','to': 'timeshift(0s)'},
                    'output': { 'format': 'csv','timestamp': 'raw'  },
                    'limit' : '100',
                    'order-by':'series',
                    'apply':[] }">
            <Tags>                   
               <add name="namespace" value="default" readonly="true" />
               <add name="deployment" value="podinfo" readonly="true" />
            </Tags>
            <Values>
            </Values>
            <Labels>
                <add name="source" regex="type=(.+)" targetLabel="metric" replacement="$2"/>
            </Labels>
        </add>
    </Queries>
  </Sonar>
 </configuration>

As you can see, the query fetches the downsampled value of the metric from Akumuli, using the timeshift() function in Sonar. The value “ts” configured for the timestamp attribute of the continuous query preserves the timestamps from the Akumuli time series database. Next, it applies tags to the resulting time series to associate them with an artifact in Kubernetes, using the deployment and namespace tags. In this example, the collected metrics are related to the deployment named “podinfo” in the default namespace. After the results are labeled, the metrics are exposed to Prometheus, which discovers the Sonar target endpoint using annotations.
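
The downsampling requested by the group-aggregate query (a 20-second step with mean, min and max) can be sketched in a few lines. This shows the idea only and is not Akumuli's implementation; in particular, aligning buckets to multiples of the step is an assumption, and `group_aggregate` is a hypothetical name.

```python
from statistics import mean


def group_aggregate(samples, step_s, funcs=("mean", "min", "max")):
    """Downsample (timestamp_s, value) pairs into fixed-size time buckets.

    Returns {bucket_start: {function_name: aggregated_value}}.
    """
    available = {"mean": mean, "min": min, "max": max}
    buckets = {}
    for ts, value in samples:
        start = ts - ts % step_s  # bucket start aligned to a multiple of the step
        buckets.setdefault(start, []).append(value)
    return {
        start: {name: available[name](values) for name in funcs}
        for start, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    raw = [(0, 1), (10, 3), (20, 6), (30, 6)]  # one sample every 10 s
    print(group_aggregate(raw, 20))
```

Each aggregate becomes its own series, which is consistent with the `_mean` suffix on the metric name queried through the custom metrics API.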

Next, deploy Sonar for running continuous query:

helm install ~/github.com/infragravity/charts/stable/sonar \
 --name orders-cq \
 --set image.repo=infragravity/sonar,image.tag=edge,config.name=samples/custom-metrics/orders-cq.config,config.log_level=Debug

Once deployed, verify within a minute or so that the custom metrics API can query this metric from Prometheus:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/deployments.extensions/podinfo/akutest_orders_pending_total_mean" | jq .

The result should be similar to the one below, showing just one order in the pending state:

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/deployments.extensions/podinfo/akutest_orders_pending_total_mean"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Deployment",
        "name": "podinfo",
        "apiVersion": "extensions/__internal"
      },
      "metricName": "akutest_orders_pending_total_mean",
      "timestamp": "2018-02-05T18:53:45Z",
      "value": "1"
    }
  ]
}
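
For scripted checks, the response can be parsed instead of read by eye. The sketch below extracts the value per described object from a MetricValueList document like the one above. It handles only plain numbers and the milli suffix (`m`) of Kubernetes quantities; other suffixes are deliberately out of scope. The function names are hypothetical.

```python
import json
from decimal import Decimal


def parse_quantity(text):
    """Parse a plain number or a milli-suffixed quantity such as '1500m'."""
    if text.endswith("m"):
        return Decimal(text[:-1]) / 1000
    return Decimal(text)


def metric_values(payload):
    """Map each described object name to its metric value."""
    doc = json.loads(payload)
    return {
        item["describedObject"]["name"]: parse_quantity(item["value"])
        for item in doc["items"]
    }


if __name__ == "__main__":
    sample = json.dumps({
        "kind": "MetricValueList",
        "items": [{
            "describedObject": {"kind": "Deployment", "name": "podinfo"},
            "metricName": "akutest_orders_pending_total_mean",
            "value": "1",
        }],
    })
    print(metric_values(sample))
```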

Step 5: Create autoscaling policy for HPA on k8s

Now that the metric is exposed, we need to create a policy for the autoscaler that references the deployment by name:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: orders-shipping
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: podinfo
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      target:
        kind: Deployment
        name: podinfo
      metricName: akutest_orders_pending_total_mean
      targetValue: 4

As you can see, the custom metric can trigger scaling of the deployment using the metadata set in the Sonar continuous query. Next, deploy the policy:

kubectl create -f ./custom-metrics-hpa.yml

In a minute or so you will see HPA recognizing metric:

>kubectl get hpa     
NAME              REFERENCE            TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
orders-shipping   Deployment/podinfo   1 / 4       1         2         2          1m

Step 6: Emulate pending orders and scale

At this point, all that is needed is to emulate orders, which can be done by modifying the orders table in the MySQL database. After adding 5 more orders, observe the HPA status for the scaling policy created in the earlier step:

kubectl describe hpa
Name:                                                         orders-shipping
Namespace:                                                    default
Labels:                                                       <none>
Annotations:                                                  <none>
CreationTimestamp:                                            Mon, 05 Feb 2018 11:47:27 -0800
Reference:                                                    Deployment/podinfo
Metrics:                                                      ( current / target )
  "akutest_orders_pending_total_mean" on Deployment/podinfo:  6 / 4
Min replicas:                                                 1
Max replicas:                                                 4
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 3
  ScalingActive   True    ValidMetricFound    the HPA was able to succesfully calculate a replica count from Deployment metric akutest_orders_pending_total_mean
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  23s   horizontal-pod-autoscaler  New size: 3; reason: Deployment metric akutest_orders_pending_total_mean above target

After a few more minutes, the deployment is scaled up to the maximum of 4 replicas:

kubectl describe hpa
Name:                                                         orders-shipping
Namespace:                                                    default
Labels:                                                       <none>
Annotations:                                                  <none>
CreationTimestamp:                                            Mon, 05 Feb 2018 11:47:27 -0800
Reference:                                                    Deployment/podinfo
Metrics:                                                      ( current / target )
  "akutest_orders_pending_total_mean" on Deployment/podinfo:  6 / 4
Min replicas:                                                 1
Max replicas:                                                 4
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  the last scale time was sufficiently old as to warrant a new scale
  ScalingActive   True    ValidMetricFound  the HPA was able to succesfully calculate a replica count from Deployment metric akutest_orders_pending_total_mean
  ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
Events:
  Type     Reason                        Age                 From                       Message
  ----     ------                        ----                ----                       -------
  Normal   SuccessfulRescale             11m                 horizontal-pod-autoscaler  New size: 3; reason: Deployment metric akutest_orders_pending_total_mean above target
  Normal   SuccessfulRescale             7m                  horizontal-pod-autoscaler  New size: 4; reason: Deployment metric akutest_orders_pending_total_mean above target

At this point, the number of pending order records can be reduced to scale the deployment down.
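
The rescale events above are consistent with the replica formula documented for the Horizontal Pod Autoscaler: desired = ceil(current replicas × metric value / target value), clamped to the configured minimum and maximum. The sketch below is a simplification that ignores the controller's stabilization and rate limits. The starting count of 2 replicas is an assumption taken from the earlier `kubectl get hpa` output, and the 0.1 tolerance is assumed to be the controller default.

```python
import math


def desired_replicas(current, metric_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Compute the replica count the HPA would aim for with an Object metric."""
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no change
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 6 pending orders against a target of 4, starting from 2 replicas.
    first = desired_replicas(2, 6, 4, 1, 4)
    second = desired_replicas(first, 6, 4, 1, 4)
    print(first, second)
```

With the metric at 6 and a target of 4, two replicas become three; the next evaluation asks for five, which is capped at the maximum of four and matches the TooManyReplicas condition in the output.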

Summary

First, credit goes to the great work done to enable custom metrics from Prometheus, described in the prerequisites.

The example above could be made even more compact by using a single Sonar deployment for gathering data from the MySQL database, running the continuous query for downsampling, and exposing data to Prometheus. Also, metrics that change infrequently do not need downsampling and can be gathered by Sonar and exposed to Prometheus directly.

While the example above uses the pull method, you can also use push when scaling needs to be based on events from order processing. This makes it possible to achieve symmetry for many types of application-level metrics that require events and high time resolution. An example demonstrating push with the same topology will be discussed in a future post.