Observability strategies to not overload engineering teams — OpenTelemetry Strategy

OpenTelemetry provides capabilities to democratize observability data and empowers engineering teams.

Nicolas Takashi · Published in ITNEXT · Nov 15, 2022

One of the strategies I mentioned in my first post, Observability strategies to not overload engineering teams, is leveraging the OpenTelemetry auto-instrumentation approach to achieve observability without requiring engineering effort.

OpenTelemetry Logo

Today, I'll show you how to collect metrics and traces from a Python service without code changes.

To keep the content cleaner, I will leave out some settings, such as the Prometheus configuration file.

What is OpenTelemetry Auto-Instrumentation?

OpenTelemetry provides an auto-instrumentation feature that acts as an agent, collecting telemetry data from a non-instrumented application.

This is what most o11y vendors, such as New Relic and Datadog, do to collect telemetry data and push it to their platforms. It's a valuable feature because engineering teams can achieve observability with zero instrumentation effort.
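
In practice, turning auto-instrumentation on for an existing Python application boils down to a few commands like the sketch below (app.py is just a placeholder for whatever entry point you already have):

```bash
# Install the OpenTelemetry distro and the OTLP exporter, let bootstrap
# detect the libraries already installed and add the matching
# opentelemetry-instrumentation-* packages, then run the untouched
# application under the auto-instrumentation wrapper.
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
opentelemetry-instrument python app.py
```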

Demo application

To help us understand how auto-instrumentation works, I've created the following simple Python service.

Python Service
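
Something along the lines of the minimal Flask sketch below is all it takes; the /roll route, port 5000, and Flask itself are illustrative choices rather than the exact code in the embedded snippet. Note that there is no OpenTelemetry code in it at all.

```python
# server.py - a minimal, uninstrumented Python web service (Flask assumed).
import random

from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/roll")
def roll():
    # A trivial endpoint: return a random dice roll.
    return jsonify({"value": random.randint(1, 6)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```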

Docker Image

This Docker image has all the required dependencies including OpenTelemetry auto instrumentation packages.

Dockerfile

Before we move forward, I would like to highlight the CMD entry, where we're decorating the python server.py command with the opentelemetry-instrument command. This is what does the auto-instrumentation work; we don't need to change anything else on the application side to collect telemetry data.
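
A Dockerfile along these lines captures the idea (the base image, file names, and versions are illustrative):

```dockerfile
# Dockerfile - sketch; base image and file names are illustrative.
FROM python:3.10-slim

WORKDIR /app

# Application dependencies plus the OpenTelemetry auto-instrumentation packages.
# opentelemetry-bootstrap detects the installed libraries (Flask, requests, ...)
# and installs the matching opentelemetry-instrumentation-* packages.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt \
    && pip install --no-cache-dir opentelemetry-distro opentelemetry-exporter-otlp \
    && opentelemetry-bootstrap -a install

COPY server.py .

# The auto-instrumentation wrapper: opentelemetry-instrument decorates the
# regular "python server.py" command; no application changes are required.
CMD ["opentelemetry-instrument", "python", "server.py"]
```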

Service Infrastructure

Using the following configuration, let's build and run the Docker image we defined above.

The opentelemetry-instrument command accepts a number of flags and environment variables that allow you to configure protocols and other properties. For more information about the available environment variables, please check this link.

service docker-compose

Above, we're configuring the traces exporter to use the OTLP format, setting the service name that will appear on the traces to server, and providing the OpenTelemetry Collector endpoint.
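
As a sketch, that part of the compose file could look like the following (service names, ports, and the collector hostname are illustrative and must match the rest of the setup):

```yaml
# docker-compose.yaml (excerpt) - sketch; names and ports are illustrative.
services:
  server:
    build: .
    ports:
      - "5000:5000"
    environment:
      # Export traces using the OTLP protocol.
      - OTEL_TRACES_EXPORTER=otlp
      # Service name that will show up on every trace/span.
      - OTEL_SERVICE_NAME=server
      # Where the OpenTelemetry Collector is listening (OTLP gRPC port).
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```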

Observability Infrastructure

Now let's add the building blocks to provide the observability tooling infrastructure.

OpenTelemetry Collector

The OpenTelemetry Collector is the component that will receive, process, and export the telemetry data produced by the Python application to backends such as Prometheus and Jaeger.

OpenTelemetry Collector Configuration

The configuration below receives the traces produced by the application, processes all spans, and exports them to Jaeger.

Opentelemetry Collector Configuration

Since we have all the spans passing through the collector, we can leverage a processor called spanmetrics, which exposes metrics about the number of calls and the latency of every operation in the Prometheus format.

This approach lets us generate metrics from spans, giving us two different types of telemetry data out of the box.
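
Putting it together, a collector configuration along these lines implements what's described above; ports, names, and the dummy receiver for the metrics pipeline are illustrative, and the exact keys depend on the collector version (newer releases replace the spanmetrics processor with a spanmetrics connector):

```yaml
# otel-collector-config.yaml - sketch; ports and names are illustrative.
receivers:
  otlp:
    protocols:
      grpc:   # defaults to 0.0.0.0:4317, where the application sends its spans
  # Dummy receiver: the metrics pipeline that carries the generated span
  # metrics still needs a receiver, even though nothing is sent to it.
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: localhost:12345

processors:
  batch:
  # Derives call counts and latency histograms from every span.
  spanmetrics:
    metrics_exporter: prometheus

exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true
  # Exposes the generated metrics for Prometheus to scrape.
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [spanmetrics, batch]
      exporters: [jaeger]
    metrics/spanmetrics:
      receivers: [otlp/spanmetrics]
      exporters: [prometheus]
```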

Docker-Compose file

Now that we have the OpenTelemetry Collector configuration, we can spin up Jaeger and Prometheus using the following configuration.

docker-compose.yaml

I would just like to highlight the environment variables in the jaeger service. Since we have the spanmetrics processor active on the OpenTelemetry Collector, we can leverage the SPM feature from Jaeger; for more information, please check this link.
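
A sketch of that compose file is shown below (image tags and ports are illustrative); the two environment variables on the jaeger service are what enable SPM to query the span-derived metrics from Prometheus.

```yaml
# docker-compose.yaml (excerpt) - sketch; image tags and ports are illustrative.
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.64.1
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "8889:8889"   # metrics generated by the spanmetrics processor

  jaeger:
    image: jaegertracing/all-in-one:1.39
    environment:
      # Enable the Service Performance Monitoring (SPM) tab...
      - METRICS_STORAGE_TYPE=prometheus
      # ...and tell Jaeger where to query the span-derived metrics from.
      - PROMETHEUS_SERVER_URL=http://prometheus:9090
    ports:
      - "16686:16686" # Jaeger UI

  prometheus:
    image: prom/prometheus:v2.40.1
    volumes:
      # prometheus.yml (left out of the post) only needs a scrape job
      # pointing at otel-collector:8889.
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
```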

The Final Result

It's time to see the outcome. All of this configuration lets us collect telemetry data that the entire company can use to start adopting observability without requiring engineering effort.

HTTP Load Testing

To properly see the telemetry data, I'll generate a small load on the service using Vegeta.

HTTP load test
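
A one-liner like the following is enough (the target URL, rate, and duration are illustrative):

```bash
# Send 10 requests per second for 2 minutes against the service,
# then print a latency/success report.
echo "GET http://localhost:5000/roll" | \
  vegeta attack -rate=10 -duration=120s | \
  vegeta report
```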

Jaeger Tracing

On the tracing view, we can track the flow of requests across the platform and gather useful data that helps teams understand performance issues as well as complex distributed architectures.

Jaeger SPM

The Jaeger Service Performance Monitoring (SPM) view provides a service-level aggregation, as well as an operation-level aggregation within the service, of request rates, error rates, and durations (P95, P75, and P50), also known as RED metrics.

Jaeger SPM

This tab is filled with the information created by the spanmetrics processor on the OpenTelemetry Collector, and the same data is also available in Prometheus, as we can see below.

Prometheus Metrics

Prometheus UI

As described above, the spanmetrics processor creates two metrics: calls_total and latency_bucket.
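
As an example of how to use them, the PromQL sketch below computes the request rate and the P95 latency per operation (label names such as service_name and operation follow what the spanmetrics processor emits by default; adjust them to what you see in your Prometheus):

```promql
# Requests per second, per service and operation.
sum(rate(calls_total[5m])) by (service_name, operation)

# P95 latency per operation, derived from the latency histogram.
histogram_quantile(0.95, sum(rate(latency_bucket[5m])) by (le, operation))
```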

Conclusion

This is a very simple example; the main idea is to give some insight into what kind of telemetry data can be collected using OpenTelemetry auto-instrumentation.

The code is available on my GitHub account; feel free to explore it by running it in your local environment.

Let me know if you're leveraging this at your company to collect telemetry data, or if you plan to.

Thanks 😃

Support me so I can keep creating nice content.
