Monitoring your docker container’s logs the Loki way

Yashvardhan Kukreja
Published in ITNEXT
Dec 17, 2020

Hi everyone. Recently, a friend of mine was given the task of conveniently monitoring a docker container’s logs.

Now, of course, the docker logs command can be used to view any container’s logs, but not to monitor them. By monitoring I mean, say, using those logs to visualise the number of stderr lines emitted per hour by the application in your container, on a line chart or any other kind of visualisation.

So, in this blog, I am going to take you through the journey of doing the above task with ease.

https://grafana.com/img/home_panel_collage1.png

Let’s first drill down the problem

Our container is writing its logs somewhere, say, to some sort of .log file. All we need is a tool which can collect and aggregate those logs and let us query them for insights like the number of 4xx’s or the number of “err” logs.

And another tool to take those numbers and visualise them in a graph or chart.

Where can we find our container’s logs?

Any container is, basically, a process running on its host, so there must be some place on that host where the container’s logs are stored.

And, well, with docker’s default json-file logging driver, that place is /var/lib/docker/containers/<container-id>/<container-id>-json.log

where <container-id> is the full ID of the container whose logs you want.

So, all the STDOUT/STDERR logs which we want to track will be found here.
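To get a feel for what sits in that file: with the json-file driver, every line is a small JSON object with the keys log, stream and time. Here is a minimal Python sketch that counts lines per stream. The sample lines are made up for illustration; on a real host you would read the -json.log file itself (usually as root).

```python
import json
from collections import Counter

# Made-up sample lines in docker's json-file format (one JSON object per line).
SAMPLE_LINES = [
    '{"log":"INFO starting up\\n","stream":"stdout","time":"2020-12-17T10:00:00.000000000Z"}',
    '{"log":"ERROR something broke\\n","stream":"stderr","time":"2020-12-17T10:00:01.000000000Z"}',
    '{"log":"DEBUG still alive\\n","stream":"stdout","time":"2020-12-17T10:00:02.000000000Z"}',
]


def count_by_stream(lines):
    """Count log entries per stream ("stdout" / "stderr")."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        counts[entry["stream"]] += 1
    return counts


stream_counts = count_by_stream(SAMPLE_LINES)
print(dict(stream_counts))
```

This is exactly the kind of counting we would rather hand over to a proper tool, which is where the next section comes in.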

Let’s explore the tools which we’ll use

Loki

It’s a fantastic log aggregator developed by Grafana Labs. If you are aware of Prometheus, then think of Loki as Prometheus but just for logs (instead of metrics).

https://grafana.com/docs/loki/latest/logo_and_name.png

You can query a huge volume and variety of logs on Loki with great performance, because it doesn’t index the entire contents of the log streams fed to it. Instead, it indexes the logs by just a handful of labels attached to them by promtail. Speaking of which…

Promtail

It is an agent (another one by the Grafana folks) which reads the contents of one or more log files and ships those logs to a Loki instance. With promtail, you can also attach labels to different kinds of logs.

For example, if you are making promtail read ten *.log files, then, you can make promtail label them with some identifier/label so that, in the future, when you monitor and query them, it would be convenient for you and performant on Loki’s side, as the logs would be indexed on those labels.

If you feel baffled, no worries. I’ll walk you through this whole thing to give you a clearer idea.
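To make the “indexed by labels” idea a bit more concrete, here is a toy Python model. It is only an illustration of the concept and not how Loki is actually implemented: the class name ToyLogStore and the label values are made up. Log lines are grouped by their label set, a query first picks the groups by label, and only then scans the text of the lines in those groups.

```python
from collections import defaultdict


class ToyLogStore:
    """Toy model: lines are grouped by label set; line contents are not indexed."""

    def __init__(self):
        self._streams = defaultdict(list)

    def push(self, labels, line):
        # The (hashable) label set is the only "index" key.
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        wanted = set(selector.items())
        matches = []
        for label_set, lines in self._streams.items():
            if wanted <= label_set:  # cheap label match first
                # Only now scan the actual text of the selected lines.
                matches.extend(l for l in lines if contains in l)
        return matches


store = ToyLogStore()
store.push({"job": "api", "host": "localhost"}, "INFO request served")
store.push({"job": "api", "host": "localhost"}, "ERROR upstream timed out")
store.push({"job": "worker", "host": "localhost"}, "ERROR job failed")

api_errors = store.query({"job": "api"}, contains="ERROR")
print(api_errors)
```

The point of the sketch: the fewer and more meaningful the labels, the less text has to be scanned for a given query.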

Grafana

It is a tool for visualising the data sitting in different kinds of data stores. You can visualise and monitor data sitting in Prometheus, Cloudwatch, InfluxDB and Loki too (and many other sources).

We will be creating a dashboard of graphs/charts/panels (whatever you call them) to “monitor” the logs of our docker container.

https://stitch-microverse.s3.amazonaws.com/uploads/domains/grafana-logo.png

A small heads up for the people who might be thinking, “why is this guy going to use promtail? He could have simply done this task by using loki’s docker driver client”. Thanks for your concern :P and yes, I am well aware of loki’s docker driver. Still, I am taking the longer route of using promtail here because with this article I also want to show the reader how to set up and configure promtail and ship logs with it :D

Still, if you wish to delve into the rabbit hole of loki docker driver, do check this out :)

Besides, I am looking forward to writing an article on implementing our task using loki-docker-driver too :D

Installation

Loki

Promtail

Grafana

Setting up stuff

Loki

Create the loki config file with the name loki-config.yaml

Don’t worry, you don’t need to understand this YAML. These are just some internal configurations for Loki, provided ready-made by the Grafana folks.

Now, run loki

loki -config.file loki-config.yaml

Minimize this terminal and open another terminal for other stuff like promtail.

Promtail

We first have to write a config file which tells our promtail process which log file to read, which labels to attach to the logs, where to find the loki server these logs are supposed to be shipped to, etc.

So, in our case, I have a container

► docker container ps
CONTAINER ID IMAGE
561b53013031 chentex/random-logger:latest

Using this container ID, we can find the exact location of its logs on the host.

And that is

/var/lib/docker/containers/561b53013031*/561b53013031*.log

Why the wildcard (*) along with the container id?

The container ID above is not the full container ID. It is truncated to fit the display. But we do not need the full container ID. By writing 561b53013031*, we are telling our computer to find the directory and the log file whose names start with 561b53013031
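If you want to convince yourself that the wildcard does the job, here is a small Python sketch. It builds a throwaway directory tree that mimics /var/lib/docker/containers with two made-up 64-character container IDs and expands the same pattern with Python’s glob module. Promtail does its own glob matching, so treat this only as an illustration of the idea.

```python
import glob
import os
import tempfile

SHORT_ID = "561b53013031"
FULL_ID = SHORT_ID + "ab" * 26   # made-up 64-character ID, for illustration only
OTHER_ID = "9f" * 32             # another made-up container


def find_log_files(containers_dir, short_id):
    """Expand <dir>/<short_id>*/<short_id>*.log the way a shell glob would."""
    pattern = os.path.join(containers_dir, short_id + "*", short_id + "*.log")
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as root:
    # Fake layout: <root>/<full-id>/<full-id>-json.log
    for container_id in (FULL_ID, OTHER_ID):
        os.makedirs(os.path.join(root, container_id))
        log_path = os.path.join(root, container_id, container_id + "-json.log")
        with open(log_path, "w") as log_file:
            log_file.write("")
    matched_names = [os.path.basename(p) for p in find_log_files(root, SHORT_ID)]

print(matched_names)
```

Only the log file of the container whose ID starts with 561b53013031 is matched; the other container is ignored.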

Now, on the basis of the above information, we will write the promtail config file (promtail-config.yaml)
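As a rough idea of what such a config can look like, here is a small Python sketch that renders a minimal promtail-config.yaml. The job and host labels and the log path are the ones used in this article; the listen ports and the positions file location are assumptions borrowed from common promtail examples, so adjust them to your setup and check the promtail docs for your version.

```python
from string import Template

# Minimal promtail config template. Ports and the positions file are
# assumptions taken from common examples, not requirements.
PROMTAIL_CONFIG = Template("""\
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: $loki_url/loki/api/v1/push

scrape_configs:
  - job_name: $job
    static_configs:
      - targets:
          - localhost
        labels:
          job: $job
          host: localhost
          __path__: $log_glob
""")


def render_promtail_config(loki_url, job, log_glob):
    """Fill in the Loki address, the job label and the log file pattern."""
    return PROMTAIL_CONFIG.substitute(loki_url=loki_url, job=job, log_glob=log_glob)


config_text = render_promtail_config(
    loki_url="http://localhost:3100",
    job="my-container",
    log_glob="/var/lib/docker/containers/561b53013031*/561b53013031*.log",
)
print(config_text)
```

Save the printed output as promtail-config.yaml. The labels block is the important part: every line read from the files matched by __path__ gets the labels job and host attached, and those are what we will query on later.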

Now, run promtail with sudo (because the log file it’s trying to scrape requires root privileges to access)

sudo promtail -config.file promtail-config.yaml

Grafana

  • Start grafana by entering the following command
sudo systemctl start grafana-server 
  • This will start grafana at http://localhost:3000
  • Go to that address and log in with the username “admin” and password “admin”
  • And you’ll see this
  • Now, take your cursor to the navigation drawer on the left, and hover it over the gear icon (second last one) and click on data sources.
  • Click on “Add data source” and search for Loki and Click on it.
  • A form will open up. Just enter the Name as “Loki” (or whatever you like) and the URL as “http://localhost:3100” (because that’s the address at which our loki server is running)
  • Then, click on “Save and Test”. That’s how it should look

That’s it with the setup. I hope you are still with me xD

Don’t worry, 95% of the job is done :P

Finally, the fun part — Visualizations

We will create a dashboard which will contain multiple graphs like,

  • a line chart denoting the trend of the count of “INFO”, “DEBUG” and “ERROR” logs of that container.
  • a line chart denoting the trend of the 10 minute rate of the total number of logs of that container.
  • and some other stuff :P

In Grafana, go to the “+” icon on the left drawer, and click on “Dashboard”

Now, click on the “Add new panel” button. That will open the console for designing one panel.

A panel is, basically, a graph or chart

First panel — Log counts visualization

We will make this panel show a line chart denoting the trend of the count of “INFO”, “DEBUG” and “ERROR” level logs of our container.

Here’s a query to find the “INFO” level logs of our container. Add it to the query field under the panel window.

count_over_time( {job="my-container"} |~ "INFO" [5m])

Remember our promtail configuration. We labelled all our container logs with job: my-container and another label host: localhost . So, the {job="my-container"} is telling loki to query only the logs which correspond to the above label, and that corresponds to all the logs of our container.

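And the |~ “INFO” means that, from all the above logs, only those log lines which match the regular expression “INFO” are kept (|~ is LogQL’s regex line filter; here that simply means the lines which have the word “INFO” in them).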

Finally, the count_over_time and [5m] are telling loki to fetch the count of the above “INFO” level logs in the last 5 minutes before any instant. So, if I execute this query at 10:00PM, it will return the count of “INFO” level logs streamed from 9:55PM to 10:00PM (the last 5 minutes).
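If it helps, here is a tiny Python sketch of the same idea: filter the lines by a regex and count the ones whose timestamp falls in the window that ends at the evaluation instant. The sample entries are made up, and the exact handling of the window boundaries is an assumption of this sketch, so don’t read it as Loki’s precise semantics.

```python
import re
from datetime import datetime, timedelta

# Made-up (timestamp, line) pairs standing in for our container's logs.
ENTRIES = [
    (datetime(2020, 12, 17, 21, 50), "INFO old line, outside the window"),
    (datetime(2020, 12, 17, 21, 56), "INFO request served"),
    (datetime(2020, 12, 17, 21, 57), "DEBUG cache miss"),
    (datetime(2020, 12, 17, 21, 59), "INFO request served"),
]


def count_over_time(entries, pattern, at, window):
    """Count lines matching `pattern` with a timestamp in (at - window, at]."""
    start = at - window
    return sum(
        1
        for timestamp, line in entries
        if start < timestamp <= at and re.search(pattern, line)
    )


ten_pm = datetime(2020, 12, 17, 22, 0)
info_count = count_over_time(ENTRIES, "INFO", ten_pm, timedelta(minutes=5))
print(info_count)  # 2
```

Grafana evaluates the query at many instants across the dashboard’s time range, and each evaluation becomes one point on the line chart.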

Click on the “Add query” button and similarly, paste the following query for fetching the 5 minute count of “DEBUG” level logs.

count_over_time( {job="my-container"} |~ "DEBUG" [5m])

Click on the “Add query” button once more and, similarly, paste the following query for fetching the 5 minute count of “ERROR” level logs.

count_over_time( {job="my-container"} |~ "ERROR" [5m])

That’s how it will look

Edit the panel title (if you want to) and click on “Apply” on top-right.

Dashboard, after applying the above panel

Second panel — 10-minute-rate of total logs

This panel will display the 10-minute-rate of the count of total logs being generated by the container.

Click on this button (top-right) to create a new panel in the same dashboard and then, click on “Add New Panel”

Use the following query to fetch the 10-minute-rate of total logs.

rate( {job="my-container"} [1m] )*10*60

Here, the {job="my-container"} is fetching all the logs of the container.

rate and [1m] fetch the “per-second” rate of the number of log entries ingested over the last 1 minute. For example, let’s say, from 9:59PM to 10:00PM (the last 1 minute from 10:00PM), 120 log entries were added. So, rate and [1m] would give the per-second rate = 120/60 = 2

Finally, the 10*60 is for converting the above “per-second” rate to a “10-minute-rate” (10 minutes = 10*60 seconds)
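The arithmetic above, written out as a small Python sketch (the function names are just for illustration):

```python
def per_second_rate(entries_in_window, window_seconds):
    """Average number of log entries per second over the window."""
    return entries_in_window / window_seconds


def ten_minute_rate(entries_in_window, window_seconds):
    """Scale the per-second rate up to a 10 minute (10*60 seconds) rate."""
    return per_second_rate(entries_in_window, window_seconds) * 10 * 60


# 120 entries ingested during the last 1 minute (60 seconds)
print(per_second_rate(120, 60))   # 2.0
print(ten_minute_rate(120, 60))   # 1200.0
```

So, if the container keeps logging at the pace of that last minute, we would expect around 1200 log entries in 10 minutes.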

Edit the panel title and apply it.

You can even create a log panel which will actually show the logs (text ones) on that panel window.

For example, the following query would tail the “ERROR” logs

{job="my-container"} |~ "ERROR"
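The same query can also be run outside Grafana, against Loki’s HTTP API. Below is a minimal Python sketch, assuming Loki is reachable at http://localhost:3100 as in our setup. It builds a request for the /loki/api/v1/query_range endpoint; the helper names are made up for this example, and the response handling assumes the usual “streams” result shape for log queries.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_range_url(base_url, logql, limit=20):
    """Build a query_range URL for a LogQL query (the query gets URL-encoded)."""
    params = urlencode({"query": logql, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


def fetch_log_lines(url):
    """Run the query and return the raw log lines (needs a running Loki)."""
    with urlopen(url) as response:
        payload = json.load(response)
    lines = []
    for stream in payload["data"]["result"]:
        for _timestamp, line in stream["values"]:
            lines.append(line)
    return lines


error_query_url = build_query_range_url(
    "http://localhost:3100", '{job="my-container"} |~ "ERROR"'
)
print(error_query_url)
# With Loki running, the following would print the matching lines:
# print(fetch_log_lines(error_query_url))
```

This is handy for a quick sanity check that promtail is actually shipping logs before you start building panels.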

How does the dashboard look in the end

That’s it, I guess

Thanks a lot if you made it this far.

I hope you liked this article and understood how to monitor a container’s logs with Loki/Promtail/Grafana. Also, as I mentioned, I am looking forward to writing an article on accomplishing the same thing more conveniently with the loki-docker-driver client. So, stay tuned for that :)

If you have any doubts or other questions, shoot them over in the comments below or we can connect too:

Twitter — https://twitter.com/yashkukreja98

LinkedIn — https://www.linkedin.com/in/yashvardhan-kukreja

Adios!

Software Engineer @ Red Hat | Masters @ University of Waterloo | Contributing to Openshift Backend and cloud-native OSS