Ephemeral Jobs Monitoring Using Prometheus PushGateway

Ramesh Lingappan
Published in ITNEXT, Jul 14, 2020

Prometheus Architecture

In software systems, more often than not we want to spin up temporary services or jobs that can be terminated as soon as they have performed a specific task. For example, if we want to send out user notifications every morning, we don't want that service running all day long. Instead, it can spin up at a specific time, perform the task, and shut down, saving resources and cost.

Just like long-running services, these ephemeral jobs (Jobs or CronJobs, in Kubernetes terms) need to be monitored, both to gather critical metrics and to alert when something goes wrong.

This article describes how we can monitor these short-lived jobs by explaining:

  • What the problem with Prometheus is
  • What PushGateway is and why it is necessary
  • A demo using a sample application

Prometheus Architecture:

A short introduction to Prometheus, in case you are not already familiar with it:

Prometheus is an open-source, metrics-based monitoring system that records multidimensional time-series data. It supports powerful querying, dashboards, and alerting, and offers integrations with a wide variety of systems.
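To make "multidimensional" concrete: a Prometheus metric is a name plus a set of key/value labels, and every distinct label combination is its own time series. The sketch below renders samples in Prometheus's text exposition format, the plain-text format a monitored target serves. It uses only the Python standard library; `render_metric` and the `http_requests_total` sample data are hypothetical illustrations, not part of any Prometheus library.

```python
from typing import Dict, List, Tuple

# One sample = (label set, value). Each distinct label set is a separate time series.
Sample = Tuple[Dict[str, str], float]


def _escape(value: str) -> str:
    """Escape a label value as the text format requires (backslash, quote, newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, metric_type: str, help_text: str, samples: List[Sample]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        # Sort label names so the output is deterministic.
        label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metric(
        "http_requests_total", "counter", "Total HTTP requests.",
        [({"method": "GET", "code": "200"}, 1027),
         ({"method": "POST", "code": "500"}, 3)],
    ))
```

Both lines share the metric name `http_requests_total`, but because their labels differ, Prometheus stores them as two independent series that can be filtered or aggregated by `method` or `code` at query time.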

How does it work?

Prometheus uses a pull model (also called scraping) to collect metrics: the Prometheus server periodically reaches out to each configured service and calls an HTTP endpoint on it to pull that service's metrics.
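A minimal sketch of the pull model using only the Python standard library: the service exposes its current metric values over HTTP at `/metrics`, and `scrape` performs the same kind of HTTP GET the Prometheus server issues on every scrape. `METRICS`, `MetricsHandler`, `start_metrics_server`, `scrape`, and `demo_requests_total` are hypothetical names for illustration; a real service would normally use an official Prometheus client library instead of a hand-written handler.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metric values; a real service would use a client library.
METRICS = {"demo_requests_total": 0}


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves METRICS in the Prometheus text format at /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = "".join(
            f"# TYPE {name} counter\n{name} {value}\n" for name, value in METRICS.items()
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_metrics_server(port: int = 0) -> HTTPServer:
    """Start the metrics endpoint in a background thread; port 0 picks any free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(url: str) -> str:
    """Roughly what the Prometheus server does on each scrape: a plain HTTP GET."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    server = start_metrics_server()
    METRICS["demo_requests_total"] = 42
    print(scrape(f"http://127.0.0.1:{server.server_port}/metrics"))
    server.shutdown()
    server.server_close()
```

The service itself is passive: it only answers requests. Whatever values it holds are collected only when a request actually arrives while the process is running.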

For example, the following configuration in the prometheus.yml file tells the Prometheus server to scrape metrics from the specified endpoint every 5 seconds:

scrape_configs:
  - job_name: 'auth_server'
    scrape_interval: 5s
    metrics_path: /metrics  # the default; shown explicitly for clarity
    static_configs:
      - targets: ['auth.server.com:8080']  # host:port only, no path
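In a Prometheus scrape config, each target is a bare `host:port`. The HTTP path comes from `metrics_path` (default `/metrics`) and the protocol from `scheme` (default `http`). The hypothetical helper `scrape_url` below mimics how those three pieces combine into the URL that is actually requested. It is an illustration of the config semantics, not Prometheus code.

```python
def scrape_url(target: str, metrics_path: str = "/metrics", scheme: str = "http") -> str:
    """Build the URL Prometheus would scrape for one static_configs target.

    Hypothetical helper mirroring Prometheus's defaults: scheme "http" and
    metrics_path "/metrics". The target itself must be a bare host:port.
    """
    if "/" in target:
        # A path inside the target is a configuration mistake; use metrics_path instead.
        raise ValueError(f"target {target!r} must be host:port; set metrics_path instead")
    if not metrics_path.startswith("/"):
        raise ValueError("metrics_path must start with '/'")
    return f"{scheme}://{target}{metrics_path}"


if __name__ == "__main__":
    print(scrape_url("auth.server.com:8080"))  # http://auth.server.com:8080/metrics
```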

Problem?

Scraping works well for long-running services, since they stay available long enough for the Prometheus server to make requests and collect their metrics. The pull architecture also brings a few other benefits, such as health checks…
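To see why this breaks down for short-lived jobs, here is an idealised model. The helper `scrapes_during` is hypothetical, and it assumes scrapes fire exactly at multiples of `interval` and take no time. A job that is alive on the window [start, start + duration) is scraped only if some scrape tick falls inside that window.

```python
import math


def scrapes_during(start: float, duration: float, interval: float) -> int:
    """Count scrape ticks (t = 0, interval, 2*interval, ...) in [start, start + duration)."""
    end = start + duration
    first = math.ceil(start / interval)   # index of the first tick at or after start
    last = math.ceil(end / interval) - 1  # index of the last tick strictly before end
    return max(0, last - first + 1)


if __name__ == "__main__":
    # Short job between two 5 s ticks vs. a service running for a full minute.
    print(scrapes_during(start=6, duration=2, interval=5))
    print(scrapes_during(start=0, duration=60, interval=5))
```

With a 5 s interval, a job that starts at t = 6 s and finishes at t = 8 s is never scraped, while a service running for a full minute is scraped 12 times. Even when a tick does land inside a short job's lifetime, any values the job records after that tick, such as its final result, are lost once the process exits.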
