Technical Tutorial

Run Asynchronous Tasks in New Kubernetes Pods with NodeJS

Learn how to launch new Kubernetes pods to run a resource-intensive task asynchronously outside the API process using NodeJS.

Alexandre Olive · Published in ITNEXT · 11 min read · Oct 16, 2023

When you start creating APIs as a junior developer, you build everything synchronously. You don’t consider in advance that the data in production is much bigger, or that the forEach that flies through your test data will behave very differently on a million lines. If you’re lucky, everything breaks as soon as you release your code. If you’re unlucky, it takes days of users complaining about random crashes before you find the culprit piece of code that overloads your API.

I know that because I’ve been through it enough times to learn the lesson, especially when your company at the time has a three-week release delay, but that’s another subject. I still get the occasional “JavaScript heap out of memory” nightmare.

Then, you grow as a developer. You learn about asynchronous design with message queues and workers. Your user-facing APIs don’t have to crash when too many resource-intensive tasks are triggered simultaneously.

Thanks to my current job, I discovered the next level of asynchronous setup, which we will build in this article: using new Kubernetes pods to run resource-intensive tasks asynchronously, outside the API process. The goal is to prevent overloading the user-facing endpoints and causing slowness in your application.

This setup will be overkill for most applications; whether you need it depends on your use case. For example, you might need it if you want to run the process on a particular kind of node, like a GPU-optimized one.
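As a hypothetical illustration of that last point (the label is a GKE example, and the pod name and image are made up, not from this article's setup), pinning a pod to a GPU node pool is usually just a nodeSelector in the pod spec:

```javascript
// Hypothetical pod spec fragment: the nodeSelector label below is the one
// GKE uses for GPU node pools; other clouds and self-managed clusters use
// their own labels. The name and image are placeholders.
const gpuPodSpec = {
  metadata: { name: 'gpu-task-1' },
  spec: {
    // The scheduler will only place this pod on nodes carrying this label.
    nodeSelector: { 'cloud.google.com/gke-accelerator': 'nvidia-tesla-t4' },
    containers: [{ name: 'gpu-task', image: 'my-org/my-gpu-image:latest' }],
  },
};
```

Because the pod is created on demand, only the heavy task pays for the expensive node; the API itself can stay on cheap general-purpose nodes.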

If you want to read more about the architecture around my company’s use case using this system design, check out the link below.

What we are going to build.

If I wanted to build the whole system design production-ready in one article, it would be a thirty-minute read at minimum, with too many concepts introduced simultaneously.

So, in this article, we will set up a Kubernetes cluster locally, in which we will launch pods from a NodeJS API. The pods will be Docker containers running NodeJS scripts that will wait a fixed amount of time to mimic a lengthy task and then call the API back to update the task’s status.

It might sound scary when I put it like that, but you don’t need advanced knowledge of these subjects to follow along with me.

Custom schema with an API launching new Kubernetes pods on demand: the user calls the API with a resource-intensive task, the API uses its connection to the Kubernetes cluster to start a new pod that runs the task, and the pod calls the API back when it finishes

The API is purposely outside of the cluster in this schema, as I don’t want to get into Kubernetes connectivity with services or ingress just yet.

There is also no queue system, on purpose, to keep everything in a single article. In a typical setup, the API would not launch the pod itself; it would add a message to a queue, and the queue’s worker would launch the new pod.
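To make that difference concrete, here is a minimal sketch of the typical setup, with an in-memory array standing in for a real queue (RabbitMQ, SQS, etc.) and a placeholder launchPod callback; none of this is the article’s code:

```javascript
// In-memory stand-in for a message queue; a real setup would use a broker
// such as RabbitMQ or SQS so messages survive restarts.
const queue = [];

// The API endpoint only enqueues a message describing the task; it
// returns to the user immediately.
const enqueueTask = (taskId) => queue.push({ taskId });

// A separate worker process consumes messages and launches pods.
// `launchPod` is a placeholder for the real pod-creation call.
const runWorker = (launchPod) => {
  while (queue.length > 0) {
    const message = queue.shift();
    launchPod(message.taskId);
  }
};
```

This keeps the user-facing endpoint fast even when many tasks arrive at once, since the worker decides how quickly pods get launched.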

I will write a follow-up article where I implement the API inside Kubernetes. If you’re a YAML enjoyer, already initiated to Kubernetes, and want a production-ready implementation, it will be made for you. (I’ll update this post with the link once it’s online.)

We need a Kubernetes cluster.

Running a Kubernetes cluster in the cloud costs money, and I don’t know about you, but I don’t like spending money. Thanks to the awesome creators of Kubernetes, you can run your own Kubernetes cluster locally with Minikube.

The installation process is super easy. Make sure you select the correct operating system and architecture (you also need Docker installed).

Stop at the “3 — Interact with your cluster” step. You are good to go if you can run the command kubectl get po.

That’s it for this part; it’s that easy. With the Kubernetes cluster setup out of the way, we can now get to the actual coding.

Photo by Pablo Arroyo on Unsplash

Create the asynchronous process.

Before getting into the API creation, we must build a Docker container as our resource-intensive task.

It will be composed of two files: a JavaScript file that runs the actual process, and a Dockerfile to build the Docker container.

Create a file called index.mjs

const main = async () => {
  console.log('Starting process...');

  // Wait fifty seconds before proceeding.
  await new Promise(resolve => setTimeout(resolve, 50000));

  try {
    // Make an HTTP POST to the webhook URL passed as an environment variable.
    await fetch(process.env.WEBHOOK, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        // Send back the TASK_ID the API passed as an environment variable.
        taskId: process.env.TASK_ID,
      }),
    });
  } catch (err) {
    console.error(err);
  }

  console.log('Process finished.');
  process.exit(0);
};

main();

Let’s break it down.

  • await new Promise(resolve => setTimeout(resolve, 50000)); waits fifty seconds before continuing, to simulate a compute-intensive task.
  • await fetch(process.env.WEBHOOK, ... makes an HTTP POST call to our API. The process.env.WEBHOOK variable is an environment variable containing our API endpoint; it will be set on the pod when we launch it, so the API that launches the pod decides which endpoint is called when the process finishes. The same goes for the process.env.TASK_ID variable.
  • process.exit(0); kills the process, which ends the container.
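If you want to try the process script locally before putting it in a container, you can point WEBHOOK at a throwaway HTTP server. This is a sketch, not part of the article’s code: it uses a 100 ms delay instead of fifty seconds, and port 3000 and the 'local-test' task ID are arbitrary choices.

```javascript
import http from 'http';

let receivedTaskId;

// Throwaway server standing in for the API's webhook endpoint.
const server = http.createServer((req, res) => {
  let raw = '';
  req.on('data', (chunk) => (raw += chunk));
  req.on('end', () => {
    receivedTaskId = JSON.parse(raw).taskId;
    res.writeHead(200);
    res.end();
    server.close(); // one request is all we need
  });
});

server.listen(3000, async () => {
  // Simulate what the container does, with a much shorter delay.
  await new Promise((resolve) => setTimeout(resolve, 100));
  await fetch('http://localhost:3000/webhook', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ taskId: 'local-test' }),
  });
});
```

The real script is the same shape: sleep, then POST the task ID back to whatever URL the environment provides.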

Create the Dockerfile

FROM node:20-alpine

# Create working directory
WORKDIR /app/

# Copy the javascript file in the docker container
COPY index.mjs /app/

# Execute app inside the container
CMD [ "node", "index.mjs" ]

To build a container, we need a Dockerfile. This simple Dockerfile copies the JavaScript file we created into a container based on the node:20-alpine image and runs it.

If you want to use your own image when we launch it in the Kubernetes pod, you can build and push this Dockerfile by following the readme.md file I wrote here. If not, I’ll provide a working Docker image of the process later in the article.

Let’s launch pods from an API.

Let’s get to the good part: we want to launch a pod from an API.

We will use ExpressJS as it’s lightweight, easy to use, and easy to understand.

Let’s initiate a new project with npm init inside a new folder. You can press Enter for every question; the answers don’t matter for our test.

You can add "type": "module" inside the package.json to use module-type imports. That’s how it’s done nowadays; CommonJS days are coming to an END! Where are you hiding, developers still using CommonJS?! Sorry, I lost control; let’s get back to the tutorial.

You must also add a new script in the package.json to start the API (if you are using an older Node version, remove --watch).

{
  ...
  "scripts": {
    ...
    "start": "node --watch index.mjs"
  },
  ...
}

We need four packages for our API to work:

  • express: the web framework for our API.
  • @kubernetes/client-node: the official Kubernetes client for NodeJS.
  • uuid: to generate unique task IDs.
  • body-parser: to parse the JSON bodies of incoming requests.

We can install them with:

npm i @kubernetes/client-node express uuid body-parser

Create a file called index.mjs

Next to your package.json, you can create a file called index.mjs and add the following code inside. I won’t dive into it, as it’s mostly the default setup for Express. Here is a get-started-with-express article if you want.

import express from 'express';
import bodyParser from 'body-parser';
import fs from 'fs';
import { v4 as uuid } from 'uuid';

const app = express();
const port = 3000;

app.use(bodyParser.urlencoded({ extended: false }));
app.use(bodyParser.json());

// THIS IS WHERE OUR NEXT CODE WILL GO!

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`);
});

We will need three endpoints:

  • POST /start: it will launch the process in a Kubernetes pod.
  • POST /webhook: to get the validation from the pod that the operation ended.
  • GET /:taskId: to check the current status of the process using a generated ID.

Connect to the Kubernetes cluster

To connect to the cluster, we will use the @kubernetes/client-node library.

import k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();

const k8sApi = kc.makeApiClient(k8s.CoreV1Api);

If you only have Minikube installed locally, using kc.loadFromDefault() is enough to connect; there is no need to pass a cluster configuration. The k8sApi variable gives us access to the functions that interact with the cluster.

Create a new pod endpoint.

Alright, we’ve reached the most complicated part of the article. Before jumping into the code, let’s list all the information we need to launch our pod.

  • Every pod in Kubernetes needs a name; that’s the rule. We will name it asynchronous-process-${uuid()}. That’s a weird name, I know; it sounds like Elon Musk’s child’s name, like a younger brother of X Æ A-Xii. uuid() will be imported from the uuid library to generate a unique ID.
  • Every pod needs a container image as well. You can use your own if you built it earlier or the one I provide: alexandreolivepro/fake-js-process-fifty-seconds:final
  • As we saw before, we must pass two environment variables to our pod: the pod name, which will act as our task ID, and the webhook endpoint of our API to update the task status.
const createNewPod = async (podName) => {
  const { body: pod } = await k8sApi.createNamespacedPod('default', {
    metadata: {
      name: podName,
    },
    spec: {
      restartPolicy: 'Never',
      hostAliases: [
        {
          ip: '192.168.65.2', // THIS NEEDS TO BE REPLACED BY YOUR OWN IP
          hostnames: ['host.minikube.internal'],
        },
      ],
      containers: [{
        name: podName,
        image: 'alexandreolivepro/fake-js-process-fifty-seconds:final',
        env: [
          { name: 'TASK_ID', value: podName },
          { name: 'WEBHOOK', value: 'http://host.minikube.internal:3000/webhook' },
        ],
      }],
    },
  });

  return pod;
};

Here, you can find the code for the createNewPod function that will create the pod (who would have thought, with that name?).

We call the function createNamespacedPod, with the first parameter being the namespace. Namespaces are Kubernetes’ way of isolating groups of resources; we use the default one here to keep it simple.

Since our API is running outside of the Kubernetes cluster, we need to add a host alias for our pods to be able to call the API from inside the cluster. If we tried to call localhost from inside the pod, it would be Minikube’s localhost and not our local machine.

To find your machine’s IP address, you can run minikube ssh grep host.minikube.internal /etc/hosts | cut -f1 and replace the one I set in the code.

Finally, let’s wrap our createNewPod function in an Express endpoint definition to be able to call it.

app.post('/start', async (req, res) => {
  try {
    const podName = `asynchronous-pod-${uuid()}`;

    const pod = await createNewPod(podName);

    // WE CREATE A JSON FILE TO STORE THE TASK STATUS ON THE FILE SYSTEM
    fs.writeFileSync(`./${podName}.json`, JSON.stringify({ status: 'START' }));

    res.status(200).json({
      name: pod.metadata.name,
      namespace: pod.metadata.namespace,
      uid: pod.metadata.uid,
    });
  } catch (err) {
    res.status(500).json(err);
  }
});

We also create a JSON file to store the task's status on the file system. The webhook will update it when the asynchronous process ends. In an actual setup, this file would be a database.
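Since this read/write pattern comes back in the status and webhook endpoints, you could extract two tiny helpers around the fake database. This is just a refactoring sketch; the function names are mine, not from the article’s repository:

```javascript
import fs from 'fs';

// Write a task's status object to its JSON file (our fake database).
const writeTask = (taskId, data) =>
  fs.writeFileSync(`./${taskId}.json`, JSON.stringify(data));

// Read a task's status object back from its JSON file.
const readTask = (taskId) =>
  JSON.parse(fs.readFileSync(`./${taskId}.json`, { encoding: 'utf-8' }));
```

Swapping the file system for a real database later would then only mean changing these two functions.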

Photo by Mikhail Vasilyev on Unsplash

This part was getting long, and I thought everyone who reached this point needed a little “cat” break — isn’t he cute?

We are now at a point where we can create a pod through the API. But before we try it out, I’d first like to implement the endpoint that retrieves the pod’s status. It’s straightforward and will let us check that our pod creation worked without running Kubernetes CLI commands.

app.get('/:taskId', async (req, res) => {
  // Get the pod information from the cluster using the pod name.
  const { body: pod } = await k8sApi.readNamespacedPod(req.params.taskId, 'default');

  // Also retrieve the JSON file we stored on the file system for our task.
  const data = fs.readFileSync(`./${req.params.taskId}.json`, { encoding: 'utf-8' });

  // Return the data from the JSON file plus the pod status information.
  res.json({
    data: JSON.parse(data),
    pod: { status: pod.status.phase, startTime: pod.status.startTime },
  });
});

First, we retrieve the pod in the cluster using the pod name, which is the task ID (remember Elon Musk’s child from earlier?) that we get from the URL param. Using the same name, we retrieve the JSON file created when we started the pod and send both pieces of information back as the API result.

Okay, let’s try it out now. Launch your API with npm run start and use your favorite API testing tool to do a simple POST on /start. For me, that tool is still Postman. It should return our taskId, namespace, and uid, which means it started successfully.

Postman screenshot with the /start endpoint result

If you take the taskId from the body’s response and do a GET on /:taskId:

Postman screenshot with the /:taskId endpoint result

We get our status entered manually, plus the technical pod status information. Hell yeah!

The issue is that the process itself will fail, because we haven’t implemented the webhook endpoint yet.

Final webhook endpoint

We have a final endpoint to add. The one that the process will call once it’s finished to update our fake JSON database.

// Create a POST /webhook endpoint for our API
app.post('/webhook', async (req, res) => {
  try {
    // Build a result object that will be stored in our JSON file.
    const body = {
      status: 'FINISHED',
      message: 'Process finished successfully.',
      endDate: new Date().toISOString(),
    };

    // Write the object into the JSON file on the file system.
    fs.writeFileSync(`./${req.body.taskId}.json`, JSON.stringify(body));

    res.status(200).json(body);
  } catch (err) {
    res.status(500).json(err);
  }
});

There’s not much to say here; we’ve seen most of the code used already. We are just using the taskId sent in the body to update the JSON file with a new status.

Now retry the process: POST /start to create a new pod, then call the GET endpoint to retrieve the task status. You should first see it in a running state; wait fifty seconds or so more, and it should return the new “FINISHED” status.

Postman screenshot with the /:taskId endpoint result showing our finished process

It lasted just over fifty seconds between the start and end times, as planned.

You can find the whole code for this article here.

Setting up an asynchronous process like this took a lot of work, and we skipped a lot of things that are mandatory for an actual production setup.

You have to consider that running an asynchronous process means it can fail at many points. Networks are not infallible; containers can crash unexpectedly, database writes can fail, and many other issues can occur. These are problems you must already consider in a traditional application, but here with many more possible failure points.

Such a system must be built from the ground up with excellent error management, retries when anything fails, easily accessible logs, and debugging tools. So, use it cautiously and only if your use case needs it.
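As a taste of what that error management could look like, here is a generic retry helper with exponential backoff. It’s a sketch of the idea, not code from this article’s setup; the name withRetry and the defaults are my own choices.

```javascript
// Retry an async function up to `attempts` times, doubling the wait
// between tries (exponential backoff), and rethrow the last error if
// every attempt fails.
const withRetry = async (fn, { attempts = 3, delayMs = 100 } = {}) => {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait delayMs, then 2 * delayMs, then 4 * delayMs...
      await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
    }
  }
  throw lastError;
};
```

You could wrap the webhook fetch call or the pod-creation call in something like withRetry so that a single transient network error doesn’t lose a task.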

Despite all those difficulties, it’s a valuable tool to have in your arsenal. I know I loved this concept when I was introduced to it.

Thank you for reading until the end. It was a long and technical post that I decided to split into more than one article to make it easier to read. The follow-up will involve running everything in the Kubernetes cluster with multiple nodes to mimic a production environment in Minikube; stay tuned.

Senior lead developer, explaining complex concepts in simple words. Get a peek behind the curtains of production-ready architecture and its struggles.