Kubeflow on GPU Enabled AWS-EKS Cluster
Ever since Google created Kubernetes as an open-source container orchestration tool, the project has blossomed in ways its creators might never have imagined. As it gains in popularity, many adjunct projects are developing around it. Kubeflow is one of them: it brings machine learning to Kubernetes. The project’s goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures.
The GPU has evolved from a graphics chip into a core component of deep learning and machine learning. CPUs are designed for general-purpose workloads; GPUs are less flexible, but their parallel architecture, which executes the same instructions across many data elements at once, is well suited to vector and matrix operations. Machine learning is a field with intense computational requirements, and the choice of GPU fundamentally shapes a user’s experience. With no GPU, this might look like months of waiting for an experiment to finish, or running an experiment for a day or more only to see that the chosen parameters were off and the model diverged.
With the increasing number of HPC and AI-powered applications and the broad availability of GPUs in public clouds, there is a need for open-source Kubernetes to be GPU-aware. With Kubernetes on NVIDIA GPUs, software developers and DevOps engineers can build and deploy GPU-accelerated deep learning training or inference applications to heterogeneous GPU clusters at scale, seamlessly. GPU support in Kubernetes is facilitated by the NVIDIA device plugin, which exposes the GPUs on the host to the container space.
NVIDIA and others provide AMIs for GPU-based accelerated computing instances on AWS. These enable developers to scale model training performance linearly across AWS EC2 instances, accelerate preprocessing, remove data-transfer bottlenecks, and rapidly improve the quality of their machine learning models. CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
GPU enabled Kubernetes Nodes on EKS
AWS offers GPU-powered EC2 instances that can be used with EKS and are available in four AWS regions. Powered by up to eight NVIDIA Tesla V100 GPUs, the P3 instances are designed to handle compute-intensive machine learning, deep learning, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, and genomics workloads.
Users can choose the specific instance flavor for the worker nodes while creating an EKS cluster.
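As an illustration, such a cluster can be created with eksctl; the cluster name, region, and node count below are hypothetical values, not from the deployment described here:

```shell
# Create an EKS cluster whose worker nodes are P3 (Tesla V100) instances.
# Name, region, and node count are illustrative -- adjust to your account.
eksctl create cluster \
  --name gpu-cluster \
  --region us-west-2 \
  --node-type p3.8xlarge \
  --nodes 2
```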
With the cluster above, the Kubernetes nodes are each equipped with 4 Tesla V100-SXM2 GPUs, as shown below:
NVIDIA device plugin for Kubernetes
The NVIDIA device plugin for Kubernetes is a Daemonset that allows you to automatically:
- Expose the number of GPUs on each node of your cluster
- Keep track of the health of your GPUs
- Run GPU enabled containers in your Kubernetes cluster.
To support the plugin, the NVIDIA container runtime is configured as the default Docker runtime on the nodes (in /etc/docker/daemon.json):
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
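You can confirm on a node that Docker picked up this configuration; the exact output formatting depends on the Docker version:

```shell
# Verify that "nvidia" is registered and set as the default runtime
sudo docker info | grep -i runtime
```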
NVIDIA GPUs can now be consumed via container level resource requirements using the resource name nvidia.com/gpu:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      resources:
        limits:
          nvidia.com/gpu: 2 # requesting 2 GPUs
    - name: digits-container
      image: nvidia/digits:6.0
      resources:
        limits:
          nvidia.com/gpu: 2 # requesting 2 GPUs
The plugin runs as a DaemonSet on the Kubernetes cluster as shown below:
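The DaemonSet can be deployed straight from the NVIDIA device plugin repository; the version tag below is illustrative, so pick the release that matches your cluster:

```shell
# Deploy the NVIDIA device plugin DaemonSet (version tag is illustrative)
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

# Confirm the plugin pods are running
kubectl get daemonset -n kube-system
```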
With the device plugin installed, Kubernetes nodes expose the NVIDIA GPUs as regular schedulable resources, as shown below:
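One way to check this is to query each node's allocatable GPU count; the custom-columns expression here is just one convenient form:

```shell
# List nodes with their allocatable NVIDIA GPU count
kubectl get nodes \
  "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```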
Kubeflow on EKS
While it’s possible to run machine learning workloads on CPU instances, GPU instances have thousands of CUDA cores, which significantly improve performance when training deep neural networks and processing large data sets. Kubeflow can be deployed on EKS, consuming high-powered GPU-based EC2 instances. A deployment includes components such as:
* tf-hub-0: JupyterHub web application that spawns and manages Jupyter notebooks.
* tf-job-operator, tf-job-dashboard: Runs and monitors TensorFlow jobs in Kubeflow.
* ambassador: Ambassador API Gateway that routes services for Kubeflow.
* centraldashboard: Kubeflow central dashboard UI.
Consuming GPUs with Kubeflow
As discussed above, any pod that is part of Kubeflow can consume available GPUs through the nvidia.com/gpu resource limit. Kubeflow adds resources to the cluster to assist with a variety of tasks, including training and serving models, and running Jupyter notebooks that let users create and share documents containing live code, equations, visualizations, and narrative text.
JupyterHub lets users manage authenticated access to multiple single-user Jupyter notebooks. JupyterHub delegates the launching of single-user notebooks to pluggable components called “spawners”. Its community-maintained sub-project kubespawner enables users to provision single-user Jupyter notebooks backed by Kubernetes pods; the notebooks themselves are Kubernetes pods. kubeform_spawner extends kubespawner with a form where users specify CPU, memory, GPU, and the desired image. The spawner configuration thus lets users choose the number of GPUs a Jupyter server will consume.
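A minimal sketch of such a configuration in jupyterhub_config.py, assuming KubeSpawner is installed; the notebook image and GPU count are illustrative:

```python
# jupyterhub_config.py -- sketch of a GPU-aware spawner configuration.
# KubeSpawner's extra_resource_limits maps onto the pod's resources.limits,
# so nvidia.com/gpu works the same way as in a plain pod spec.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"
c.KubeSpawner.image = "jupyter/tensorflow-notebook"   # illustrative image
c.KubeSpawner.extra_resource_limits = {"nvidia.com/gpu": "1"}
```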
For example, the spawner above is configured with 3 GPUs as the resource_limit. This creates a pod on Kubernetes with the corresponding limit, nvidia.com/gpu: 3, and the JupyterHub pod can see 3 GPUs once created, as shown below:
As shown above, the application can utilize the three GPUs configured in the spawner options.
Taking an example from upstream Kubeflow: GitHub Issue Summarization, an automatic summary generator trained on a public dataset of GitHub issues. This example comprises multiple steps, such as fetching the data, preprocessing the dataset, and training a TensorFlow NLP model. It can be loaded and executed using the JupyterHub application created above on Kubernetes.
Training using GPUs
In the example below, a user configures a JupyterHub application with 1 GPU as the resource limit in the spawner options.
From the application’s point of view, only one of the node’s four GPUs is visible.
Running the training with the configuration above:
As seen above, train_model uses a single GPU for training, and utilization spikes on one of the four available GPUs, as shown below:
The thousands of CUDA cores on the GPU instance significantly improve training performance on a dataset of this size.
Similarly, users can choose the number of GPUs based on their requirements and the size of the training dataset. An example with 3 GPUs is shown below:
As seen above, the process utilizes three GPUs, but the load is not distributed evenly across them: parallel programming or a multi-GPU construct has to be enabled in the code base to properly utilize all the GPUs in parallel.
NVIDIA’s CUDA provides multiple models for using several GPUs at once, harnessing the real power of multi-GPU systems like the AWS EC2 instance above. In PyTorch, for example, .cuda() accepts a device id, so users can assign GPUs on a per-task basis within the same training model:
import torch.nn as nn

# pin each DataParallel-wrapped model to a specific GPU
model1 = nn.DataParallel(model1).cuda(device=0)
model1_feat = model1(input_image)
model2 = nn.DataParallel(model2).cuda(device=1)
model2_feat = model2(model1_feat, input_feat)
With this, users can fully utilize the GPUs in the system, as shown below, where the training tasks are spread across GPUs:
Toolkits like Kubeflow reinforce the vision that running and serving AI workloads is not limited to a handful of organizations but is easily accessible to everyone. Artificial intelligence and deep learning create tremendous market opportunities on a global scale and are aimed at improving our lives and the world around us. Choosing the right type of hardware for deep learning tasks is a widely discussed topic, and as more public cloud providers like AWS, GCE, and Azure make these infrastructures easy to procure, users can realize considerable gains from both a financial and a time perspective.