Automate your Kubernetes cluster bootstrapping with Rancher and Ansible and speed up your pipeline

How Ansible and Rancher can help to streamline the Kubernetes cluster creation process.

Alexis Sotto Maior
ITNEXT

--

“Full speed” by maekke is licensed under CC BY-NC-ND 2.0

There are many reasons why you might want to automatically bootstrap a Kubernetes cluster. Whether for testing your application or simply because you don’t want to spend too much time on configuration, automating your cluster creation is an obvious decision for leveraging DevOps. Here, we will describe how you can boost your productivity with Rancher and Ansible and integrate this automation into your k8s continuous delivery process.

In this post, we describe a project for a CI/CD pipeline in Kubernetes that relies on Rancher and Ansible to automate cluster creation for testing purposes. The use case was to bootstrap a temporary k8s environment where we could deploy, test and validate a new feature before moving it to staging and production. This transient cluster is then decommissioned whenever the feature is merged into the master branch.

Ephemeral Kubernetes clusters are great for testing. In a few minutes, it is possible to deploy a highly available, fully operational environment: a small-scale, production-like cluster. Besides your application’s functional requirements, you can validate its underlying configuration, such as secrets, config maps, volumes, auto-scaling policies, etc. Furthermore, you can apply chaos testing to anticipate cluster failures, perform stress tests to analyze reliability and run security tests to mitigate your cluster’s vulnerabilities.

In the DevOps landscape, Rancher offers a great solution for Kubernetes cluster setup and management. From a single Rancher installation, you can manage hundreds of clusters (or up to 1 million, as of Rancher 2.5), integrating observability, security, logging and user management services in an all-in-one platform. On top of that, Rancher provides cluster templating and an API that we can drive with automation tools like Ansible.

Cluster templating in Rancher is one of the core functionalities we used in our automation. Rancher’s bootstrapping process relies on RKE (Rancher Kubernetes Engine) to configure a new cluster based on the settings described in a cluster.yml file. From the Kubernetes version to control plane attributes, all configuration is packaged in this file. Additionally, with Rancher you can create an RKE template from an existing cluster. This feature enhances control, as we can bootstrap clusters with the exact same configuration as the production environment.
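To give an idea of what RKE packages in that file, here is a minimal sketch of a cluster configuration; the node address, SSH user and Kubernetes version are placeholder values, not from our actual setup:

```yaml
# Minimal RKE cluster configuration (illustrative values only)
nodes:
  - address: 10.0.0.10                 # placeholder node IP
    user: rke                          # SSH user with Docker access
    role: [controlplane, etcd, worker]
kubernetes_version: v1.20.8-rancher1-1 # example RKE-packaged version
services:
  etcd:
    snapshot: true                     # enable recurring etcd snapshots
ingress:
  provider: nginx                      # RKE's default ingress controller
```

An RKE template in Rancher captures this same set of options, so every cluster stamped from it starts with identical settings.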

An Ansible Playbook for Rancher Kubernetes Bootstrapping

Having an API in Rancher and an RKE template makes our job easier. With Ansible, we can automate the steps required to create a new Kubernetes cluster in Rancher and add some post-configuration, such as DNS entries for ingresses and Helm chart deployments. This playbook, for example, interacts with Rancher to deploy a cluster from a template.
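As a rough sketch of how such a playbook can talk to Rancher, the tasks below log in and create a cluster from a template revision through the Rancher v3 API. This is an illustration, not our exact role: the endpoints follow the Rancher 2.x API, but field names and status codes may vary between versions.

```yaml
# Sketch: create a cluster from an RKE template via the Rancher v3 API
- name: Log in to Rancher and obtain an API token
  ansible.builtin.uri:
    url: "https://{{ rancher_host }}/v3-public/localProviders/local?action=login"
    method: POST
    body_format: json
    body:
      username: "{{ rancher_user }}"
      password: "{{ rancher_pass }}"
    status_code: 201
  register: login

- name: Create the cluster from the RKE template revision
  ansible.builtin.uri:
    url: "https://{{ rancher_host }}/v3/clusters"
    method: POST
    headers:
      Authorization: "Bearer {{ login.json.token }}"
    body_format: json
    body:
      type: cluster
      name: "{{ rancher_clustername }}"
      clusterTemplateRevisionId: "{{ rancher_cluster_template_id }}"
    status_code: 201
  register: cluster
```

Rancher then returns a node registration command that the playbook runs on each inventory host with its assigned roles.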

Here is the information we need before running our first playbook. It must be passed to Ansible as variables:

  1. rancher_host: URL of your Rancher installation.
  2. rancher_clustername: the name of your new bootstrapping cluster.
  3. rancher_user: Rancher username with “create cluster” permission.
  4. rancher_pass: Rancher password for your rancher_user.
  5. rancher_cluster_template_id: The ID of your RKE Template (e.g. ctr-abcd)
  6. rancher_cluster_roles: A list of roles that each node in your cluster will assume (controlplane, etcd and/or worker).
  7. An Ansible inventory file with the nodes that will be added to the cluster.
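These variables can be grouped in a vars file and passed to the playbook with `-e @vars.yml`. The values below are placeholders for illustration; in practice, secrets such as the Rancher password should come from Ansible Vault:

```yaml
# vars.yml — example variable file (placeholder values)
rancher_host: rancher.example.com
rancher_clustername: feature-xyz-test
rancher_user: automation
rancher_pass: "{{ vault_rancher_pass }}"   # pulled from Ansible Vault
rancher_cluster_template_id: "ctr-abcd"
rancher_cluster_roles:                     # roles assigned to each node
  - controlplane
  - etcd
  - worker
```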

The steps executed by the playbook are:

Fig 1. Create Rancher Cluster Ansible Playbook Workflow

If everything goes as planned, the cluster will be ready and available in the Rancher UI. The average time for a successful 3-node (CentOS 7, 4 cores, 6 GB RAM) cluster bootstrap is 17 minutes, but it can be slightly shorter if you set up a single-node cluster. Most of this time goes into the node registration process, in which each node downloads all required containers (including the Rancher agent container), runs the installation steps and interacts with the Rancher server for configuration.

This process requires, however, that you already have your Ansible inventory ready, with VMs configured and Docker installed. That is not flexible and definitely requires manual configuration. What we did in our automation is rely on dynamic inventory and Ansible roles to fully prepare our nodes before running this part of the playbook.

As our private cloud is VMware-based, we used this dynamic inventory plugin, which queries vCenter and returns a list of VMs based on defined criteria (we use tags for that). If you opt to create your VMs on the fly, however, you can also use Ansible roles like this one. The amazing thing about dynamic inventory in Ansible is the wide availability of plugins, whether you are on-premises or on public clouds such as AWS, Azure and GCP.
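For reference, a tag-based vCenter inventory can be configured roughly like this, assuming the community.vmware collection; the vCenter address is a placeholder, and the exact host variables exposed for grouping depend on the plugin version:

```yaml
# inventory.vmware.yml — dynamic inventory sketch for vCenter
plugin: community.vmware.vmware_vm_inventory
hostname: vcenter.example.com            # placeholder vCenter address
username: "{{ lookup('env', 'VMWARE_USER') }}"
password: "{{ lookup('env', 'VMWARE_PASSWORD') }}"
validate_certs: false
with_tags: true                          # requires the vSphere Automation SDK
keyed_groups:
  - key: tags                            # group VMs by their vCenter tags
    prefix: tag
```

The playbook can then target a group such as `tag_k8s_test` instead of a hand-maintained static inventory.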

Let us add those new steps to the playbook’s workflow:

Fig 2. Create Rancher Cluster Ansible Playbook Workflow with dynamic node prep.

OK, so now we have an automation that creates a Kubernetes cluster on Rancher based on RKE templates and that leverages dynamic inventory in Ansible to create or reuse VMs based on specific criteria. But to get the most out of it, we also want to deploy our application on this cluster, register the ingress endpoint in DNS and tell our devs that their cluster is ready, with the new feature deployed.

Fig 3. Full Workflow with post-operation tasks.
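The post-operation tasks could be sketched like this; the module names assume the kubernetes.core and community.general collections, and the DNS server, chart and Slack details are hypothetical placeholders rather than our actual configuration:

```yaml
# Sketch of post-operation tasks (placeholder names throughout)
- name: Register the ingress endpoint in DNS
  community.general.nsupdate:
    server: ns1.example.com
    zone: example.com
    record: "{{ rancher_clustername }}"
    type: A
    value: "{{ ingress_ip }}"

- name: Deploy the application Helm chart on the new cluster
  kubernetes.core.helm:
    kubeconfig: "{{ cluster_kubeconfig_path }}"
    name: myapp
    chart_ref: myrepo/myapp
    release_namespace: myapp
    create_namespace: true

- name: Notify the team that the environment is ready
  community.general.slack:
    token: "{{ slack_token }}"
    msg: "Cluster {{ rancher_clustername }} is ready with the new feature deployed."
```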

The whole process takes around 20 minutes: 17 minutes for the k8s cluster bootstrapping plus 3 minutes for post-operation tasks. (Want a fast track? Keep reading!)

Overall, this process, integrated with our CI/CD pipeline, not only dynamically creates the cluster but also syncs the Helm charts deployed in production for our main application and, finally, rolls out the feature that we want to test and validate with our clients. Additionally, we execute automated tasks to validate the configuration and run performance, security and chaos tests. So far, it has been a huge ally for our continuous delivery process.

Bonus: Speed up your Kubernetes dev/test environment with k3s clusters

OK, so you don’t want to wait 20 minutes… Me neither. A restless DevOps engineer will always be looking for an alternative, driven by a commitment to reliability and performance, which, in my opinion, is the essence of good automation and CI/CD engineering.

What we did recently was to create a k3s cluster instead of an RKE cluster and import it into our Rancher installation.

K3s is a lightweight, certified Kubernetes distribution focused on edge computing. It wraps all Kubernetes components in a single small binary, which also simplifies the cluster bootstrapping process. Despite its size and simplicity, it still provides essential and extra features, such as an ingress controller, network policies and containerd as the container runtime.
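The single-binary design means bootstrapping can be as simple as one Ansible task per node. The official install script below is real; the extra flags and the readiness check are an illustrative sketch:

```yaml
# Sketch: install a single-node k3s server with the official script
- name: Install k3s server on the node
  ansible.builtin.shell: |
    curl -sfL https://get.k3s.io | sh -s - server --write-kubeconfig-mode 644
  args:
    creates: /usr/local/bin/k3s          # skip if k3s is already installed

- name: Wait for the node to become Ready
  ansible.builtin.command: >
    k3s kubectl wait --for=condition=Ready node --all --timeout=300s
  changed_when: false
```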

For test environments, especially for functional software testing, k3s offers a great solution for ephemeral environments and is thus perfect for CI/CD integration.

From our last workflow, we had to adapt the “Manage Rancher Cluster” role to create a standalone k3s cluster in a single VM and then import it into Rancher. This process is slightly different from the original. While the former bootstraps Kubernetes clusters from Rancher (known as custom clusters), this new Ansible role first creates the k3s cluster directly on the VM node and only then imports it into the Rancher server (an imported cluster).

Fig 4. Create k3s Cluster Ansible Playbook Workflow
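The import step could be sketched as follows against the Rancher v3 API. This is a simplified outline assuming an API token in a `rancher_token` variable; depending on your Rancher version, you may need to create the cluster registration token explicitly before fetching it:

```yaml
# Sketch: import the standalone k3s cluster into Rancher (imported cluster)
- name: Register an imported cluster object in Rancher
  ansible.builtin.uri:
    url: "https://{{ rancher_host }}/v3/clusters"
    method: POST
    headers:
      Authorization: "Bearer {{ rancher_token }}"
    body_format: json
    body:
      type: cluster
      name: "{{ rancher_clustername }}"
    status_code: 201
  register: imported

- name: Fetch the registration manifest URL for the new cluster
  ansible.builtin.uri:
    url: "https://{{ rancher_host }}/v3/clusterregistrationtokens?clusterId={{ imported.json.id }}"
    headers:
      Authorization: "Bearer {{ rancher_token }}"
  register: regtoken

- name: Apply the Rancher agent manifest on the k3s cluster
  ansible.builtin.shell: >
    k3s kubectl apply -f {{ regtoken.json.data[0].manifestUrl }}
```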

The following image shows the duration of each step of our drone.io-based CI/CD pipeline for deploying a new feature in both the k8s and k3s dev/test clusters. Note the difference in setup time between the two environments.

CI/CD pipeline in drone.io

By leveraging k3s as our test environment, we saw a marked drop from 20 minutes to only 6 minutes for cluster bootstrapping and post-operation tasks. This means improved pipeline speed, which will enable and encourage developers to deliver code more often and test software sooner.
