Declarative, Kubernetes-style APIs to Cluster Creation, Configuration, and Management

Multi-Cloud and Multi-Cluster Declarative Kubernetes Cluster Creation and Management with Cluster API (CAPI — v1alpha3)

Gokul Chandra
Jul 24, 2020

The Cluster API (CAPI) is a Kubernetes project that brings declarative, Kubernetes-style APIs to cluster creation. CAPI does this by using Custom Resource Definitions to extend the API exposed by the Kubernetes API Server, allowing users to create new resources such as Clusters (representing a Kubernetes cluster) and Machines (representing the machines that make up the Nodes that form the cluster). A Controller for each resource is then responsible for reacting to changes to these resources to bring up the cluster. The API is designed in such a way that different infrastructure providers can integrate with it to provide their environment specific logic.

A user communicates with a Kubernetes cluster referred to as the "Management Cluster", which contains the Cluster API machinery and other supporting components. The user can request entire clusters to be created, and manage their lifecycle, by creating cluster resource objects associated with the cloud providers they want to use; the resulting clusters are referred to as "Workload Clusters". The management cluster is also where one or more Infrastructure Providers (such as AWS, GCP, Azure, vSphere, OpenStack, etc.) and Bootstrap Providers (as of now Kubeadm is officially supported; users can implement a custom bootstrap provider using an operator framework) run. Information about all the cloud provider objects created persists in the management cluster.

CAPI — Management Cluster and Workload Clusters

The project maintains multiple infrastructure providers so that CAPI can be used on a multitude of cloud providers. Users can implement custom providers by creating controllers and reconciliation logic with Kubebuilder (an operator build framework). The Cluster Controller and Machine Controller together form the provider controller manager. The machine spec is consumed by the Cluster API machine controller, which calls the machine actuator responsible for the creation, update, and deletion of machines. Actuators (the reconciliation logic) are provider specific.

CAPI — Cluster Controller and Machine Controller

Bootstrap providers are used to bootstrap (install Kubernetes on) the control plane nodes and worker nodes once the machines and other components have been created by the infrastructure provider. As of now, Kubeadm is the officially supported and maintained bootstrap provider.

The core Cluster API is the framework used by all providers and is maintained independently. A Cluster API provider implements the operational details of a particular infrastructure for the Cluster API framework. With providers, Cluster API abstracts the overall management functions of a cluster and its machines away from the operational details of different third-party infrastructure vendors.

Architecture and Components

CAPI — Architecture

Kubernetes manages different resources through objects and controllers. A controller typically manages a resource so that its objects' actual state matches the desired state supplied in the specification.

The Cluster API consists of a shared set of controllers in order to provide a consistent user experience. To support multiple cloud environments, some of these controllers (e.g. Cluster, Machine) call provider-specific actuators to implement the behavior expected of the controller. Other controllers (e.g. MachineSet) are generic and operate by interacting with other Cluster API (and Kubernetes) resources. The task of the provider implementor is to implement the actuator methods so that the shared controllers can call them whenever they need to reconcile state and update status. When an actuator function returns an error of type RequeueAfterError, the object is requeued for further processing after the given RequeueAfter time has passed.

CAPI comprises multiple controllers, and the core components are as follows:

Cluster

Cluster provides a way to define common Kubernetes cluster configuration, such as the pod network CIDR and service network CIDR, as well as provider-specific management of shared cluster infrastructure. It contains the details required by the infrastructure provider to create a Kubernetes cluster (e.g. CIDR blocks for pods and services).

CAPI — Cluster
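For illustration, a minimal Cluster manifest in v1alpha3 might look roughly like the sketch below; the name, namespace and CIDR values are placeholders rather than values from the original screenshots, and the referenced KubeadmControlPlane/AWSCluster objects are assumed to exist alongside it.

apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: demo-cluster              # hypothetical name
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]   # pod network CIDR
    services:
      cidrBlocks: ["10.96.0.0/12"]     # service network CIDR
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: KubeadmControlPlane
    name: demo-cluster-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSCluster
    name: demo-cluster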

Machine

Machine is responsible for describing an individual Kubernetes node. There is only minimal configuration exposed at the common configuration level (mainly Kubernetes version information), and additional configuration is exposed through the referenced provider-specific machine resource (e.g. an AWSMachine, via the infrastructureRef).

CAPI — Machine
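As a rough sketch (names and version are placeholders), a standalone Machine ties together a bootstrap config and a provider-specific machine object:

apiVersion: cluster.x-k8s.io/v1alpha3
kind: Machine
metadata:
  name: demo-machine-0            # hypothetical name
spec:
  clusterName: demo-cluster
  version: v1.17.3                # Kubernetes version for this node
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
      kind: KubeadmConfig
      name: demo-machine-0
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSMachine              # provider-specific machine (CAPA in this example)
    name: demo-machine-0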

Machine Deployments

MachineDeployments allow for the managed deployment and rollout of configuration changes to groups of Machines (node-groups), very much like how traditional k8s Deployments work. This allows for rolling out updates to a node configuration or even rolling out version upgrades for worker nodes in the cluster in an orchestrated manner. It also allows for rolling back to the previous configuration. It is important to note that MachineDeployments should not be used for managing Machines that make up the Kubernetes Control Plane for a given managed Cluster, since they do not provide any assurances that the control plane remains healthy when the MachineDeployment is updated or scaled.

CAPI — Machine Deployment
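A hedged sketch of a MachineDeployment for a worker node group; the labels, names and replica count are illustrative:

apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: demo-md-0                 # hypothetical node group name
spec:
  clusterName: demo-cluster
  replicas: 3
  selector:
    matchLabels:
      nodepool: demo-md-0
  template:
    metadata:
      labels:
        nodepool: demo-md-0
    spec:
      clusterName: demo-cluster
      version: v1.17.3
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: demo-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachineTemplate
        name: demo-md-0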

Machine Health Check

A MachineHealthCheck is responsible for remediating unhealthy Machines.

CAPI — Health Check
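For example, a MachineHealthCheck that remediates worker machines whose Node stays NotReady or Unknown for five minutes could look roughly like this; the selector labels and thresholds are assumptions:

apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: demo-node-unhealthy-5m
spec:
  clusterName: demo-cluster
  maxUnhealthy: 40%               # stop remediating if too many machines are unhealthy
  selector:
    matchLabels:
      nodepool: demo-md-0
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s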

All of the above controllers accept declarative configuration from users via CRDs (custom resource definitions). The Cluster API v1alpha3 release added many significant features, such as declarative control plane management (the Kubeadm-based Control Plane (KCP) provides a declarative API to deploy, manage, and scale the Kubernetes control plane, including etcd), cross-failure-domain control plane placement (multi-AZ placement of control plane nodes), experimental MachinePools (similar to Auto Scaling Groups), automated replacement of unhealthy nodes using MachineHealthCheck, and a few add-ons such as ClusterResourceSets.

Apart from the above core components, deploying CAPI also installs an infrastructure provider specific controller, a bootstrap provider controller, and webhook controllers for validation, each in its own namespace. Cluster API uses cert-manager to manage the certificates it needs for its webhooks.

The clusterctl CLI tool handles the lifecycle of Cluster API on the management cluster and automates fetching and installing the YAML files that define the CAPI and provider components. With clusterctl, users can scaffold the configuration templates required to create a cluster for the installed infrastructure and bootstrap providers (the clusterctl config cluster command returns a YAML template for creating a workload cluster). Additionally, it encodes a set of best practices for managing providers, which helps users avoid misconfigurations and manage day-2 operations such as upgrades.

The following components are created on the management cluster upon CAPI installation:

CAPI — Namespaces

Controller managers of respective components are deployed in the associated namespaces:

CAPI — Components

Following is the set of CRDs applied on the management cluster, which includes the core CAPI CRDs and the bootstrap provider CRDs.

CAPI — Custom Resource Definitions

capi-controller-manager is deployed in the 'capi-system' namespace and watches for the objects: Cluster, Machine, MachineDeployment, MachineHealthCheck, MachineSet and MachinePool (experimental).

capi-system — capi-controller-manager

capi-kubeadm-bootstrap-controller-manager is deployed in the 'capi-kubeadm-bootstrap-system' namespace and watches for the KubeadmConfig and KubeadmConfigTemplate objects.

capi-kubeadm-bootstrap-system — capi-kubeadm-bootstrap-controller-manager

capi-kubeadm-control-plane-controller-manager is deployed in the 'capi-kubeadm-control-plane-system' namespace and watches for KubeadmControlPlane resources in the cluster.

capi-kubeadm-control-plane-system — capi-kubeadm-control-plane-controller-manager

Cluster deployment on AWS using CAPI with CAPA (Cluster API Provider for AWS)

Once a management cluster has been established, users can use CAPI via the management cluster to instantiate one or more workload clusters. All supported providers can be bootstrapped using 'clusterctl', which generates the YAML for deploying provider specific controllers and other supporting objects. Users can use the CLI to install providers on a CAPI enabled cluster (a single management cluster can hold multiple providers, as each provider has its own distinct cluster and machine specs); for example, 'clusterctl init --infrastructure aws' deploys all the CAPI and CAPA components to the cluster.

CAPA is the Cluster API Provider for AWS, which enables users to use AWS as the platform for creating Kubernetes clusters with CAPI. Here the machine spec creates EC2 instances, and all other objects such as the VPC, subnets, route tables, etc. are created as well. Users can refer to the CRD spec to customize most parts of the cloud provider specific configuration.

Prerequisites include IAM entities that allow the bootstrap and management clusters to create resources on AWS. Other variables such as region, SSH key, and machine type can be provided as environment variables, which are substituted into the configuration generated by clusterctl. The admin account credentials are stored in a secret.

capa-controller-manager is deployed in the 'capa-system' namespace and watches for objects such as AWSCluster, AWSMachine, and AWSMachineTemplate; when these are created on the management cluster, they are translated into AWS resources such as EC2 instances. The controller reconciles and ensures that all the related objects are created.

capa-system — capa-controller-manager

The management cluster in the topology below is an on-premises cluster with CAPI deployed; clusterctl is used to install the CAPA components. Users can use clusterctl to create base configuration templates with a predefined list of Cluster API objects (Cluster, Machines, MachineDeployments, etc.) required for a specific provider. A single management cluster can hold multiple providers, and the '--infrastructure' flag can be used in the case of a multi-provider cluster. All objects can be scoped to a namespace, which is handy when a management cluster manages workload clusters at scale.

Cluster API Provider for AWS

Following are the provider specific CRDs. Some of them map, in one way or another, to the core components of CAPI, such as Cluster and Machine.

CAPA — Custom Resource Definitions

The manifests below are used to create a Cluster and associated machines on AWS. All the objects are scoped under the 'aws-cluster' namespace.

CAPA — Configuration Spec
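Since the original spec is shown only as a screenshot, here is a hedged sketch of the provider-side piece: an AWSCluster in the 'aws-cluster' namespace that the Cluster object references via its infrastructureRef. The region, SSH key and names are illustrative, not taken from the article's environment.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: demo-cluster
  namespace: aws-cluster
spec:
  region: us-east-1               # assumed region
  sshKeyName: default             # assumed existing EC2 key pair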

Objects refer to other objects (for example, the Cluster includes references to the KubeadmControlPlane and AWSCluster); a sample mapping is shown below:

CAPA — Objects

CAPA controller manager creating objects:

CAPA — controller-manager

CAPA controller manager creating machines (ec2 instances):

CAPA — controller-manager

CAPA enables users to use an existing VPC; by default it creates a VPC with private and public subnets. In a multi-AZ setting, users can provide subnet information for each zone in the AWSCluster spec.
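A hedged sketch of the relevant part of the AWSCluster spec, assuming the v1alpha3 networkSpec fields; the VPC ID, CIDRs and zones are placeholders:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: demo-cluster
  namespace: aws-cluster
spec:
  region: us-east-1
  networkSpec:
    vpc:
      id: vpc-0123456789abcdef0      # reuse an existing VPC instead of creating one
    subnets:                         # per-AZ subnet layout for a multi-AZ cluster
    - availabilityZone: us-east-1a
      cidrBlock: 10.0.0.0/24
      isPublic: true
    - availabilityZone: us-east-1b
      cidrBlock: 10.0.1.0/24
      isPublic: false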

CAPA — VPC

Subnets:

CAPA — Subnets

Route Tables:

CAPA — Route Tables

Security Groups:

CAPA — Security Groups

Elastic IPs attached to three controllers:

CAPA — EIP for Controllers

With v1alpha3, the Kubeadm-based Control Plane (KCP) provides a declarative API to deploy and scale an HA Kubernetes control plane, including etcd, much like scaling replicas of a Kubernetes Deployment. This approach eliminates manual intervention such as removing etcd members while scaling down; KCP automates control plane deployment, scaling, and upgrades. As shown below, all machines are created based on the configuration specification.

CAPA — Kubernetes Cluster

Control plane machines can be controlled and configured separately by having a separate AWSMachineTemplate and referencing it in the KubeadmControlPlane specification, as sketched below.
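A hedged sketch of such a KubeadmControlPlane, assuming the v1alpha3 infrastructureTemplate field; the names, replica count and version are illustrative:

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: demo-cluster-control-plane
  namespace: aws-cluster
spec:
  replicas: 3                        # three control plane machines spread across AZs
  version: v1.17.3
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSMachineTemplate
    name: demo-cluster-control-plane
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws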

CAPA — Control Plane Machines

Master nodes placed across failure domains (availability-zones):

CAPA — Control Plane Machines — Cross AZ

Worker nodes can be controlled and configured separately by having a separate AWSMachineTemplate and referencing it in the MachineDeployment specification (a sketch follows below). A MachineDeployment orchestrates deployments over a fleet of MachineSets, an immutable abstraction over Machines.
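A hedged AWSMachineTemplate sketch for the worker pool; the instance type, instance profile and key name are assumptions:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: demo-md-0
  namespace: aws-cluster
spec:
  template:
    spec:
      instanceType: t3.large                                     # assumed worker size
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: default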

CAPA — Node Machines

Once all the machines are provisioned with the required networking, the capi-kubeadm-control-plane-controller bootstraps the control plane (master) nodes.

CAPA — Kubeadm Controlplane Bootstrap

The capi-kubeadm-bootstrap-controller-manager takes care of joining worker nodes to the bootstrapped control plane. There are feature requests and roadmap milestones to support cloud provider managed control planes (EKS, AKS, GKE, etc.), which would let users manage these managed Kubernetes services with CAPI as well.

CAPA — Kubeadm Node Bootstrap

All sensitive information, along with the kubeconfig, is stored as Kubernetes Secrets; these Cluster API objects can be moved to another management cluster using 'clusterctl move'. The network (CNI) solution is an add-on, as in a traditional kubeadm bootstrap procedure.

CAPA — Secrets and Kubeconfig

All state information persists in the management cluster, with the status field of each object storing details such as resource IDs and names. All resources created by the infrastructure provider are tagged; the tags are used for associations and during cluster deletion to make sure no created resource is left behind.

CAPA — Spec-Status

Cluster deployment on Azure using CAPI with CAPZ (Cluster API Provider for Azure) — Multicloud Deployments from a Single Management Cluster

A single management cluster can be used to host multiple infrastructure providers. CAPZ is deployed in the 'capz-system' namespace. Users can use namespace scoping to easily separate clusters based on the infrastructure provider (or any other criterion); for example, in the scenario below all the Azure cluster configuration is deployed in the 'azure-cluster' namespace.

A capz-controller-manager is deployed and watches for Azure specific objects (AzureCluster, AzureMachine etc.) in the cluster.
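For a rough idea of what such an object looks like, a minimal AzureCluster sketch is shown below; it assumes the v1alpha3 location/resourceGroup/networkSpec fields, and the names and region are placeholders:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureCluster
metadata:
  name: demo-azure-cluster
  namespace: azure-cluster
spec:
  location: eastus                    # assumed Azure region
  resourceGroup: demo-azure-cluster   # resource group CAPZ creates/uses
  networkSpec:
    vnet:
      name: demo-azure-cluster-vnet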

CAPZ — Controller Manager

The bootstrap credentials secret contains the required service principal with roles that allow it to create the resources (similar to ARM).

CAPZ — Controller Manager

Azure provider specific CRDs in the cluster:

CAPZ — Custom Resource Definitions

CAPZ on the management cluster creating required infrastructure on Azure:

CAPZ — Kubernetes Cluster

Virtual machines on Azure created by CAPZ from the management cluster; once the infrastructure creation is complete, the KCP (Kubeadm control plane provider) and KBP (Kubeadm bootstrap provider) bootstrap a full-fledged HA Kubernetes cluster.

CAPZ — Machines

With the above configuration in place, users can manage multiple clusters across multiple cloud providers. A sample listing of clusters, machines, and control planes on the management cluster, scoped by namespace, is shown below:

CAPI — Multi Infrastructure Provider

Day Two Operations — Zero Downtime Upgrades and Scaling

Kubernetes cluster level day-two operations include version upgrades, machine image upgrades, patches, etc. Zero-downtime (ZDT) upgrades are crucial for live environments that must always be available. CAPA uses a strategy where users do not need to follow a blue/green approach (maintaining two identical full-fledged clusters).

All upgrade scenarios (Kubernetes version upgrades for the control plane and nodes, control plane and node image upgrades/changes, spec changes) follow a rolling update strategy very similar to a Kubernetes Deployment, where the update takes place with zero downtime by incrementally replacing instances with new ones. As of now, the only supported mode is RollingUpdate.
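For worker machines the rollout behaviour can be tuned on the MachineDeployment itself; a hedged fragment, assuming the v1alpha3 strategy fields (the surge/unavailable values are examples):

apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: demo-md-0
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # create one extra machine before removing an old one
      maxUnavailable: 0        # never drop below the desired replica count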

CAPI — Upgrade Strategy

The kubeadm bootstrap provider uses the 'kubeadm upgrade' utility to perform the Kubernetes related upgrades, while the infrastructure provider enables rolling updates on the cloud provider end. clusterctl can also be used to perform upgrades of the management cluster's provider components, using the same notation as the kubeadm CLI (clusterctl upgrade plan/apply).

Kubeadm upgrade utility — sequence of steps:

Kubeadm — Upgrade

Zero Downtime Upgrades

To upgrade the Kubernetes control plane version, users can modify the KubeadmControlPlane resource’s Spec.Version field. This will trigger a rolling upgrade of the control plane.

For example, in the AWS workload cluster created above there are three control plane replicas running version v1.17.3:

CAPI — Control Plane Upgrade

Modifying version v1.17.3 to v1.17.5 in the KubeadmControlPlane spec:
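The edit amounts to a one-line change in the KubeadmControlPlane object (the object name is the hypothetical one used in the earlier sketches):

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: demo-cluster-control-plane
spec:
  replicas: 3
  version: v1.17.5            # changed from v1.17.3 to trigger a rolling upgrade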

CAPI — Control Plane Upgrade

Rolling upgrade of control plane nodes:

CAPI — Control Plane Upgrade — Rolling Update

Similarly, workload machines are grouped under one or more MachineDeployments. MachineDeployments transparently manage MachineSets and Machines to allow for a seamless scaling experience. A modification to a MachineDeployment's spec will begin a rolling update of the workload machines. Nodes are cordoned and drained as part of the upgrade.

Rolling upgrade of nodes:

CAPI — Node Upgrade — Rolling Update
CAPI — Node Upgrade — Machine Deployment

Rolling upgrade — Kubernetes Version:

CAPI — Kubernetes Version Upgrade

Scaling Nodes

Scaling works the same way as the replica count in a Kubernetes Deployment spec. v1alpha3 adds an experimental 'MachinePool', which is implemented on the cloud provider end as an auto scaling group; even without it, users can change the 'replicas' field in either the KubeadmControlPlane (to scale control plane nodes) or a MachineDeployment (to scale worker nodes) to seamlessly scale the number of instances in the cluster.

For example, changing the existing 'replicas: 3' to 'replicas: 4' in the MachineDeployment spec will start rolling out a new node in the cluster:
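Again a one-field change on the MachineDeployment (the name is the hypothetical one from the earlier sketches):

apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: demo-md-0
spec:
  replicas: 4                 # scaled up from 3; a new Machine and EC2 instance are created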

CAPI — Node Scaling
CAPI — Node Scaling — Machine Deployments
CAPI — Node Scaling
CAPI — Node Scaling

Changing the Base Image

Changing the instances' base image also follows the same rolling update strategy: newer machines with the changed image are provisioned, and the old instances are replaced using cordon and drain.

For example, in the AWS scenario the 'ami' (Amazon Machine Image) field in the machine spec can be modified to change the base image of both control plane and worker nodes.
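As a hedged sketch, the field sits under the AWSMachineTemplate (or AWSMachine) spec; the AMI ID below is a placeholder:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: demo-md-0
  namespace: aws-cluster
spec:
  template:
    spec:
      ami:
        id: ami-0aaaaaaaaaaaaaaaa   # placeholder for the new base image
      instanceType: t3.large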

CAPI — Changing Machine Base Image

This triggers a rolling update of the nodes with the newer image specified above:

CAPI — Changing Machine Base Image

User Data / Scripting

KubeadmConfigTemplate and KubeadmControlPlane enable users to execute commands in two stages: 'preKubeadmCommands' and 'postKubeadmCommands'. This lets users provide cloud-init style instructions through the declarative spec. For example, it can be used to automatically deploy add-ons by packaging the VM image with the required manifests and invoking them with 'postKubeadmCommands'.
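A hedged KubeadmConfigTemplate fragment illustrating the two hooks (the commands are trivial placeholders, not the article's actual scripts):

apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: demo-md-0
spec:
  template:
    spec:
      preKubeadmCommands:
      - echo "runs before kubeadm join" >> /var/log/capi-userdata.log
      postKubeadmCommands:
      - echo "runs after the node has joined" >> /var/log/capi-userdata.log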

CAPI — Commands

The intention is for Cluster API users to have a unified platform that manages Kubernetes clusters and their associated infrastructure needs across multiple cloud providers and, hopefully, on-premises too. Instead of managing complex pipelines with cloud-specific fabrication, users can use CAPI to provision and manage the whole lifecycle of Kubernetes clusters across multiple providers in an easy, declarative fashion. In effect this provides a zero-touch provisioning experience and allows users to manage all their spatially distributed Kubernetes clusters from a central management cluster.

Cluster API adoption should provide portability across different infrastructures (with some configuration details specific to each cloud provider). For users, this means a relatively consistent and reliable user experience, regardless of the cloud provider or on-premises environment.
