Deploy Kubernetes (K8s) on Amazon AWS using mixed on-demand and spot instances

Published in

ITNEXT

12 min readApr 14, 2022

Deploy in a few minutes an high available Kubernetes cluster on Amazon AWS using mixed on-demand and spot instances.

Please note, this is only an example on how to Deploy a Kubernetes cluster. For a production environment you should use EKS or ECS.

The scope of this repo is to show all the AWS components needed to deploy a high available K8s cluster.

Requirements

Terraform - Terraform is an open-source infrastructure as code software tool that provides a consistent CLI workflow to manage hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files.
Amazon AWS Account - Amazon AWS account with billing enabled
kubectl - The Kubernetes command-line tool (optional)
aws cli optional

You need also:

one VPC with private and public subnets
one ssh key already uploaded on your AWS account
one bastion host to reach all the private EC2 instances

For VPC and bastion host you can refer to this repository.

Infrastructure overview

The final infrastructure will be made by:

two autoscaling group, one for the kubernetes master nodes and one for the worker nodes
two launch template, used by the asg
one internal load balancer (L4) that will route traffic to Kubernetes servers
one external load balancer (L7) that will route traffic to Kubernetes workers
one security group that will allow traffic from the VPC subnet CIDR on all the k8s ports (kube api, nginx ingress node port etc)
one security group that will allow traffic from all the internet into the public load balancer (L7) on port 80 and 443
one S3 bucket, used to store the cluster join certificates
one IAM role, used to allow all the EC2 instances in the cluster to write on the S3 bucket, used to share the join certificates
one certificate used by the public LB, stored on AWS ACM. The certificate is a self signed certificate.

Kubernetes setup

The installation of K8s id done by kubeadm. In this installation Containerd is used as CRI and flannel is used as CNI.

You can optionally install Nginx ingress controller and Longhorn.

To install Nginx ingress set the variable install_nginx_ingress to yes (default no). To install longhorn set the variable install_longhorn to yes (default no). NOTE if you don’t install the nginx ingress, the public Load Balancer and the SSL certificate won’t be deployed.

In this installation is used a S3 bucket to store the join certificate/token. At the first startup of the instance, if the cluster does not exist, the S3 bucket is used to get the join certificates/token.

Before you start

Note that this tutorial uses AWS resources that are outside the AWS free tier, so be careful!

Project setup

Clone this repo and go in the example/ directory:

git clone https://github.com/garutilorenzo/k8s-aws-terraform-cluster
cd k8s-aws-terraform-cluster/example/

Now you have to edit the main.tf file and you have to create the terraform.tfvars file. For more detail see AWS provider setup and Pre flight checklist.

Or if you prefer you can create an new empty directory in your workspace and create this three files:

terraform.tfvars
main.tf
provider.tf

The main.tf file will look like:

variable "AWS_ACCESS_KEY" {}variable "AWS_SECRET_KEY" {}variable "environment" {
  default = "staging"
}variable "AWS_REGION" {
  default = "<YOUR_REGION>"
}module "k8s-cluster" {
  ssk_key_pair_name      = "<SSH_KEY_NAME>"
  uuid                   = "<GENERATE_UUID>"
  environment            = var.environment
  vpc_id                 = "<VPC_ID>"
  vpc_private_subnets    = "<PRIVATE_SUBNET_LIST>"
  vpc_public_subnets     = "<PUBLIC_SUBNET_LIST>"
  vpc_subnet_cidr        = "<SUBNET_CIDR>"
  PATH_TO_PUBLIC_LB_CERT = "<PAHT_TO_PUBLIC_LB_CERT>"
  PATH_TO_PUBLIC_LB_KEY  = "<PAHT_TO_PRIVATE_LB_CERT>"
  install_nginx_ingress  = true
  source                 = "github.com/garutilorenzo/k8s-aws-terraform-cluster"
}output "k8s_dns_name" {
  value = module.k8s-cluster.k8s_dns_name
}output "k8s_server_private_ips" {
  value = module.k8s-cluster.k8s_server_private_ips
}output "k8s_workers_private_ips" {
  value = module.k8s-cluster.k8s_workers_private_ips
}

For all the possible variables see Pre flight checklist

The provider.tf will look like:

provider "aws" {
  region     = var.AWS_REGION
  access_key = var.AWS_ACCESS_KEY
  secret_key = var.AWS_SECRET_KEY
}

The terraform.tfvars will look like:

AWS_ACCESS_KEY = "xxxxxxxxxxxxxxxxx"
AWS_SECRET_KEY = "xxxxxxxxxxxxxxxxx"

Now we can init terraform with:

terraform initInitializing modules...
- k8s-cluster in ..Initializing the backend...Initializing provider plugins...
- Finding latest version of hashicorp/template...
- Finding latest version of hashicorp/aws...
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed by HashiCorp)
- Installing hashicorp/aws v4.9.0...
- Installed hashicorp/aws v4.9.0 (signed by HashiCorp)Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.Terraform has been successfully initialized!You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Generate self signed SSL certificate for the public LB (L7)

NOTE If you already own a valid certificate skip this step and set the correct values for the variables: PATH_TO_PUBLIC_LB_CERT and PATH_TO_PUBLIC_LB_KEY

We need to generate the certificates (sel signed) for our public load balancer (Layer 7). To do this we need openssl, open a terminal and follow this step:

Generate the key:

openssl genrsa 2048 > privatekey.pem
Generating RSA private key, 2048 bit long modulus (2 primes)
.......+++++
...............+++++
e is 65537 (0x010001)

Generate the a new certificate request:

openssl req -new -key privatekey.pem -out csr.pem
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:IT
State or Province Name (full name) [Some-State]:Italy
Locality Name (eg, city) []:Brescia
Organization Name (eg, company) [Internet Widgits Pty Ltd]:GL Ltd
Organizational Unit Name (eg, section) []:IT
Common Name (e.g. server FQDN or YOUR name) []:testlb.domainexample.com
Email Address []:email@you.comPlease enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

Generate the public CRT:

openssl x509 -req -days 365 -in csr.pem -signkey privatekey.pem -out public.crt
Signature ok
subject=C = IT, ST = Italy, L = Brescia, O = GL Ltd, OU = IT, CN = testlb.domainexample.com, emailAddress = email@you.com
Getting Private key

This is the final result:

lscsr.pem  privatekey.pem  public.crt

Now set the variables:

PATH_TO_PUBLIC_LB_CERT: ~/full_path/public.crt
PATH_TO_PUBLIC_LB_KEY: ~/full_path/privatekey.pem

AWS provider setup

Follow the prerequisites step on this link. In your workspace folder or in the examples directory of this repo create a file named terraform.tfvars:

AWS_ACCESS_KEY = "xxxxxxxxxxxxxxxxx"
AWS_SECRET_KEY = "xxxxxxxxxxxxxxxxx"

Pre flight checklist

Once you have created the terraform.tfvars file edit the main.tf file (always in the example/ directory) and set the following variables.

Deploy

We are now ready to deploy our infrastructure. First we ask terraform to plan the execution with:

terraform plan...
...
      + name                   = "k8s-sg"
      + name_prefix            = (known after apply)
      + owner_id               = (known after apply)
      + revoke_rules_on_delete = false
      + tags                   = {
          + "Name"        = "sg-k8s-cluster-staging"
          + "environment" = "staging"
          + "provisioner" = "terraform"
          + "scope"       = "k8s-cluster"
          + "uuid"        = "xxxxx-xxxxx-xxxx-xxxxxx-xxxxxx"
        }
      + tags_all               = {
          + "Name"        = "sg-k8s-cluster-staging"
          + "environment" = "staging"
          + "provisioner" = "terraform"
          + "scope"       = "k8s-cluster"
          + "uuid"        = "xxxxx-xxxxx-xxxx-xxxxxx-xxxxxx"
        }
      + vpc_id                 = "vpc-xxxxxx"
    }Plan: 25 to add, 0 to change, 0 to destroy.Changes to Outputs:
  + k8s_dns_name            = (known after apply)
  + k8s_server_private_ips  = [
      + (known after apply),
    ]
  + k8s_workers_private_ips = [
      + (known after apply),
    ]Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.

now we can deploy our resources with:

terraform apply...+ tags_all               = {
          + "Name"        = "sg-k8s-cluster-staging"
          + "environment" = "staging"
          + "provisioner" = "terraform"
          + "scope"       = "k8s-cluster"
          + "uuid"        = "xxxxx-xxxxx-xxxx-xxxxxx-xxxxxx"
        }
      + vpc_id                 = "vpc-xxxxxxxx"
    }Plan: 25 to add, 0 to change, 0 to destroy.Changes to Outputs:
  + k8s_dns_name            = (known after apply)
  + k8s_server_private_ips  = [
      + (known after apply),
    ]
  + k8s_workers_private_ips = [
      + (known after apply),
    ]Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.Enter a value: yes...
...Apply complete! Resources: 25 added, 0 changed, 0 destroyed.Outputs:k8s_dns_name = "k8s-ext-<REDACTED>.elb.amazonaws.com"
k8s_server_private_ips = [
  tolist([
    "172.x.x.x",
    "172.x.x.x",
    "172.x.x.x",
  ]),
]
k8s_workers_private_ips = [
  tolist([
    "172.x.x.x",
    "172.x.x.x",
    "172.x.x.x",
  ]),
]

Now on one master node you can check the status of the cluster with:

ssh -j bastion@<BASTION_IP> ubuntu@172.x.x.xWelcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-1021-aws x86_64)* Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantageSystem information as of Wed Apr 13 12:41:52 UTC 2022System load:  0.52               Processes:             157
  Usage of /:   17.8% of 19.32GB   Users logged in:       0
  Memory usage: 11%                IPv4 address for cni0: 10.244.0.1
  Swap usage:   0%                 IPv4 address for ens3: 172.68.4.2370 updates can be applied immediately.Last login: Wed Apr 13 12:40:32 2022 from 172.68.0.6
ubuntu@i-04d089ed896cfafe1:~$ sudo su -root@i-04d089ed896cfafe1:~# kubectl get nodes
NAME                  STATUS   ROLES                  AGE     VERSION
i-0033b408f7a1d55f3   Ready    control-plane,master   3m33s   v1.23.5
i-0121c2149821379cc   Ready    <none>                 4m16s   v1.23.5
i-04d089ed896cfafe1   Ready    control-plane,master   4m53s   v1.23.5
i-072bf7de2e94e6f2d   Ready    <none>                 4m15s   v1.23.5
i-09b23242f40eabcca   Ready    control-plane,master   3m56s   v1.23.5
i-0cb1e2e7784768b22   Ready    <none>                 3m57s   v1.23.5root@i-04d089ed896cfafe1:~# kubectl get ns
NAME              STATUS   AGE
default           Active   5m18s
ingress-nginx     Active   111s # <- ingress controller ns
kube-node-lease   Active   5m19s
kube-public       Active   5m19s
kube-system       Active   5m19s
longhorn-system   Active   109s  # <- longhorn nsroot@i-04d089ed896cfafe1:~# kubectl get pods --all-namespaces
NAMESPACE         NAME                                          READY   STATUS      RESTARTS        AGE
ingress-nginx     ingress-nginx-admission-create-v2fpx          0/1     Completed   0               2m33s
ingress-nginx     ingress-nginx-admission-patch-54d9f           0/1     Completed   0               2m33s
ingress-nginx     ingress-nginx-controller-7fc8d55869-cxv87     1/1     Running     0               2m33s
kube-system       coredns-64897985d-8cg8g                       1/1     Running     0               5m46s
kube-system       coredns-64897985d-9v2r8                       1/1     Running     0               5m46s
kube-system       etcd-i-0033b408f7a1d55f3                      1/1     Running     0               4m33s
kube-system       etcd-i-04d089ed896cfafe1                      1/1     Running     0               5m42s
kube-system       etcd-i-09b23242f40eabcca                      1/1     Running     0               5m
kube-system       kube-apiserver-i-0033b408f7a1d55f3            1/1     Running     1 (4m30s ago)   4m30s
kube-system       kube-apiserver-i-04d089ed896cfafe1            1/1     Running     0               5m46s
kube-system       kube-apiserver-i-09b23242f40eabcca            1/1     Running     0               5m1s
kube-system       kube-controller-manager-i-0033b408f7a1d55f3   1/1     Running     0               4m36s
kube-system       kube-controller-manager-i-04d089ed896cfafe1   1/1     Running     1 (4m50s ago)   5m49s
kube-system       kube-controller-manager-i-09b23242f40eabcca   1/1     Running     0               5m1s
kube-system       kube-flannel-ds-7c65s                         1/1     Running     0               5m2s
kube-system       kube-flannel-ds-bb842                         1/1     Running     0               4m10s
kube-system       kube-flannel-ds-q27gs                         1/1     Running     0               5m21s
kube-system       kube-flannel-ds-sww7p                         1/1     Running     0               5m3s
kube-system       kube-flannel-ds-z8h5p                         1/1     Running     0               5m38s
kube-system       kube-flannel-ds-zrwdq                         1/1     Running     0               5m22s
kube-system       kube-proxy-6rbks                              1/1     Running     0               5m2s
kube-system       kube-proxy-9npgg                              1/1     Running     0               5m21s
kube-system       kube-proxy-px6br                              1/1     Running     0               5m3s
kube-system       kube-proxy-q9889                              1/1     Running     0               4m10s
kube-system       kube-proxy-s5qnv                              1/1     Running     0               5m22s
kube-system       kube-proxy-tng4x                              1/1     Running     0               5m46s
kube-system       kube-scheduler-i-0033b408f7a1d55f3            1/1     Running     0               4m27s
kube-system       kube-scheduler-i-04d089ed896cfafe1            1/1     Running     1 (4m50s ago)   5m58s
kube-system       kube-scheduler-i-09b23242f40eabcca            1/1     Running     0               5m1s
longhorn-system   csi-attacher-6454556647-767p2                 1/1     Running     0               115s
longhorn-system   csi-attacher-6454556647-hz8lj                 1/1     Running     0               115s
longhorn-system   csi-attacher-6454556647-z5ftg                 1/1     Running     0               115s
longhorn-system   csi-provisioner-869bdc4b79-2v4wx              1/1     Running     0               115s
longhorn-system   csi-provisioner-869bdc4b79-4xcv4              1/1     Running     0               114s
longhorn-system   csi-provisioner-869bdc4b79-9q95d              1/1     Running     0               114s
longhorn-system   csi-resizer-6d8cf5f99f-dwdrq                  1/1     Running     0               114s
longhorn-system   csi-resizer-6d8cf5f99f-klvcr                  1/1     Running     0               114s
longhorn-system   csi-resizer-6d8cf5f99f-ptpzb                  1/1     Running     0               114s
longhorn-system   csi-snapshotter-588457fcdf-dlkdq              1/1     Running     0               113s
longhorn-system   csi-snapshotter-588457fcdf-p2c7c              1/1     Running     0               113s
longhorn-system   csi-snapshotter-588457fcdf-p5smn              1/1     Running     0               113s
longhorn-system   engine-image-ei-fa2dfbf0-bkwhx                1/1     Running     0               2m7s
longhorn-system   engine-image-ei-fa2dfbf0-cqq9n                1/1     Running     0               2m8s
longhorn-system   engine-image-ei-fa2dfbf0-lhjjc                1/1     Running     0               2m7s
longhorn-system   instance-manager-e-542b1382                   1/1     Running     0               119s
longhorn-system   instance-manager-e-a5e124bb                   1/1     Running     0               2m4s
longhorn-system   instance-manager-e-acb2a517                   1/1     Running     0               2m7s
longhorn-system   instance-manager-r-11ab6af6                   1/1     Running     0               119s
longhorn-system   instance-manager-r-5b82fba2                   1/1     Running     0               2m4s
longhorn-system   instance-manager-r-c2561fa0                   1/1     Running     0               2m6s
longhorn-system   longhorn-csi-plugin-4br28                     2/2     Running     0               113s
longhorn-system   longhorn-csi-plugin-8gdxf                     2/2     Running     0               113s
longhorn-system   longhorn-csi-plugin-wc6tt                     2/2     Running     0               113s
longhorn-system   longhorn-driver-deployer-7dddcdd5bb-zjh4k     1/1     Running     0               2m31s
longhorn-system   longhorn-manager-cbsh7                        1/1     Running     0               2m31s
longhorn-system   longhorn-manager-d2t75                        1/1     Running     1 (2m9s ago)    2m31s
longhorn-system   longhorn-manager-xqlfv                        1/1     Running     1 (2m9s ago)    2m31s
longhorn-system   longhorn-ui-7648d6cd69-tc6b9                  1/1     Running     0               2m31s

Public LB check

We can now test the public load balancer, nginx ingress controller and the security group ingress rules. On your local PC run:

curl -k -v https://k8s-ext-<REDACTED>.elb.amazonaws.com/
*   Trying 34.x.x.x:443...
* TCP_NODELAY set
* Connected to k8s-ext-<REDACTED>.elb.amazonaws.com (34.x.x.x) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=IT; ST=Italy; L=Brescia; O=GL Ltd; OU=IT; CN=testlb.domainexample.com; emailAddress=email@you.com
*  start date: Apr 11 08:20:12 2022 GMT
*  expire date: Apr 11 08:20:12 2023 GMT
*  issuer: C=IT; ST=Italy; L=Brescia; O=GL Ltd; OU=IT; CN=testlb.domainexample.com; emailAddress=email@you.com
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55c6560cde10)
> GET / HTTP/2
> Host: k8s-ext-<REDACTED>.elb.amazonaws.com
> user-agent: curl/7.68.0
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 404 
< date: Tue, 12 Apr 2022 10:08:18 GMT
< content-type: text/html
< content-length: 146
< strict-transport-security: max-age=15724800; includeSubDomains
< 
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Connection #0 to host k8s-ext-<REDACTED>.elb.amazonaws.com left intact

404 is a correct response since the cluster is empty.

Deploy a sample stack

We use the same stack used in this repository. This stack need longhorn and nginx ingress.

To test all the components of the cluster we can deploy a sample stack. The stack is composed by the following components:

MariaDB
Nginx
Wordpress

Each component is made by: one deployment and one service. Wordpress and nginx share the same persistent volume (ReadWriteMany with longhorn storage class). The nginx configuration is stored in four ConfigMaps and the nginx service is exposed by the nginx ingress controller.

Deploy the resources with:

kubectl apply -f https://raw.githubusercontent.com/garutilorenzo/k3s-oci-cluster/master/deployments/mariadb/all-resources.yml
kubectl apply -f https://raw.githubusercontent.com/garutilorenzo/k3s-oci-cluster/master/deployments/nginx/all-resources.yml
kubectl apply -f https://raw.githubusercontent.com/garutilorenzo/k3s-oci-cluster/master/deployments/wordpress/all-resources.yml

NOTE to install WP and reach the wp-admin path you have to edit the nginx deployment and change this line:

env:
  - name: SECURE_SUBNET
    value: 8.8.8.8/32 # change-me

and set your public ip address.

To check the status:

root@i-04d089ed896cfafe1:~# kubectl get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE     IP            NODE                  NOMINATED NODE   READINESS GATES
mariadb-6cbf998bd6-s98nh     1/1     Running   0          2m21s   10.244.2.13   i-072bf7de2e94e6f2d   <none>           <none>
nginx-68b4dfbcb6-s6zfh       1/1     Running   0          19s     10.244.1.12   i-0121c2149821379cc   <none>           <none>
wordpress-558948b576-jgvm2   1/1     Running   0          71s     10.244.3.14   i-0cb1e2e7784768b22   <none>           <none>root@i-04d089ed896cfafe1:~# kubectl get deployments
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
mariadb     1/1     1            1           2m32s
nginx       1/1     1            1           30s
wordpress   1/1     1            1           82sroot@i-04d089ed896cfafe1:~# kubectl get svc
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP    14m
mariadb-svc     ClusterIP   10.108.78.60    <none>        3306/TCP   2m43s
nginx-svc       ClusterIP   10.103.145.57   <none>        80/TCP     41s
wordpress-svc   ClusterIP   10.103.49.246   <none>        9000/TCP   93s

Now you are ready to setup WP, open the LB public ip and follow the wizard. NOTE nginx and the Kubernetes Ingress rule are configured without virthual host/server name.

To clean the deployed resources:

kubectl delete -f https://raw.githubusercontent.com/garutilorenzo/k3s-oci-cluster/master/deployments/mariadb/all-resources.yml
kubectl delete -f https://raw.githubusercontent.com/garutilorenzo/k3s-oci-cluster/master/deployments/nginx/all-resources.yml
kubectl delete -f https://raw.githubusercontent.com/garutilorenzo/k3s-oci-cluster/master/deployments/wordpress/all-resources.yml

Clean up

Before destroy all the infrastructure DELETE all the object in the S3 bucket.

terraform destroy