Provision Volumes on Kubernetes and Nomad using Ceph CSI

Kidong Lee · Published in ITNEXT · May 7, 2021 · 10 min read

If you want to run stateful applications on Kubernetes, CSI (Container Storage Interface) plays an important role in provisioning volumes dynamically. Generally speaking, CSI is used to provision volumes from storage backends not only for Kubernetes but also for other container orchestrators such as Mesos and Nomad.
Here, I am going to talk about provisioning volumes on Kubernetes and on HashiCorp's Nomad from an external Ceph storage cluster using Ceph CSI.

The important components used for this post are an external Ceph storage cluster, Ceph CSI v3.3.1, Kubernetes, and Nomad.

Provision Volumes on Kubernetes

In this example, you will see how to provision volumes from the external Ceph Storage on Kubernetes using Ceph CSI.

First, let's install Helm to deploy the Ceph CSI chart on Kubernetes.

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

Create a namespace for Ceph CSI.

kubectl create namespace ceph-csi-rbd; 

Clone Ceph CSI and switch to v3.3.1.

git clone https://github.com/ceph/ceph-csi.git;
cd ceph-csi;
git checkout v3.3.1;
# move to rbd chart.
cd charts/ceph-csi-rbd;

Optionally, for Kubernetes v1.17.x, you can change the CSIDriver API version to v1beta1:

sed -i 's/storage.k8s.io\/betav1/storage.k8s.io\/v1beta1/g' templates/csidriver-crd.yaml;

Now we need a chart values file to deploy Ceph CSI. It looks like this:

cat <<EOF > ceph-csi-rbd-values.yaml
csiConfig:
  - clusterID: "c628ebf1-d03f-4806-9941-8b5840338b14"
    monitors:
      - "10.0.0.3:6789"
      - "10.0.0.4:6789"
      - "10.0.0.5:6789"
provisioner:
  name: provisioner
  replicaCount: 2
EOF

The clusterID can be obtained with the following command:

sudo ceph fsid;

And the Ceph monitor addresses can be obtained with this:

sudo ceph mon dump;

Let's install the Ceph CSI chart on Kubernetes.

helm install \
  --namespace ceph-csi-rbd \
  ceph-csi-rbd \
  --values ceph-csi-rbd-values.yaml \
  ./;
kubectl rollout status deployment ceph-csi-rbd-provisioner -n ceph-csi-rbd;
helm status ceph-csi-rbd -n ceph-csi-rbd;

Now the Ceph CSI driver is ready to provision volumes.
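
Optionally, you can verify that the provisioner and node-plugin pods are running and that the CSI driver has been registered on the cluster:

kubectl get pods -n ceph-csi-rbd;
kubectl get csidrivers;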

Before creating the Ceph storage class, several resources need to be created on the Ceph side.

To mount volumes on Kubernetes from the external Ceph storage, a pool needs to be created first. Create a pool in Ceph:

sudo ceph osd pool create kubePool 64 64

And initialize the pool for use by RBD:

sudo rbd pool init kubePool

To access the pool, you need a user with the appropriate capabilities. In this example, an admin user for the pool will be created. Let's create an admin user for the pool kubePool and encode the generated key as base64:

sudo ceph auth get-or-create-key client.kubeAdmin mds 'allow *' mgr 'allow *' mon 'allow *' osd 'allow * pool=kubePool' | tr -d '\n' | base64;
QVFEMWxvTmdzMTZyRVJBQTZJalBHcDBWUi8wcUd6TW9sSmlaTXc9PQ==

Encode the admin user kubeAdmin as base64.

echo "kubeAdmin" | tr -d '\n' | base64;
a3ViZUFkbWlu

Now the admin user and key have been created for the pool kubePool.

Let's create a Secret resource for the admin user.

cat > ceph-admin-secret.yaml << EOF
apiVersion: v1
kind: Secret
metadata:
  name: ceph-admin
  namespace: default
type: kubernetes.io/rbd
data:
  userID: a3ViZUFkbWlu
  userKey: QVFEMWxvTmdzMTZyRVJBQTZJalBHcDBWUi8wcUd6TW9sSmlaTXc9PQ==
EOF
# create a secret for admin user.
kubectl apply -f ceph-admin-secret.yaml;
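
Alternatively, as a sketch that assumes the kubeAdmin user already exists in Ceph, you can let kubectl do the base64 encoding for you:

kubectl create secret generic ceph-admin \
  --namespace default \
  --type kubernetes.io/rbd \
  --from-literal=userID=kubeAdmin \
  --from-literal=userKey="$(sudo ceph auth get-key client.kubeAdmin)";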

Finally, we are ready to create the Ceph storage class. Let's create it:

# ceph storage class.
cat > ceph-rbd-sc.yaml <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: c628ebf1-d03f-4806-9941-8b5840338b14
  pool: kubePool
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: ceph-admin
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/controller-expand-secret-name: ceph-admin
  csi.storage.k8s.io/controller-expand-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: ceph-admin
  csi.storage.k8s.io/node-stage-secret-namespace: default
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard
EOF
kubectl apply -f ceph-rbd-sc.yaml;

clusterID is the value of sudo ceph fsid, and pool is the pool kubePool which was created above.

Now you are ready to easily provision volumes for stateful applications using the Ceph storage class.

Let's create a pod with a PVC to mount a volume from the external Ceph storage using the Ceph storage class.

## create pvc and pod.
cat <<EOF > pv-pod.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-rbd-sc-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: ceph-rbd-sc
---
apiVersion: v1
kind: Pod
metadata:
  name: ceph-rbd-pod-pvc-sc
spec:
  containers:
    - name: ceph-rbd-pod-pvc-sc
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /mnt/ceph_rbd
          name: volume
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: ceph-rbd-sc-pvc
EOF

kubectl apply -f pv-pod.yaml;

Take a look at the storageClassName, which is set to ceph-rbd-sc. The volume will be provisioned dynamically by the Ceph storage class, and this example pod will mount the provisioned volume at the path /mnt/ceph_rbd.
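
Before looking inside the container, you can optionally confirm that the PVC has been bound and the pod is running, using the resource names defined above:

kubectl get pvc ceph-rbd-sc-pvc;
kubectl get pod ceph-rbd-pod-pvc-sc;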

Let’s check the volume mount in the container.

kubectl exec pod/ceph-rbd-pod-pvc-sc -- df -k | grep rbd;
/dev/rbd0 1998672 6144 1976144 0% /mnt/ceph_rbd

As seen, a Ceph RBD volume has been mounted at the path /mnt/ceph_rbd in the container.

And check that an image has been created in the Ceph pool:

sudo rbd ls -p kubePool;
csi-vol-c545c641-a4b3-11eb-b242-26d41aad22d3

So far, you have seen how to provision volumes from the external Ceph storage using CSI on Kubernetes.

Provision Volumes on Nomad

It is simple to provision volumes dynamically on Kubernetes, as you have seen above. What about on Nomad?

First, you need to add the following to the Nomad client configuration:

plugin "docker" {
config {
allow_privileged = true
}
}

With this, Docker containers can be run as privileged on the Nomad client nodes.
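
After changing the client configuration, restart the Nomad client so the new plugin option takes effect. For example, assuming Nomad runs as a systemd service:

sudo systemctl restart nomad;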

As mentioned before, CSI can be run not only on Kubernetes but also on other container orchestrators. A CSI plugin consists of a controller plugin and node plugins. Let's first create a Ceph CSI controller job.

cat <<EOC > ceph-csi-plugin-controller.nomad
job "ceph-csi-plugin-controller" {
  datacenters = ["dc1"]
  group "controller" {
    network {
      port "metrics" {}
    }
    task "ceph-controller" {
      template {
        data = <<EOF
[{
    "clusterID": "62c42aed-9839-4da6-8c09-9d220f56e924",
    "monitors": [
        "10.0.0.3:6789",
        "10.0.0.4:6789",
        "10.0.0.5:6789"
    ]
}]
EOF
        destination = "local/config.json"
        change_mode = "restart"
      }
      driver = "docker"
      config {
        image = "quay.io/cephcsi/cephcsi:v3.3.1"
        volumes = [
          "./local/config.json:/etc/ceph-csi-config/config.json"
        ]
        args = [
          "--type=rbd",
          "--controllerserver=true",
          "--drivername=rbd.csi.ceph.com",
          "--endpoint=unix://csi/csi.sock",
          "--nodeid=\${node.unique.name}",
          "--instanceid=\${node.unique.name}-controller",
          "--pidlimit=-1",
          "--logtostderr=true",
          "--v=5",
          "--metricsport=\$\${NOMAD_PORT_metrics}"
        ]
      }
      resources {
        cpu    = 500
        memory = 256
      }
      service {
        name = "ceph-csi-controller"
        port = "metrics"
        tags = [ "prometheus" ]
      }
      csi_plugin {
        id        = "ceph-csi"
        type      = "controller"
        mount_dir = "/csi"
      }
    }
  }
}
EOC

clusterID and monitors should be changed to your values. This Ceph CSI controller job runs as the default job type, service.

And create a Ceph CSI Node job.

cat <<EOC > ceph-csi-plugin-nodes.nomad
job "ceph-csi-plugin-nodes" {
  datacenters = ["dc1"]
  type = "system"
  group "nodes" {
    network {
      port "metrics" {}
    }

    task "ceph-node" {
      driver = "docker"
      template {
        data = <<EOF
[{
    "clusterID": "62c42aed-9839-4da6-8c09-9d220f56e924",
    "monitors": [
        "10.0.0.3:6789",
        "10.0.0.4:6789",
        "10.0.0.5:6789"
    ]
}]
EOF
        destination = "local/config.json"
        change_mode = "restart"
      }
      config {
        image = "quay.io/cephcsi/cephcsi:v3.3.1"
        volumes = [
          "./local/config.json:/etc/ceph-csi-config/config.json"
        ]
        mounts = [
          {
            type = "tmpfs"
            target = "/tmp/csi/keys"
            readonly = false
            tmpfs_options = {
              size = 1000000 # size in bytes
            }
          }
        ]
        args = [
          "--type=rbd",
          "--drivername=rbd.csi.ceph.com",
          "--nodeserver=true",
          "--endpoint=unix://csi/csi.sock",
          "--nodeid=\${node.unique.name}",
          "--instanceid=\${node.unique.name}-nodes",
          "--pidlimit=-1",
          "--logtostderr=true",
          "--v=5",
          "--metricsport=\$\${NOMAD_PORT_metrics}"
        ]
        privileged = true
      }
      resources {
        cpu    = 500
        memory = 256
      }
      service {
        name = "ceph-csi-nodes"
        port = "metrics"
        tags = [ "prometheus" ]
      }
      csi_plugin {
        id        = "ceph-csi"
        type      = "node"
        mount_dir = "/csi"
      }
    }
  }
}
EOC

This Ceph CSI node job is of type system, which means Ceph CSI node containers will be created on all the Nomad client nodes.

Let’s run Ceph CSI Controller and Node jobs on Nomad.

nomad job run ceph-csi-plugin-controller.nomad;
nomad job run ceph-csi-plugin-nodes.nomad;

To see the status of the Ceph CSI plugin:

nomad plugin status ceph-csi;
ID = ceph-csi
Provider = rbd.csi.ceph.com
Version = v3.3.1
Controllers Healthy = 1
Controllers Expected = 1
Nodes Healthy = 2
Nodes Expected = 2
Allocations
ID Node ID Task Group Version Desired Status Created Modified
b6268d6d 457a8291 controller 0 run running 1d21h ago 1d21h ago
ec265d25 709ee9cc nodes 0 run running 1d21h ago 1d21h ago
4cd7dffa 457a8291 nodes 0 run running 1d21h ago 1d21h ago

Now we are ready to mount volumes from the external Ceph storage using the Ceph CSI driver.

Before going on, load the RBD kernel module on all the Nomad client nodes:

sudo modprobe rbd;
sudo lsmod |grep rbd;
rbd 83733 0
libceph 306750 1 rbd
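
To make sure the module is loaded again after a reboot, you can optionally register it with systemd-modules-load (assuming the client nodes use systemd):

echo "rbd" | sudo tee /etc/modules-load.d/rbd.conf;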

Let's create a Ceph pool myPool and the admin user myPoolAdmin.

# Create a ceph pool:
sudo ceph osd pool create myPool 64 64
sudo rbd pool init myPool;
# create admin user for pool.
sudo ceph auth get-or-create-key client.myPoolAdmin mds 'allow *' mgr 'allow *' mon 'allow *' osd 'allow * pool=myPool'
AQCKf4JgHPVxAxAALZ8ny4/R7s6/3rZWC2o5vQ==

Now we need a volume to be registered on Nomad. Create a volume definition:

cat <<EOF > ceph-volume.hcl
type = "csi"
id   = "ceph-mysql"
name = "ceph-mysql"
external_id = "0001-0024-62c42aed-9839-4da6-8c09-9d220f56e924-0000000000000009-00000000-1111-2222-bbbb-cacacacacaca"
access_mode     = "single-node-writer"
attachment_mode = "file-system"
mount_options {
  fs_type = "ext4"
}
plugin_id = "ceph-csi"
secrets {
  userID  = "myPoolAdmin"
  userKey = "AQCKf4JgHPVxAxAALZ8ny4/R7s6/3rZWC2o5vQ=="
}
context {
  clusterID     = "62c42aed-9839-4da6-8c09-9d220f56e924"
  pool          = "myPool"
  imageFeatures = "layering"
}
EOF

userID and userKey are the values created above, clusterID is the Ceph cluster ID, and pool is the pool myPool created before.

Take a look at external_id, which is the unique ID for the individual volume. This ID is based on the CSI ID format. The convention of the external_id is as follows.

<csi-id-version>-<cluster-id-length>-<cluster-id>-<pool-id>-<uuid>

With this convention, the following external_id can be divided into its separate parts.

0001-0024-62c42aed-9839-4da6-8c09-9d220f56e924-0000000000000009-00000000-1111-2222-bbbb-cacacacacaca
  • CSI ID Version: 0001
  • Cluster ID Length: 0024
  • Cluster ID: 62c42aed-9839-4da6-8c09-9d220f56e924
  • Pool ID: 0000000000000009
  • UUID: 00000000-1111-2222-bbbb-cacacacacaca

For the pool ID, you can get the ID of the pool myPool in Ceph:

sudo ceph osd lspools
1 cephfs_data
2 cephfs_metadata
3 foo
4 bar
5 .rgw.root
6 default.rgw.control
7 default.rgw.meta
8 default.rgw.log
9 myPool
10 default.rgw.buckets.index
11 default.rgw.buckets.data

The ID of myPool is 9.

The UUID is the unique ID for the volume. If you want to create a new volume from the same pool, you need to set a new UUID.
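
To tie the parts together, here is a minimal bash sketch that assembles an external_id from the cluster ID, the pool ID, and a freshly generated UUID. It assumes the length field is the hexadecimal length of the cluster ID string (36 characters gives 0024) and that the pool ID is zero-padded to 16 hexadecimal digits (pool 9 gives 0000000000000009); the variable names are only illustrative.

CLUSTER_ID="62c42aed-9839-4da6-8c09-9d220f56e924"   # sudo ceph fsid
POOL_ID=9                                           # from sudo ceph osd lspools
VOLUME_UUID="$(uuidgen | tr 'A-Z' 'a-z')"           # a new UUID per volume
# 0001 = CSI ID version; the next field is the cluster ID length in hex (36 -> 0024)
printf -v CLUSTER_ID_LEN '%04x' "${#CLUSTER_ID}"
printf -v POOL_ID_HEX '%016x' "${POOL_ID}"
EXTERNAL_ID="0001-${CLUSTER_ID_LEN}-${CLUSTER_ID}-${POOL_ID_HEX}-${VOLUME_UUID}"
echo "${EXTERNAL_ID}"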

Before registering the volume on Nomad, an image needs to be created in the pool myPool:

sudo rbd create csi-vol-00000000-1111-2222-bbbb-cacacacacaca --size 1024 --pool myPool --image-feature layering;

Take a look at the name of the image, csi-vol-00000000-1111-2222-bbbb-cacacacacaca. The Ceph CSI parameter volumeNamePrefix has the default value csi-vol-, and the rest is the UUID part of the external_id based on the CSI ID format mentioned above. The name of the image to be created follows this convention.

<volume-name-prefix><uuid>
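
You can confirm the image exists in the pool before registering the volume, just as was done earlier for kubePool:

sudo rbd ls -p myPool;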

Now, register the volume on Nomad.

nomad volume register ceph-volume.hcl;

To see the status of the registered volume:

nomad volume status;
Container Storage Interface
ID Name Plugin ID Schedulable Access Mode
ceph-mys ceph-mysql ceph-csi true single-node-writer

Let's mount the volume provisioned by Ceph CSI by running an example MySQL server job on Nomad.

cat <<EOF > mysql-server.nomad
job "mysql-server" {
  datacenters = ["dc1"]
  type = "service"
  group "mysql-server" {
    count = 1
    volume "ceph-mysql" {
      type      = "csi"
      read_only = false
      source    = "ceph-mysql"
    }
    network {
      port "db" {
        static = 3306
      }
    }
    restart {
      attempts = 10
      interval = "5m"
      delay    = "25s"
      mode     = "delay"
    }
    task "mysql-server" {
      driver = "docker"
      volume_mount {
        volume      = "ceph-mysql"
        destination = "/srv"
        read_only   = false
      }
      env {
        MYSQL_ROOT_PASSWORD = "password"
      }
      config {
        image = "hashicorp/mysql-portworx-demo:latest"
        args  = ["--datadir", "/srv/mysql"]
        ports = ["db"]
      }
      resources {
        cpu    = 500
        memory = 1024
      }
      service {
        name = "mysql-server"
        port = "db"
        check {
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
EOF
# run mysql job.
nomad job run mysql-server.nomad;

Let's check whether the Ceph RBD volume is mounted by entering the allocated MySQL server container.

nomad alloc exec bfe37c92 sh
# df -h
Filesystem Size Used Avail Use% Mounted on
...
/dev/rbd0 976M 180M 781M 19% /srv
...

As seen, the volume ceph-mysql has been mounted at the path /srv.

Now we will check whether the MySQL data survives after resubmitting the MySQL server job on Nomad.

Let's connect to the MySQL server in the container:

mysql -u root -p -D itemcollection;
... type the password of MYSQL_ROOT_PASSWORD
mysql> select * from items;
+----+----------+
| id | name     |
+----+----------+
|  1 | bike     |
|  2 | baseball |
|  3 | chair    |
+----+----------+

Let’s add some rows.

INSERT INTO items (name) VALUES ('glove');
INSERT INTO items (name) VALUES ('hat');
INSERT INTO items (name) VALUES ('keyboard');

Make sure the rows are inserted successfully.

mysql> select * from items;
+----+----------+
| id | name     |
+----+----------+
|  1 | bike     |
|  2 | baseball |
|  3 | chair    |
|  4 | glove    |
|  5 | hat      |
|  6 | keyboard |
+----+----------+
6 rows in set (0.00 sec)

Now, stop the MySQL server job, purging it completely:

nomad stop -purge mysql-server;

And submit the MySQL server job again:

nomad job run mysql-server.nomad

Check that the MySQL data still exists. After entering the allocated MySQL server container, type the following as done previously:

mysql -u root -p -D itemcollection;
... type the password of MYSQL_ROOT_PASSWORD
mysql> select * from items;
+----+----------+
| id | name     |
+----+----------+
|  1 | bike     |
|  2 | baseball |
|  3 | chair    |
|  4 | glove    |
|  5 | hat      |
|  6 | keyboard |
+----+----------+
6 rows in set (0.00 sec)

Even though the MySQL server job has been purged and resubmitted on Nomad, there is no data loss.

You have now seen how to provision volumes on both Kubernetes and Nomad using the same Ceph CSI driver.

Provisioning volumes from external Ceph storage using CSI is simpler on Kubernetes than on Nomad: you can provision volumes dynamically on Kubernetes, but Nomad does not support dynamic volume provisioning yet. Recently, however, the Nomad 1.1 beta was announced with improved CSI support; for instance, nomad volume create <volume-hcl> will create the image in the Ceph pool automatically.
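
As a rough sketch of what that would look like, a volume specification for nomad volume create might resemble the following. This is based on the Nomad 1.1 volume specification; the capacity values and the volume name are illustrative, and the secrets and parameters mirror ceph-volume.hcl above:

cat <<EOF > ceph-volume-create.hcl
id        = "ceph-mysql-2"
name      = "ceph-mysql-2"
type      = "csi"
plugin_id = "ceph-csi"
capacity_min = "1GiB"
capacity_max = "1GiB"
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
mount_options {
  fs_type = "ext4"
}
secrets {
  userID  = "myPoolAdmin"
  userKey = "AQCKf4JgHPVxAxAALZ8ny4/R7s6/3rZWC2o5vQ=="
}
parameters {
  clusterID     = "62c42aed-9839-4da6-8c09-9d220f56e924"
  pool          = "myPool"
  imageFeatures = "layering"
}
EOF
nomad volume create ceph-volume-create.hcl;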
