Backup and Restore of Kubernetes Stateful Application Data with CSI Volume Snapshots

Zhimin Wen
Published in ITNEXT
7 min read · Jan 18, 2021


With the GA release of volume snapshots, CSI volume snapshots have become a more mature option for backing up and restoring stateful application data. This article explores how we can use the standard Kubernetes snapshot resources to back up and restore the data stored on a PV.

Let's use IBM Event Streams as the test target. It runs Kafka as a Kubernetes StatefulSet managed by the Strimzi Operator. Assume the StatefulSet has 3 replicas and the data is stored in PVCs named data-es-kafka-0, data-es-kafka-1, and data-es-kafka-2 respectively. The PVCs are provisioned by Rook Ceph.

Volume snapshot class

Just as a PVC can dynamically create a PV through a storage class, a volume snapshot content (the actual snapshot data) can be created dynamically through a snapshot class when a volume snapshot resource is requested.

In the Rook operator case, if the RBD snapshot class has not been created yet, we need to apply the YAML file from the Rook GitHub source, rook/cluster/examples/kubernetes/ceph/csi/rbd/snapshotclass.yaml. This allows snapshots to be created dynamically through the CSI driver. The snapshot class name is defined as csi-rbdplugin-snapclass.
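For orientation, a snapshot class of this kind looks roughly like the sketch below. It is not a copy of the file in the Rook repository. It assumes Rook is deployed in the rook-ceph namespace (which determines the driver prefix, the clusterID, and the secret namespace) and that the cluster serves the v1beta1 snapshot API used later in this article. Check the values against the file shipped with your Rook version.

```yaml
# Sketch of an RBD VolumeSnapshotClass for Rook Ceph.
# Assumption: Rook operator and cluster both live in the "rook-ceph" namespace.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass
# The CSI driver name is prefixed with the operator namespace (assumed here).
driver: rook-ceph.rbd.csi.ceph.com
parameters:
  # Identifies the Ceph cluster; with Rook this is the cluster namespace.
  clusterID: rook-ceph
  # Secret the snapshotter uses to talk to Ceph (names assumed from a default Rook install).
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph
# Delete removes the underlying snapshot when the VolumeSnapshot is deleted;
# Retain would keep it.
deletionPolicy: Delete
```

For backups that must outlive the VolumeSnapshot object, setting deletionPolicy to Retain may be the safer choice.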

State of the application

Before we take a snapshot of the data on the PVCs, let's record the current state of the application by viewing the messages on the demo topic in the UI.

The offset of each partition is shown in the screen capture. We expect to see the same data when we restore from a snapshot taken at this moment.

Volume snapshot

With the snapshot class created, we can request snapshots dynamically. For each Kafka broker’s PVC, create a snapshot with the following YAML file:

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: snapshot-data-es-kafka-0-1610889673
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass
  source:
    persistentVolumeClaimName…
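The same request would be repeated for the other two brokers. As a minimal sketch, the remaining two snapshots could be declared in one multi-document file. The snapshot names here are an assumption: they simply reuse the name-plus-timestamp pattern from the example above. The source field is the standard persistentVolumeClaimName of the VolumeSnapshot API, pointing at the PVC names given earlier.

```yaml
# Sketch: snapshots for the second and third Kafka broker PVCs.
# Assumption: snapshot names follow the pattern used for data-es-kafka-0.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: snapshot-data-es-kafka-1-1610889673
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass
  source:
    # The PVC must be in the same namespace as the VolumeSnapshot.
    persistentVolumeClaimName: data-es-kafka-1
---
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: snapshot-data-es-kafka-2-1610889673
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: data-es-kafka-2
```

Once applied, each VolumeSnapshot should eventually report readyToUse: true in its status, which indicates that the corresponding volume snapshot content has been created.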
