Simple Backups for Small Kubernetes Clusters

Use Velero and Restic to protect your small clusters

Anthony Critelli
ITNEXT


Data backup and recovery is a critical part of any environment’s production strategy, and Kubernetes is no exception. However, backups are an area that doesn’t receive enough attention in discussions about Kubernetes. This is particularly true in very small Kubernetes environments, such as lab or edge environments.

I use Kubernetes in my home lab, and I’ve come to rely on applications deployed in my cluster for many day-to-day tasks. Over time, I’ve developed an effective backup strategy for this small environment. In this article, I’ll discuss the backup approach and tools I use for my home lab. While this is a small lab environment, many of the principles can be applied to any small or medium-sized Kubernetes cluster.

My Environment

My Kubernetes home lab consists of a mix of Raspberry Pis and x86 hosts running the K3s Kubernetes distribution. Crucially, my environment lacks any distributed storage. All PersistentVolumeClaims are satisfied by the K3s Local Path Provisioner. Large production Kubernetes clusters often rely on distributed and highly-available storage systems. However, this isn’t always practical in small lab or edge environments, and it’s not unusual to see hostPath volumes.
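
A quick way to confirm this setup is to look at the StorageClasses and claims in the cluster; on K3s, local-path is the default StorageClass (a quick sanity check, nothing more):

# local-path is the default StorageClass that K3s ships with
kubectl get storageclass

# Every claim in the cluster should be bound to a local-path volume
kubectl get pvc --all-namespaces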

Most of my workloads are ephemeral: I test out new applications, and I support a few services for my home network. However, I do have some persistent workloads that store important information on their PersistentVolumes. I need a solution that can back up both Kubernetes resources and persistent application data.

Overall, I have the following backup approach:

  1. All Kubernetes manifests are stored in Git. I use ArgoCD to deploy my applications using GitOps.
  2. Velero backs up the state of my Kubernetes cluster, including any resources running in the cluster.
  3. A CronJob runs a simple Restic script to back up persistent workloads. This is done on a per-workload basis.

I’ll skip a discussion of ArgoCD because it deserves its own coverage, and it isn’t a requirement for this approach. Instead, I will focus on my use of Velero and Restic to back up cluster state and persistent storage.

Velero for Cluster State Backup

Velero is a comprehensive Kubernetes backup tool. It’s capable of backing up and restoring both Kubernetes resources and data on PersistentVolumes. Velero must store backup data somewhere, and it supports S3-compatible object storage. I use Backblaze B2 for my personal object storage because it’s S3-compatible and very inexpensive.

Velero is easy to set up in a small environment. It’s available through several package managers, or via a direct binary download. Follow the official installation instructions to download the Velero utility for your environment.
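
For example, on a Linux workstation the CLI can be pulled straight from the project’s GitHub releases (the version and architecture below are examples only; substitute the current release for your platform):

# The version and architecture here are examples only; use the current release for your platform
VELERO_VERSION=v1.13.0
curl -fsSL -o velero.tar.gz \
  "https://github.com/vmware-tanzu/velero/releases/download/${VELERO_VERSION}/velero-${VELERO_VERSION}-linux-amd64.tar.gz"
tar -xzf velero.tar.gz
sudo mv "velero-${VELERO_VERSION}-linux-amd64/velero" /usr/local/bin/velero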

Once the Velero CLI is installed, I create a file with the credentials for my Backblaze B2 bucket:

; credentials-velero.txt
[default]
aws_access_key_id=${BACKBLAZE_ACCESS_KEY_ID}
aws_secret_access_key=${BACKBLAZE_SECRET_ACCESS_KEY}

Next, I install Velero in my cluster and configure it to use my Backblaze B2 storage bucket:

velero install --use-node-agent \
--provider aws \
--bucket my-bucket-name \
--backup-location-config s3Url=https://s3.us-west-000.backblazeb2.com,region=us-west-000,s3ForcePathStyle=true \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--secret-file credentials-velero.txt \
--use-volume-snapshots=false \
--default-volumes-to-fs-backup
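
After the install completes, it’s worth confirming that the Velero server and node agent Pods are healthy and that the backup storage location is reachable:

# The velero deployment and node-agent DaemonSet pods should all be Running
kubectl get pods -n velero

# The backup storage location should report a phase of Available
velero backup-location get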

Finally, I schedule a backup to run every 6 hours in my cluster:

velero schedule create homek8s --schedule="@every 6h"

I occasionally check on the status of my backups using the velero backup get command:

$ velero backup get
NAME                     STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
homek8s-20240108152746   Completed   0        7          2024-01-08 10:27:46 -0500 EST   29d       default            <none>
homek8s-20240108092745   Completed   0        7          2024-01-08 04:27:45 -0500 EST   29d       default            <none>
...

The velero backup describe command provides additional information, including any errors or warnings that were encountered when running the backup:

$ velero backup describe homek8s-20240108152746
Name:         homek8s-20240108152746
Namespace:    velero
Labels:       velero.io/schedule-name=homek8s
              velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.27.3+k3s1
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=27

Phase:  Completed

Warnings:
  Velero:     error: /the server is currently unable to handle the request
              error: /the server is currently unable to handle the request
  Cluster:    <none>
  Namespaces:
    argocd:   resource: /pods name: /argocd-repo-server-bc9c646dc-bg5zc
    trilium:  resource: /pods name: /trilium-0
              resource: /pods name: /trilium-backup-28407360-kkflv error: /backup for volume trilium-data is skipped: pod is not in the expected status, name=trilium-backup-28407360-kkflv, namespace=trilium, phase=Succeeded: pod is not running
              resource: /pods name: /trilium-backup-28408800-2vmc9 error: /backup for volume trilium-data is skipped: pod is not in the expected status, name=trilium-backup-28408800-2vmc9, namespace=trilium, phase=Succeeded: pod is not running
              resource: /pods name: /trilium-backup-28410240-nc7sm error: /backup for volume trilium-data is skipped: pod is not in the expected status, name=trilium-backup-28410240-nc7sm, namespace=trilium, phase=Succeeded: pod is not running

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero

TTL:  720h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s
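
If a restore is ever needed, it can be created from any of the backups listed above (shown here with the most recent backup name):

# Restore cluster resources (and file system backups) from a specific backup
velero restore create --from-backup homek8s-20240108152746

# Watch the restore's progress
velero restore get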

Velero does an excellent job backing up all of the resources in my Kubernetes cluster, but it does have a limitation: it cannot back up hostPath volumes. This is a challenge in my lab environment that relies on the K3s Local Path Provisioner. To work around this, I use Restic directly to back up data on PersistentVolumes.

Restic for Persistent Data

Backing up Kubernetes resources is important for restoring cluster state in the event of a catastrophic failure, but it doesn’t solve the problem of safeguarding application data. Since Velero cannot back up hostPath volumes, I need another solution for my small home cluster. I use Restic, a powerful backup tool, for a few reasons:

  1. I already use Restic heavily for my personal laptop backups.
  2. It supports a wide variety of backup repositories, including S3-compatible storage. It even has direct support for Backblaze B2.
  3. Restic has an official container image, which makes it perfect for use as a Kubernetes CronJob.

I mentioned that most of my workloads are ephemeral, but I do have some applications that store persistent data. One of those is Trilium, a powerful note-taking app. I keep all of my notes in Trilium, and I want to make sure I can restore my Trilium data in the event of a cluster failure. Trilium uses a PersistentVolumeClaim to provide a location for its persistent data:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: trilium-data
  namespace: trilium
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-path

The Trilium StatefulSet references this PVC as the location for its data. I use a PersistentVolumeClaim instead of a Volume Claim Template because I only have a single replica of the Trilium application:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: trilium
  namespace: trilium
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: trilium
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: trilium
    spec:
      containers:
        - name: trilium
          image: "docker.io/zadam/trilium:0.61.15"
          ports:
            - containerPort: 8080
              protocol: TCP
          volumeMounts:
            - mountPath: /home/node/trilium-data
              name: trilium-data
          env:
            - name: TRILIUM_PORT
              value: "8080"
      restartPolicy: Always
      volumes:
        - name: trilium-data
          persistentVolumeClaim:
            claimName: trilium-data

I need my backup solution to safeguard the information at /home/node/trilium-data. Once Restic is installed, the first step is to initialize a new repository to store backups:

# Set Backblaze credentials as environment variables
export B2_ACCOUNT_ID=${BACKBLAZE_ACCOUNT_ID}
export B2_ACCOUNT_KEY=${BACKBLAZE_ACCOUNT_KEY}

# Initialize the repository. This will prompt for an encryption password.
# Keep this somewhere safe!
restic -r b2:my-bucket:/trilium-backup init
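
Restic also reads the RESTIC_PASSWORD environment variable, so the repository can be initialized non-interactively; the CronJob shown later supplies the password the same way:

# Providing RESTIC_PASSWORD avoids the interactive prompt
export RESTIC_PASSWORD=${RESTIC_REPO_PASSWORD}
restic -r b2:my-bucket:/trilium-backup init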

A PersistentVolume can be mounted by more than one Pod, provided the access mode allows it; with ReadWriteOnce, the Pods just need to be scheduled to the same node. Backups are a perfect use case for this functionality, since the backup job’s Pod can temporarily mount the PersistentVolume that it needs to back up and then unmount it when the backup is complete.

Backups must also run on a regular cadence, and a Kubernetes CronJob is ideal for this type of workload. I schedule a CronJob to run each morning at 0300 hours. It mounts the same PersistentVolume as the application workload, but it mounts it as readOnly to prevent the backup Pod from making any changes to the application data.

All of the necessary Restic configuration, such as the bucket name and credentials, is provided through environment variables sourced from a Kubernetes Secret named restic. The full CronJob specification is shown below.

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: trilium-backup
spec:
  # Every day at 0300
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: restic
              image: docker.io/restic/restic:0.16.2
              command:
                - restic
                - "-r"
                - "b2:$(BUCKET_NAME):/trilium-backup"
                - "--verbose"
                - "backup"
                - "/home/node/trilium-data"
              env:
                - name: B2_ACCOUNT_ID
                  valueFrom:
                    secretKeyRef:
                      name: restic
                      key: B2_ACCOUNT_ID
                - name: B2_ACCOUNT_KEY
                  valueFrom:
                    secretKeyRef:
                      name: restic
                      key: B2_ACCOUNT_KEY
                - name: RESTIC_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: restic
                      key: RESTIC_PASSWORD
                - name: BUCKET_NAME
                  valueFrom:
                    secretKeyRef:
                      name: restic
                      key: BUCKET_NAME
              volumeMounts:
                - mountPath: /home/node/trilium-data
                  name: trilium-data
                  readOnly: true
          volumes:
            - name: trilium-data
              persistentVolumeClaim:
                claimName: trilium-data
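
The CronJob reads its credentials from the Kubernetes Secret named restic. A minimal sketch of creating that Secret with kubectl might look like this (the namespace and literal values are placeholders for your own environment):

# Create the 'restic' Secret that the CronJob reads its configuration from
kubectl create secret generic restic \
  --namespace trilium \
  --from-literal=B2_ACCOUNT_ID=${BACKBLAZE_ACCOUNT_ID} \
  --from-literal=B2_ACCOUNT_KEY=${BACKBLAZE_ACCOUNT_KEY} \
  --from-literal=RESTIC_PASSWORD=${RESTIC_REPO_PASSWORD} \
  --from-literal=BUCKET_NAME=my-bucket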

I regularly check on the status of the Jobs using kubectl:

# Check on the most recent job status
$ kubectl get jobs
NAME                      COMPLETIONS   DURATION   AGE
trilium-backup-28410240   1/1           31s        2d14h
trilium-backup-28411680   1/1           37s        29h
trilium-backup-28413120   1/1           34s        14h

# Check logs for a specific Job
$ kubectl logs job/trilium-backup-28410240
open repository
lock repository
no parent snapshot found, will read all files
load index files
start scan on [/home/node/trilium-data]
start backup on [/home/node/trilium-data]
scan finished in 17.793s: 58 files, 236.450 MiB

Files: 58 new, 0 changed, 0 unmodified
Dirs: 6 new, 0 changed, 0 unmodified
Data Blobs: 8 new
Tree Blobs: 7 new
Added to the repository: 3.650 MiB (2.468 MiB stored)

processed 58 files, 236.450 MiB in 0:21
snapshot 8b5f5603 saved
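
Because the backups land in a standard Restic repository, the snapshots can also be listed or restored from any machine with Restic installed and the same credentials (a quick sketch using the repository from earlier):

# List the snapshots the CronJob has created so far
restic -r b2:my-bucket:/trilium-backup snapshots

# Restore the latest snapshot to a local directory for inspection or recovery
restic -r b2:my-bucket:/trilium-backup restore latest --target ./trilium-restore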

This solution is very simple and very flexible. I don’t need any custom scripts or containers to back up stateful data: Restic provides a container image that supports full configuration via environment variables. Restic is a mature, robust, and simple backup solution that is ideal for simple Kubernetes backup scenarios.

Wrapping Up

Backups are an important part of any environment, including workloads running in Kubernetes. While a variety of options are available to perform backups in larger or cloud-based environments, small Kubernetes clusters and edge environments also require careful consideration to properly safeguard their data.

Velero and Restic work together very well in these environments. Velero makes it easy to back up cluster state (and data, if you aren’t using hostPath volumes), while Restic makes application data backups a breeze. While this solution has some caveats, it’s a great start for those small Kubernetes clusters that many engineers have at home or in their development environments.

In this article, you saw how I implement my home lab backups. I’d love to hear your thoughts about how you keep your smaller clusters safely backed up in case disaster strikes.
