Top 5 Kubernetes Challenges for Platform Teams

Using vCluster to build scalable, efficient, and manageable platforms

Published in ITNEXT · 10 min read · Sep 8, 2024

Introduction

As more organizations move towards cloud-native architectures, platform teams increasingly choose to standardize their developer platforms on Kubernetes, driven by its flexibility and scalability.

According to the Cloud Native Computing Foundation’s 2023 annual survey, Cloud Native 2023: The Undisputed Infrastructure of Global Technology, over 84% of organizations use or evaluate Kubernetes as their standard platform for container orchestration.

However, when building efficient self-service platforms for developers in today’s fast-paced software development landscape, we often grapple with the following five challenges:

1. Flexible Multi-Tenancy: Ensuring that development teams have the right amount of autonomy and isolation.

The challenge lies in balancing autonomy and isolation — giving teams the freedom to innovate without compromising security or risking resource conflicts, all while ensuring efficient management of resources across various teams and projects.

2. Kubernetes Cluster Management: Managing clusters and the underlying infrastructure.

Managing Kubernetes clusters introduces complexity in maintaining consistent performance across environments, dealing with upgrades, and ensuring that scaling and deployments happen smoothly without downtime or breaking dependencies between services.

3. GitOps Integration: Automation for consistent, auditable deployments.

The challenge is to keep workflows consistent across multiple environments and to audit changes, while preserving the speed and flexibility of continuous delivery pipelines.

4. Cost Management: Scaling efficiently while keeping costs in check.

The key challenge in scaling is to manage resource usage efficiently without incurring unnecessary costs, monitor cloud expenses as infrastructure scales, and ensure that autoscaling policies align with both performance needs and budget constraints.

5. Environment Consistency: Ensuring parity across dev, test, and production.

Maintaining consistency between environments is difficult as differences in configurations, data handling, and scaling can lead to unexpected issues when moving workloads from dev/test to production, often requiring constant monitoring to prevent drift.

In this blog post, we’ll explore how to leverage vCluster to address these challenges when building self-service platforms for developers.

Definition: Virtual clusters are fully functional Kubernetes clusters nested inside a physical host cluster. They provide better isolation and flexibility to support multi-tenancy. Multiple teams can operate independently within the same physical infrastructure, minimizing conflicts, maximizing autonomy, and reducing costs.

Check out this blog if you are not familiar with vCluster fundamentals

Multi-Team Development

Let’s consider a scenario where three teams — Frontend, Backend, and Data Science — are working on different parts of a large application. Each team has unique requirements that can be met using vCluster’s flexible configuration options.

Teams vCluster isolation

We can create a separate, isolated virtual Kubernetes environment for each team with the command vcluster create my-vcluster --namespace team-xxx --values vcluster.yaml, using the following configurations.

Frontend Team

The Frontend team needs the newest Kubernetes version of the three teams (for newer Ingress features), limited resources, and selective synchronization of ConfigMaps and Secrets.

sync:
  toHost:
    configMaps:
      enabled: true
      all: false
    secrets:
      enabled: true
      all: false
    pods:
      enabled: true
      rewriteHosts:
        enabled: true
controlPlane:
  distro:
    k8s:
      enabled: true
      version: "v1.25.0"

This configuration gives the Frontend team Kubernetes v1.25.0, syncs only the ConfigMaps and Secrets that its pods actually reference (all: false), and enables host rewriting for pods to integrate smoothly with the host cluster.

Backend Team

The Backend team prioritizes stability and stays on an older Kubernetes version. It also needs dedicated persistent volumes and quotas on CPU and memory usage.

sync:
  toHost:
    persistentVolumeClaims:
      enabled: true
    configMaps:
      enabled: false
    secrets:
      enabled: true
controlPlane:
  distro:
    k8s:
      enabled: true
      version: "v1.23.6"
policies:
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: "500m"
      requests.memory: "1Gi"

This setup provides the Backend team with Kubernetes v1.23.6, ensures their persistent volume claims are synced to the host cluster, and implements resource quotas to manage their CPU and memory usage.
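To make the Backend quota concrete: requests.cpu: "500m" caps the total CPU requested by the pods it governs at 500 millicores, and a new pod whose request would push that total past the cap is rejected at admission. The hypothetical helper below reproduces only that arithmetic (CPU only, values already in millicores). It is an illustration, not how Kubernetes implements quota admission.

```shell
# Check whether a set of CPU requests (in millicores) fits under a
# ResourceQuota's requests.cpu. Arguments: the quota, then one request per
# pod (existing pods plus the new one). Pure arithmetic; no cluster access.
fits_cpu_quota() {
  quota_m=$1
  shift
  total=0
  for req in "$@"; do
    total=$((total + req))
  done
  if [ "$total" -le "$quota_m" ]; then
    echo "admitted: ${total}m of ${quota_m}m"
  else
    echo "rejected: ${total}m exceeds ${quota_m}m"
    return 1
  fi
}
```

For example, fits_cpu_quota 500 200 200 100 prints "admitted: 500m of 500m", while adding a third 200m pod instead of the 100m one would be rejected.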

Data Science Team

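The Data Science team needs specific CRDs for machine learning workflows and larger memory allocations. Because each virtual cluster runs its own API server, the team can install its CRDs directly inside its virtual cluster without affecting the host or other teams. The configuration below covers the memory requirements.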

sync:
  toHost:
    secrets:
      enabled: true
    configMaps:
      enabled: true
controlPlane:
  distro:
    k8s:
      enabled: true
      version: "v1.24.3"
policies:
  limitRange:
    enabled: true
    default:
      memory: "4Gi"
    defaultRequest:
      memory: "2Gi"

This configuration gives the Data Science team Kubernetes v1.24.3, syncs both Secrets and ConfigMaps, and sets up a LimitRange so that containers that don't declare their own memory settings default to a 2Gi request and a 4Gi limit, matching the larger allocations that machine learning tasks need.
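The three configurations above can be applied in one pass. The loop below is a minimal sketch. It assumes hypothetical file names (frontend.yaml, backend.yaml, data-science.yaml) and namespaces (team-frontend and so on), and it only uses the vcluster create flags from the command shown earlier. By default it prints the commands so you can review them. Set DRY_RUN=0 to execute them.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical layout: one values file per team, named after the team.
create_team_vclusters() {
  for team in frontend backend data-science; do
    run vcluster create "$team" --namespace "team-$team" --values "$team.yaml"
  done
}

create_team_vclusters
```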

Benefits

By using vCluster with these tailored configurations, each team gets:

  1. Isolation: Each team works in its virtual cluster, preventing interference with other teams’ work.
  2. Customization: Teams can use the Kubernetes version and features that best suit their needs.
  3. Resource Control: Platform teams can allocate resources appropriately for each team’s requirements.
  4. Flexibility: Configurations can be easily updated as team needs evolve, without impacting other teams.

Kubernetes Cluster Management

Now let’s examine cluster management. For platform teams, managing multiple Kubernetes environments can be a complex and time-consuming task. vCluster provides solutions that significantly reduce this operational overhead.

Standardizing Environments

One of the primary challenges in managing multiple Kubernetes clusters is maintaining consistency across environments. vCluster addresses this by allowing platform teams to:

  • Create templated virtual clusters with pre-configured policies, resources, and tools.
  • Ensure all teams start with a baseline configuration that adheres to organizational standards.
  • Easily update these templates to roll out new standards across all environments.

Example of a standardized vCluster template:

controlPlane:
  distro:
    k8s:
      enabled: true
      version: "v1.24.0"
networking:
  advanced:
    clusterDomain: "acmee.platform.com"
policies:
  networkPolicy:
    enabled: true
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: "2"
      requests.memory: "4Gi"

This template ensures that all virtual clusters start with the same Kubernetes version, cluster domain, network policies, and baseline resource quotas.
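To reuse the standard template, a platform team might render one values file per team and override the quota only where a team needs more. The function below is a sketch with a hypothetical output name (vcluster-TEAM.yaml). Its defaults of 2 CPUs and 4Gi match the template's baseline. The resulting file can then be passed to vcluster create --values.

```shell
# Render the standard template for one team, optionally overriding the CPU
# and memory quota. Defaults match the template's baseline values.
# Output file name (vcluster-<team>.yaml) is a hypothetical convention.
render_vcluster_values() {
  team=$1
  cpu=${2:-2}
  memory=${3:-4Gi}
  cat > "vcluster-$team.yaml" <<EOF
controlPlane:
  distro:
    k8s:
      enabled: true
      version: "v1.24.0"
networking:
  advanced:
    clusterDomain: "acmee.platform.com"
policies:
  networkPolicy:
    enabled: true
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: "$cpu"
      requests.memory: "$memory"
EOF
}
```

For example, render_vcluster_values backend 4 8Gi writes vcluster-backend.yaml with a larger quota, while render_vcluster_values frontend keeps the baseline.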

Reducing Operational Overhead

vCluster allows platform teams to manage multiple “clusters” within a single physical cluster, which simplifies infrastructure management in several ways:

  • Centralized Management: Administer multiple virtual clusters from a single host cluster.
  • Unified Monitoring and Logging: Aggregate logs and metrics from all virtual clusters in one place.
  • Simplified Backup and Restore: Perform backup and restore operations for multiple virtual clusters simultaneously.

Enabling Easy Updates

One of the most significant advantages of vCluster is the ability to roll out updates or new features to developer environments without affecting the underlying cluster:

  • Independent Upgrades: Upgrade the Kubernetes version of a vCluster without touching the host cluster or other virtual clusters.
  • Feature Rollouts: Introduce new features or configurations to specific virtual clusters as needed.
  • Rollback Capabilities: Easily revert changes if an update causes issues, without impacting other environments.

Example of upgrading a vCluster’s Kubernetes version:

controlPlane:
  distro:
    k8s:
      enabled: true
      version: "v1.25.0" # Updated from previous version
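Changing the version in the values file is only half of the rollout, because the new values still have to reach the virtual cluster. The sketch below bumps the quoted version string with sed and then re-applies the file with vcluster create and its --upgrade flag, which upgrades an existing virtual cluster instead of failing because it already exists. The cluster and namespace names are placeholders, and commands are only printed unless DRY_RUN=0. Check vcluster create --help for your CLI version before relying on the flag.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Replace the quoted "vX.Y.Z" value on every "version:" line of a values file.
bump_k8s_version() {
  file=$1
  new=$2
  sed "s/^\([[:space:]]*version:[[:space:]]*\)\"v[0-9.]*\"/\1\"$new\"/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Arguments: vcluster name, namespace, values file, new Kubernetes version.
upgrade_vcluster() {
  bump_k8s_version "$3" "$4" &&
    run vcluster create "$1" --namespace "$2" --upgrade --values "$3"
}
```

For example, upgrade_vcluster my-vcluster team-frontend vcluster.yaml v1.25.0 rewrites the version line and prints the upgrade command.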

GitOps with vCluster

vCluster’s design aligns well with the GitOps methodology, which emphasizes managing Kubernetes environments declaratively through version control and automation.

GitOps brings a structured, repeatable approach to managing environments by defining, applying, and monitoring the desired state of infrastructure through code.

Like every GitOps workflow, ours follows the GitOps principles:

  • Declarative
  • Versioned and Immutable
  • Pulled Automatically
  • Continuously Reconciled

Example GitOps Workflow with vCluster

  1. Platform teams define and store vCluster configurations in Git repositories.
  2. GitOps tools (e.g., Flux, ArgoCD) monitor these repositories for changes.
  3. When a pull request is approved and merged, the GitOps tool automatically applies the updates to the appropriate virtual clusters.
  4. The GitOps tool continuously ensures that the virtual clusters remain in the desired state as defined in Git, providing immediate feedback if any discrepancies arise.
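Step 4 of the workflow above, continuous reconciliation, boils down to a repeated compare-then-apply loop. The sketch below shows one iteration using plain files. The desired file stands for the manifests rendered from Git, and the live file stands for the state exported from the virtual cluster. Real controllers such as Flux or Argo CD compare live API objects, not files, and apply changes through the Kubernetes API. Here cp is only a stand-in for that apply step.

```shell
# Print "in sync" and return 0 when both files are identical;
# otherwise print "drift detected" and return 1.
check_drift() {
  if cmp -s "$1" "$2"; then
    echo "in sync"
  else
    echo "drift detected"
    return 1
  fi
}

# One reconciliation pass: if live differs from desired, overwrite it.
# cp stands in for "apply to the cluster" in a real GitOps controller.
reconcile_once() {
  desired=$1
  live=$2
  if ! check_drift "$desired" "$live"; then
    cp "$desired" "$live"
    echo "reconciled"
  fi
}
```

A controller runs this kind of pass on an interval and on every new commit, which is what keeps the virtual clusters converging on the state defined in Git.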

Benefits

  • Version Control: All vCluster configurations are version-controlled, allowing for collaboration and auditing.
  • Automation: GitOps eliminates manual intervention by automating the process of applying configuration changes.
  • Consistency: Continuous reconciliation ensures that all virtual clusters are consistent with their Git-defined state.
  • Auditability: Full visibility and traceability of changes for compliance and governance.

By using GitOps with vCluster, platform teams can achieve a highly efficient, scalable, and secure method of managing Kubernetes environments, reducing errors and speeding up deployments.

Virtual clusters are not all-or-nothing. A virtual cluster can share some of the host cluster’s underlying services and infrastructure through shared services, which are implemented with vCluster’s various syncing options.

vCluster Shared Services

Key Scalability and Cost Benefits

Efficient Resource Usage

  • Multiple virtual clusters share underlying nodes
  • Improved resource utilization compared to separate physical clusters

Rapid Scaling

  • Quickly spin up new developer environments
  • No need to provision new infrastructure for each new “cluster”

Example vCluster creation with helm:

helm upgrade --install my-vcluster vcluster \
--repo https://charts.loft.sh \
--namespace my-vcluster \
--create-namespace \
--set persistence.enabled=true

Cost Optimization

  • Consolidate multiple “clusters” onto fewer actual Kubernetes clusters
  • Reduce cloud costs by maximizing resource utilization

Let’s imagine you need to provide isolated environments for 20 development teams:

  • Traditional approach: 20 separate Kubernetes clusters
  • vCluster approach: 1 large Kubernetes cluster hosting 20 virtual clusters
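To put rough numbers on the 20-team scenario, the sketch below compares monthly costs. Every input is a hypothetical assumption chosen for illustration, in arbitrary currency units: a control-plane fee of 73 per cluster, two dedicated nodes per separate cluster, 15 shared nodes for the consolidated host, and 70 per node. Substitute your own provider's prices and sizing. The shared pool also has to leave room for the virtual clusters' own control planes, which run as pods on the host.

```shell
# Separate clusters: every cluster pays its own control-plane fee and keeps
# its own minimum node pool.
separate_clusters_cost() {
  clusters=$1 fee=$2 nodes_each=$3 node_price=$4
  echo $((clusters * (fee + nodes_each * node_price)))
}

# One host cluster: a single control-plane fee plus a shared node pool sized
# for all tenants (including the virtual clusters' control-plane pods).
shared_host_cost() {
  fee=$1 shared_nodes=$2 node_price=$3
  echo $((fee + shared_nodes * node_price))
}

# Hypothetical monthly inputs, for illustration only.
echo "separate: $(separate_clusters_cost 20 73 2 70)"
echo "shared:   $(shared_host_cost 73 15 70)"
```

With these inputs, 20 separate clusters come to 4260 per month, compared with 1123 for one shared host cluster. Most of the difference comes from not paying 20 control-plane fees and not keeping 20 minimum node pools.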

Sleep Mode

Virtual clusters can be put into sleep mode to temporarily scale down and remove workloads, saving resources when environments are not in use. The CLI commands allow you to easily sleep and wake virtual clusters as needed:

  • Wake: vcluster wakeup my-vcluster -n my-vcluster-namespace

This helps balance resource usage dynamically, keeping costs low while maintaining the ability to rapidly activate environments.
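How much sleep mode saves depends on how long environments sit idle. As a worked example, an environment that only needs to be awake 10 hours a day on 5 weekdays is awake for 50 of the week's 168 hours, so it can sleep about 70% of the time. The helper below does that integer arithmetic. It ignores wake-up time, and the compute actually saved depends on the host cluster scaling its nodes down once the workloads are gone.

```shell
# Percentage of a 168-hour week an environment can sleep when it only has
# to be awake hours_per_day hours on each of the given days (rounded down).
sleep_pct() {
  days=$1 hours_per_day=$2
  awake=$((days * hours_per_day))
  echo $(( (168 - awake) * 100 / 168 ))
}
```

For example, sleep_pct 5 10 prints 70, and an always-on environment (sleep_pct 7 24) prints 0.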

Simplified Sleep Mode Flow

Environment Consistency

Another challenge platform teams face is keeping environments in parity across stages. Beyond the fixed stages, there are on-demand and ephemeral testing environments, and all of them can be virtualized. Here are some common environment types:

  • Development Environment: Multiple virtual clusters for different teams, allowing isolated development spaces.
  • Staging Environment: Virtual clusters that mimic the production environment for testing and validation.
  • Production Environment: Critical virtual clusters running live applications.
  • Upgrades and Testing: Virtual clusters for development and staging can be independently upgraded to test new Kubernetes versions before applying them to production.
  • On-demand Environments: Teams can quickly spin up temporary environments for testing or specific projects without provisioning new infrastructure.
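One way to keep dev, staging, and production in parity is to put everything they share into a single base values file and keep only the intentional differences in small per-environment files. The sketch below assumes a hypothetical layout (base.yaml plus env/NAME.yaml). It also assumes that vcluster create accepts --values more than once, with later files overriding earlier ones as in Helm. Verify that behavior for your CLI version before relying on it. Commands are printed unless DRY_RUN=0.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical layout: base.yaml holds the organization-wide standard;
# env/<name>.yaml holds only what differs in that environment.
# Assumption: repeated --values files are merged, later ones winning.
create_env() {
  env=$1
  run vcluster create "app-$env" --namespace "env-$env" \
    --values base.yaml --values "env/$env.yaml"
}

for env in dev staging prod; do
  create_env "$env"
done
```

Because every environment starts from the same base file, drift can only come from the per-environment files, and those are small enough to review in a pull request.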
Applying Policies per Team/Environment

In summary, vCluster simplifies cluster management through centralized management on a shared host cluster, standardized templates, and streamlined operations. This reduces operational complexity for platform teams and makes them more responsive to development teams:

  • Reduced Complexity: vCluster allows multiple “clusters” to be managed from a single host cluster, minimizing the overhead of operating separate clusters.
  • Improved Consistency: Environments can be standardized using templates, ensuring uniform configurations across development, staging, and production.
  • Faster Updates and Rollbacks: Changes can be quickly applied and, if needed, reverted with minimal disruption, ensuring faster issue resolution.
  • Cost Efficiency: By running multiple virtual clusters on shared infrastructure, resource usage is optimized, reducing operational costs.
  • Enhanced Flexibility: Environments can be easily created, modified, or removed as required, giving platform teams the agility to meet changing business needs.

Bonus Challenge: Enhancing Developer Experience

A platform is ultimately about the developers that platform teams serve. Focusing on their experience and reducing friction matters, even though what developers expect is often subjective.

Developers often need the flexibility to experiment with different Kubernetes versions or configurations without affecting other teams. They also need a low-friction workflow, with shorter wait times and self-service provisioning.

Flexibility and Self-Service

With virtual clusters, developers can easily create isolated Kubernetes environments for experimentation or testing without interfering with other teams’ configurations.

By integrating vCluster provisioning into a self-service portal (e.g., Backstage or custom internal tooling), developers can quickly spin up and manage environments on demand, reducing wait times for infrastructure teams to manually provision resources. Automating this process enables virtual clusters to be available within minutes, giving developers more autonomy and reducing bottlenecks.
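A self-service portal still needs guardrails before it calls the CLI. The sketch below is a hypothetical portal backend handler. It rejects names that are not valid DNS-1123 labels (lowercase letters, digits, and hyphens, starting and ending with a letter or digit, at most 63 characters), because Kubernetes requires such names for namespaces. Only after that check does it provision the environment. The Backstage integration itself is not shown, and commands are printed unless DRY_RUN=0.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Valid DNS-1123 label: 1-63 chars of [a-z0-9-], no leading/trailing hyphen.
valid_env_name() {
  case $1 in
    '' | -* | *- | *[!a-z0-9-]*) return 1 ;;
  esac
  [ "${#1}" -le 63 ]
}

# Hypothetical portal handler: validate the requested name, then provision
# the virtual cluster in a namespace of the same name.
request_environment() {
  name=$1
  if ! valid_env_name "$name"; then
    echo "invalid environment name: $name" >&2
    return 1
  fi
  run vcluster create "$name" --namespace "$name"
}
```

For example, request_environment feature-login prints the create command, while request_environment my_env is rejected before anything is provisioned.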

Improved Testing and CI/CD

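vCluster is useful whenever you need isolated Kubernetes test environments. For example, each PR or feature branch can get its own virtual cluster, so tests run in full isolation.

Example GitHub Actions workflow using vCluster (the --project flag refers to a vCluster Platform project, so this example assumes the runner is connected to vCluster Platform):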

name: Pull Request Checks
on:
  pull_request:
    branches:
      - "main"
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      # The manifests in ./kubernetes must be checked out before kubectl apply
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Install vCluster CLI
        uses: loft-sh/setup-vcluster@main
      - name: Create PR Virtual Cluster
        env:
          NAME: pr-${{ github.event.pull_request.number }}-${{ github.sha }}-${{ github.run_id }}
        run: vcluster create $NAME --project default
      - name: Deploy Application
        run: kubectl apply -Rf ./kubernetes
      - name: Wait for Deployment
        run: kubectl rollout status deployments/my-app
      - name: Run Tests
        run: make e2e
      - name: Delete PR Virtual Cluster
        # Run cleanup even when an earlier step fails, so clusters don't leak
        if: always()
        env:
          NAME: pr-${{ github.event.pull_request.number }}-${{ github.sha }}-${{ github.run_id }}
        run: vcluster delete $NAME --project default
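You can reproduce the create, test, and delete lifecycle from the workflow above locally, for example when debugging a failing PR check. The sketch below wraps that lifecycle in a subshell with an EXIT trap, so the virtual cluster is deleted whether the tests pass or fail, and it returns the test command's exit status. It relies on vcluster create pointing the current kube context at the new virtual cluster, which is its default behavior. The names are placeholders. The sketch omits the --project flag used in the workflow, which you would add if you use vCluster Platform projects. Commands are printed unless DRY_RUN=0.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create a throwaway virtual cluster, run the given command against it, and
# always delete the cluster afterwards. The function body is a subshell, so
# the EXIT trap fires when this function finishes, not when the caller exits.
with_ephemeral_vcluster() (
  name=$1
  shift
  trap 'run vcluster delete "$name"' EXIT
  run vcluster create "$name" || exit 1
  run "$@"
)
```

For example, with_ephemeral_vcluster pr-42 make e2e creates pr-42, runs the tests, and deletes pr-42 in that order.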

By leveraging vCluster, organizations can provide developers with more autonomy and flexibility while ensuring robust testing and CI/CD processes. This leads to faster development cycles, improved code quality, and more efficient use of Kubernetes resources.

Closing Thoughts

As we’ve explored in this blog, vCluster offers a powerful solution to many of the challenges that platform engineers and DevOps teams face when managing Kubernetes environments. As with server virtualization, the technology enables countless use cases and will likely grow to encompass more ideas.

As Kubernetes continues to evolve, tools like vCluster will play an increasingly crucial role in helping organizations manage complex, multi-tenant environments efficiently. By adopting vCluster, teams can stay ahead of the curve in cloud-native infrastructure management.

Next Steps

  1. Try vCluster: Set up a test environment and experience the benefits firsthand.
  2. Engage with the Community: Join the vCluster Slack channel to learn from others and share your experiences.
  3. Implement in Your Organization: Start small with a pilot project, then scale your vCluster implementation as you see the benefits.

For more information and to get started, visit vcluster.com.

Thanks for taking the time to read this post. I hope you found it interesting and informative.

🔗 Connect with me on LinkedIn

🌐 Visit my Website

📺 Subscribe to my YouTube Channel


Published in ITNEXT

ITNEXT is a platform for IT developers & software engineers to share knowledge, connect, collaborate, learn and experience next-gen technologies.

Written by Piotr

My mission is to share enthusiasm and appreciation for digital technology. I want to inspire you to try tools, create and use software and learn new things.
