Sandboxed Containers: Are Virtual Machines making a Comeback in a new Avatar?

Senthil Raja Chermapandian
Published in ITNEXT · Nov 8, 2021 · 6 min read

Magawa (Credit: images.twinkl.co.uk)

Magawa, a Gambian pouched rat capable of detecting unexploded land mines, discovered many land mines and other items of ordnance left over from the Cambodian Civil War. In September 2020 the rat was awarded the PDSA Gold Medal, which recognises the bravery and devotion to duty of animals. (Source: Wikipedia)

Introduction

Digital technologies like AI, big data, and blockchain are fuelling the need for applications that can scale massively within microseconds. Developers within enterprises are under pressure to churn out new features within days. Hybrid multi-cloud is the new normal, and digital transformation is a requisite to remain competitive. Containers have emerged as an essential technology for taking on these challenges. The runaway success of containers with both developers and operators has driven rapid, widespread adoption of the technology across a wide variety of industries.

Containers are Awesome

Container technology has almost dethroned virtualization as the primary means of consuming computing resources, both on-prem and in the cloud. A virtual machine is essentially a hardware virtualization technology: a piece of software called a hypervisor, or Virtual Machine Monitor (VMM), running on a bare-metal host takes control of the hardware resources (CPU, memory, hard disk, etc.) and presents them as multiple, discrete virtual resources. This allows multiple, discrete virtual machines to run on the host, each with its own operating system.

Containers, on the other hand, do not virtualize the hardware but use advanced features of the host's OS kernel (cgroups and namespaces) to create an isolated compute environment on the host. The application runs within this isolated environment. To the application, the container looks like a virtual machine; to the host, the container is just another process. In a nutshell, you can take a bare-metal server (or a virtual server) running an operating system and turn it into a platter of isolated containers. However, all these containers share the host's OS kernel.
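To get a feel for the kernel primitives involved, on any Linux host you can inspect the namespace and cgroup membership of a process directly under /proc. A container runtime does essentially this bookkeeping, just with freshly created namespaces and dedicated cgroups (a minimal sketch; the paths assume a modern Linux kernel):

```shell
# List the namespaces (pid, net, mnt, uts, ipc, ...) that the current
# shell process belongs to; each entry is a handle to one namespace.
ls /proc/self/ns

# Show the cgroup hierarchy this process is accounted under:
# cgroups are what container runtimes use to limit CPU and memory.
cat /proc/self/cgroup
```

A runtime such as runc creates new namespaces of each type for the container's first process and joins it to its own cgroups, which is exactly why, from the host's perspective, the container is just another process.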

Virtual Machines offer Strong Isolation

Credit: unit42.paloaltonetworks.com

While containers are great, one should keep in mind that multiple containers running on a server share the same OS kernel. And, as with any software, containerized applications can have security vulnerabilities, whether in the application source code or in one of the libraries the application uses. If a malicious user exploits such a vulnerability and breaks into the container, there's a high chance the intruder will be able to advance the attack and penetrate the host's kernel. That would be catastrophic, since it exposes all the other containers on the host as well.

If the same application were running inside a virtual machine, at most the intruder could penetrate the guest kernel of that VM alone and would not gain access to the server's kernel. The attack is thus confined to the affected VM, and the rest of the VMs on the server are unaffected. In a nutshell, a VM provides a much higher degree of isolation than a container, simply because it has its own kernel.

Sandboxed Containers: Lightweight VMs

Because containers share more resources with the host, they use storage, memory, and CPU cycles much more efficiently than a VM. The downside of more sharing, however, is a weaker trust boundary between the containers and the host. In general, virtualized hardware isolation creates a much stronger security boundary than the namespace isolation used by containers: the risk of an attacker escaping a container (a process) is much higher than the chance of escaping a VM. Yet several applications and use cases require a high degree of isolation and, at the same time, the benefits of containers, i.e. the best of both worlds. For example, a system that provides services to multiple tenants requires strong isolation so that one tenant's application data doesn't get leaked to another.

The obvious question that arises is: Is there a way I could run a containerized application and at the same time benefit from the high degree of isolation offered by VMs? The answer is a resounding “Yes”. Welcome to a new breed of containers called Sandboxed Containers.

In short, these are containers that have their own kernel, much like a virtual machine. This kernel layer is called a user-space kernel. It is very lightweight, written using modern programming techniques, and purpose-built to act as an additional layer of strong isolation between the container and the host.

Consequently, it can be configured dynamically on the fly and can be created and destroyed rapidly. Such sandboxed containers also play well with container tools like Docker and Kubernetes, provided they comply with the Open Container Initiative (OCI) and Container Runtime Interface (CRI) specifications.

In the next sections, we'll delve into two widely used sandboxed container technologies: Google gVisor and Kata Containers.

Google gVisor

gVisor is an application kernel, written in Go, that implements a substantial portion of the Linux system call interface. It provides an additional layer of isolation between running applications and the host operating system.

Credit: gvisor.io

It includes an OCI runtime called runsc that makes it easy to work with existing container tooling. The runsc runtime integrates with Docker and Kubernetes, making it simple to run sandboxed containers. gVisor can be used with Docker, with Kubernetes, or directly via runsc.
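To give a feel for the Docker integration, the gVisor documentation has you register runsc as an additional Docker runtime in /etc/docker/daemon.json (a sketch; the binary path below is an assumption and depends on where runsc was installed on your host):

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

After restarting the Docker daemon, a command like `docker run --rm --runtime=runsc alpine uname -a` should run the container on top of gVisor's user-space kernel rather than directly against the host kernel.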

When you spin up a container using gVisor, the runsc container runtime does the plumbing work to first bring up an instance of the user-space kernel. The user-space kernel acts as another layer of defense, intercepting the system calls from the container and either serving them itself or passing them to the host. The actual application is started on top of this user-space kernel. To make things more appealing, gVisor is an open-source project, so you can add new features and enhancements when you see the need. It conforms to the CRI specification as well, so gVisor containers can run in Kubernetes.
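On the Kubernetes side, a cluster whose nodes have runsc installed can expose gVisor through a RuntimeClass, which pods then opt into by name (a hedged sketch; the class name `gvisor` and handler `runsc` follow the upstream examples, but the handler name must match your node's CRI runtime configuration):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc               # must match the runtime configured in the CRI runtime
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-nginx
spec:
  runtimeClassName: gvisor   # run this pod's containers under gVisor
  containers:
    - name: nginx
      image: nginx
```

Pods without a runtimeClassName keep using the default runtime, so sandboxed and regular containers can coexist on the same cluster.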

Kata Containers

Kata Containers are as light and fast as containers and integrate with the container management layers — including popular orchestration tools such as Docker and Kubernetes (K8s) — while also delivering the security advantages of VMs.

Credit: katacontainers.io

Kata Containers is fully integrated with OCI, the Container Runtime Interface (CRI), and the Container Networking Interface (CNI). It supports various networking models (e.g., passthrough, MacVTap, bridge, tc mirroring) and configurable guest kernels, so applications requiring special networking models or kernel versions can all run on it. The figure above shows how containers inside Kata VMs interact with existing orchestration platforms.

Kata has a kata-runtime on the host to start and configure new containers. For each container in a Kata VM, there is a corresponding Kata shim on the host. The shim receives API requests from clients (e.g., docker or kubectl) and forwards them to the agent inside the Kata VM over VSock. Kata Containers also makes several optimizations to reduce VM boot time.
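Wiring Kata into a cluster follows the same pattern as any alternative runtime: you register it with the CRI runtime and expose it to pods via a RuntimeClass. The fragment below is a sketch based on the upstream Kata documentation, assuming containerd's CRI plugin and the v2 shim:

```toml
# /etc/containerd/config.toml (fragment)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"
```

A Kubernetes RuntimeClass whose handler field is `kata` then lets pods request a Kata VM by setting `spec.runtimeClassName: kata`, with no changes to the container images themselves.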

Conclusion

Sandboxed containers offer much stronger isolation than normal containers. This high degree of isolation adds a layer of security and is useful in several use cases and industries. These container technologies comply with the OCI and CRI specifications, so they play well with existing container tools and Kubernetes. It's amazing to see how virtual machines are making a comeback in a new avatar!

Thanks for reading. If you liked this article, please follow me on Twitter (@senthilch) and Medium (Senthil Raja Chermapandian).

Check out kube-fledged, a Kubernetes operator for creating and managing a cache of container images directly on the cluster worker nodes, so application pods start almost instantly.

