Benchmark results of Kubernetes network plugins (CNI) over 10Gbit/s network (Updated: August 2020)

Alexis Ducastel · Published in ITNEXT · Sep 8, 2020

New version available: 2024

This article is a bit old now; it has been updated. Check out the new edition!

https://itnext.io/benchmark-results-of-kubernetes-network-plugins-cni-over-40gbit-s-network-2024-156f085a5e4e

This article is an update of my previous benchmarks (2018 and 2019), now running Kubernetes 1.19 and Ubuntu 18.04, with CNI versions up to date as of August 2020.

Contents:

  1. Before we dive into metrics
  2. CNI MTU tuning
  3. CNI benchmark: Raw data
  4. CNI Encryption
  5. Summary
  6. Conclusion - my review

TL;DR: For those in a hurry, please check sections 3 and 4 for graphs, or sections 5 and 6 for my views on the benchmark results and their interpretation.

1) Before we dive into metrics…

1.1) What’s new since April 2019?

  • Put your own cluster to the test: you can now run the benchmark on your own cluster with the release of our “Kubernetes Network Benchmark” tool, knb (https://github.com/InfraBuilder/k8s-bench-suite); see the usage example after this list.
  • Welcome to the new challengers in the CNI battle:
    - “Antrea” from VMware Tanzu
    - “Kube-OVN” from alauda.io
  • New scenario: this benchmark covers “Pod-to-Pod” network performance, but also a new “Pod-to-Service” scenario that reflects a real-world use case: in practice, your API pod consumes a database through a Service, not via the pod IP (of course, we test TCP and UDP in both scenarios).
  • Resource consumption: each test now has its own resource-consumption comparison.
  • Application-level tests removed: we no longer run HTTP, FTP, and SCP tests. Our fruitful collaboration with the community and CNI maintainers highlighted a gap between iperf TCP results and curl results, caused by a delay in CNI startup (the first few seconds after pod start, which is not relevant in real-world use cases).
  • Open source: all the benchmark sources (scripts, CNI yaml files, and raw results) are available on GitHub: https://github.com/InfraBuilder/benchmark-k8s-cni-2020-08
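
For instance, a minimal knb run against two nodes of your own cluster looks roughly like this (node names are placeholders; see the k8s-bench-suite README for the full option list):

```bash
# Sketch: running knb against your own cluster. Node names are
# placeholders; check the repository README for all options.
git clone https://github.com/InfraBuilder/k8s-bench-suite.git
cd k8s-bench-suite

# Benchmark network performance between two nodes of the current kube-context
./knb --verbose --client-node node1 --server-node node2
```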

1.2) Benchmark protocol

The whole protocol is detailed in https://github.com/InfraBuilder/benchmark-k8s-cni-2020-08/blob/master/PROTOCOL.md

Please note that the current article focuses only on Ubuntu 18.04 with the default kernel.

1.3) Selection of CNIs for the benchmark

This benchmark aims to compare CNIs that can be set up with a single yaml file (thus excluding all script-based installs, such as VPP-based CNIs); a typical single-file install is sketched after the list below.

The CNIs we will compare are listed below:

  • Antrea v0.9.1
  • Calico v3.16
  • Canal v3.16 (Flannel network + Calico Network Policies)
  • Cilium 1.8.2
  • Flannel 0.12.0
  • Kube-OVN 1.3.0
  • Kube-router latest (2020–08–25)
  • WeaveNet 2.7.0
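
As a reference point for the “single yaml file” criterion, such an install boils down to something like the following (the manifest URL is Calico’s from the v3.16 era and may have moved since):

```bash
# Sketch: the kind of one-shot, single-yaml install this benchmark requires.
# The manifest URL is Calico's v3.16-era location; it may have moved since.
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

# Wait for the CNI daemonset to finish rolling out
kubectl -n kube-system rollout status daemonset/calico-node
```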

2) CNI MTU tuning

First of all, let us check the impact of MTU detection on TCP performance:

MTU impact on TCP performance

An even more striking gap comes to light with UDP:

MTU Impact on UDP performance

Considering the HUGE impact on performance revealed here, we would like to send a message of hope to all CNI maintainers: please implement MTU auto-detection in your CNIs. You will save kittens, unicorns, and even the cutest one of all: the little devops guy!

Nevertheless, if you really have to choose a CNI that does not implement auto-MTU, you will need to tune it yourself to preserve performance. Note that this applies to Calico, Canal, and WeaveNet; a sketch of that tuning follows below.

My little message to CNI maintainers…
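
For illustration, here is roughly what that manual tuning looks like, assuming a jumbo-frame (MTU 9000) network. The ConfigMap key and environment variable below come from the stock manifests of that era, so double-check them against the version you actually deploy:

```bash
# Sketch: manual MTU tuning for the CNIs that need it, assuming a
# jumbo-frame (MTU 9000) network. Subtract your encapsulation overhead
# (e.g. ~50 bytes for VXLAN) from the interface MTU.

# Calico: MTU is read from the calico-config ConfigMap
# (Canal uses an equivalent key in its canal-config ConfigMap)
kubectl -n kube-system patch configmap calico-config \
  --type merge -p '{"data":{"veth_mtu":"8950"}}'
kubectl -n kube-system rollout restart daemonset/calico-node

# WeaveNet: MTU is taken from the WEAVE_MTU environment variable
kubectl -n kube-system set env daemonset/weave-net \
  -c weave WEAVE_MTU=8950
```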

3) CNI benchmark: Raw data

In this section, we will compare CNIs with a correct MTU (auto-detected or manually tuned). The main goal here is to present the raw data in graphs.

Color code:

  • Gray: reference (aka bare metal)
  • Green: bandwidth > 9,500 Mbit/s
  • Yellow: bandwidth > 9,000 Mbit/s
  • Orange: bandwidth > 8,000 Mbit/s
  • Red: bandwidth < 8,000 Mbit/s
  • Blue: neutral (not bandwidth-related)

3.1) Idle

The first thing to establish is each CNI’s resource consumption while the whole cluster… is taking a little nap.

Idle resource consumption
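
If you want to eyeball this on your own cluster, a rough equivalent is the following (assumes metrics-server is installed; knb measures this more rigorously):

```bash
# Sketch: a quick look at idle consumption on your own cluster.
# Requires metrics-server; knb's measurements are more rigorous.
kubectl top nodes
kubectl top pods -n kube-system --sort-by=memory
```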

3.2) Pod-to-Pod

In this scenario, the client Pod connects directly to the server Pod at its IP address.

Pod-to-Pod scenario
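
To give an idea of what knb automates, a hand-run Pod-to-Pod measurement with iperf3 would look something like this (the image, pod, and node names are illustrative):

```bash
# Sketch: a hand-run Pod-to-Pod bandwidth test with iperf3.
# knb automates this; image, pod and node names are illustrative.

# iperf3 server pod, pinned to the first node
kubectl run iperf3-server --image=networkstatic/iperf3 \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"node1"}}' -- -s
kubectl wait --for=condition=Ready pod/iperf3-server

# Retrieve the server pod IP
SERVER_IP=$(kubectl get pod iperf3-server -o jsonpath='{.status.podIP}')

# iperf3 client on the second node, targeting the pod IP directly
kubectl run iperf3-client --image=networkstatic/iperf3 \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"node2"}}' \
  --restart=Never --rm -it -- -c "$SERVER_IP"
```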

3.2.1) TCP

The results for “Pod-to-Pod” TCP and the related resource consumption are as follows:

3.2.2) UDP

The results for “Pod-to-Pod” UDP and the related resource consumption are as follows:

3.3) Pod-to-Service

In this section, the client Pod connects to the server Pod via a ClusterIP service. This is more relevant to real-world use cases.

Pod-to-Service scenario
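
Continuing the hand-run example above, the only change is to put a ClusterIP Service in front of the server pod and target the Service name instead of the pod IP (names are illustrative):

```bash
# Sketch: the same measurement, but through a ClusterIP service,
# continuing the Pod-to-Pod example above (names are illustrative).

# Put a ClusterIP service in front of the iperf3 server pod
kubectl expose pod iperf3-server --name=iperf3-svc --port=5201

# The client now targets the service name, so traffic takes the CNI's
# service path (kube-proxy, eBPF, OVS... depending on the CNI)
kubectl run iperf3-client --image=networkstatic/iperf3 \
  --restart=Never --rm -it -- -c iperf3-svc
```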

3.3.1) TCP

The results for “Pod-to-Service” TCP and the related resource consumption are as follows:

3.3.2) UDP

The results for “Pod-to-Service” UDP and the related resource consumption are as follows:

3.4) Network Policies

Among all the CNIs listed in this benchmark, the only one not fully supporting Network Policies is Flannel. All the others correctly implement Network Policies, both ingress and egress; an example of the kind of policy tested is shown below. Great job!
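
For reference, an ingress-plus-egress policy of the kind exercised here looks like the following (labels and port are illustrative):

```bash
# Sketch: an ingress + egress Network Policy of the kind tested here
# (labels and port are illustrative).
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-api-only
spec:
  podSelector:
    matchLabels:
      app: db
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 5432
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: api
EOF
```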

4) CNI Encryption

Among all the CNIs we tested, the following are able to encrypt inter-pod communication:

  • Antrea with IPsec
  • Calico with WireGuard
  • Cilium with IPsec
  • WeaveNet with IPsec

4.1) Bandwidth

As there are fewer CNIs in this battle, let us recap all scenarios in one graph:

4.2) Resource consumption

In this section, we will examine the resources used by Pod-to-Pod communication, both TCP and UDP. There is no point in showing Pod-to-Service graphs here, as they provide no further information.

5) Summary

Let us try to recap all these graphs. We introduce a bit of subjectivity here by replacing the actual values with qualifiers such as “very fast”, “low”, etc.

Benchmark result summary (August 2020)

6) Conclusion - my review

This final part is subjective and conveys my own interpretation of the results.

I’m delighted to see CNI newcomers join in. Antrea played the game well by providing many features even in these early versions: MTU auto-detection, an encryption option, and a straightforward install.

Performance-wise, all the CNIs did well except Kube-OVN and Kube-router. Kube-router fails to detect the MTU, and I could not find a way to tune it anywhere in the documentation (there is an open issue about MTU configuration).

Concerning resource consumption, Cilium still uses more RAM than its competitors, but the company openly targets large-scale clusters, which is not exactly the case with this three-node benchmark. Kube-OVN is also RAM- and CPU-intensive; it is still a pretty young CNI that relies on Open vSwitch (as does Antrea, but Antrea is lighter and performs better).

Network Policies are implemented by all the tested CNIs except Flannel. It is very likely that Flannel will never (ever) implement them, as their purpose is as clear as a bell: the lighter, the better.

In addition, encryption performance is the real “wow effect” here. Calico is one of the oldest CNIs, yet it did not offer encryption until a few weeks ago. They chose WireGuard rather than IPsec, and to say the least, it performs superbly, completely outclassing the other CNIs in this domain. Of course, it consumes a lot of CPU due to the encryption load, but the bandwidth achieved is totally worth it (remember that Calico’s encrypted performance is about 6x better than Cilium’s, which ranks second). Moreover, you can activate WireGuard encryption at any time after deploying Calico on your cluster, and you can just as easily disable it, for a short period or for good, as sketched below. This is incredibly user-friendly. BUT! Remember that Calico is not able to auto-detect the MTU for now (the feature is planned for an upcoming release), so do not forget to tune the MTU if your network supports jumbo frames (MTU 9000).
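
A minimal sketch of that toggle, using the command documented by Calico at the time (requires calicoctl and WireGuard support in the node kernel):

```bash
# Sketch: toggling Calico's WireGuard encryption on a live cluster.
# Requires calicoctl and WireGuard support in the node kernel.

# Enable encryption cluster-wide
calicoctl patch felixconfiguration default --type='merge' \
  -p '{"spec": {"wireguardEnabled": true}}'

# ...and disable it just as easily
calicoctl patch felixconfiguration default --type='merge' \
  -p '{"spec": {"wireguardEnabled": false}}'
```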

Besides, please note that Cilium is able to encrypt all node-to-node traffic (not only pod traffic), which can be a very attractive feature for public-facing cluster nodes.

To conclude, here are my suggestions regarding the following use cases:

  • I need a CNI for a cluster of extra-small nodes, OR I don’t care about security
    Then go with Flannel; it is pretty much the lightest stable CNI.
    (It is also one of the oldest. According to legend, it was invented by a Homo-Kubernautus or a Homo-Containorus.) You may also be interested in the brilliant k3s project. Check it out!
  • I need a CNI for my standard cluster
    Okay, Calico is your choice; do not forget to tune the MTU if you have to. You can play with Network Policies, enable or disable encryption with ease, and so on.
  • I need a CNI for my (very) large-scale cluster
    Well, this benchmark does not reflect the behaviour of big clusters. I would be glad to work on that, but we do not have hundreds of servers with 10Gbit/s connectivity. So the best option is to run a customized benchmark on your own nodes, with at least Calico and Cilium.

Thanks for reading !


InfraBuilder founder, BlackSwift founder, Kubernetes CKA and CKAD, devops meetup organizer, member of the Build-and-Run group.