An illustrated guide to Kubernetes Networking [Part 2]
Everything I learned about Kubernetes networking


This is the second part of the series about Kubernetes Networking. If you haven’t read Part 1 yet, I recommend checking that out first.
Previously in this series, we walked through the Kubernetes networking model. We observed how packets flow between pods on the same node and across nodes. We also saw the role Linux network bridges and route tables play in the process.
Today, we’ll expand on these ideas and see how overlay networks work. We will also understand how the ever-changing pods are abstracted away from apps running in Kubernetes and handled behind the scenes.
Overlay networks
Overlay networks are not required by default, but they help in specific situations: when we don’t have enough IP space, when the network can’t handle the extra routes, or when we want some extra management features that overlays provide. One commonly seen case is a limit on how many routes the cloud provider’s route tables can handle. For example, AWS route tables support up to 50 routes without impacting network performance. So if we have more than 50 Kubernetes nodes, a single AWS route table won’t be enough. In such cases, using an overlay network helps.
An overlay network essentially encapsulates a packet within another packet, which then traverses the native network across nodes. You may not want to use an overlay network, since it adds some latency and complexity overhead due to the encapsulation and decapsulation of every packet. It’s often not needed, so we should use it only when we know why we need it.
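The packet-in-packet idea can be sketched as follows. This is a toy illustration, not flannel’s actual wire format: the inner pod-to-pod packet rides unchanged as the payload of an outer node-to-node packet, and the only cost is the extra outer header bytes.

```python
# Toy sketch of packet-in-packet encapsulation (illustrative only,
# not flannel's real format): the inner pod-to-pod packet is carried
# as the payload of an outer node-to-node packet.

def encapsulate(inner_packet: bytes, node_src: str, node_dst: str) -> bytes:
    # Hypothetical outer header: just the node IPs and a separator.
    outer_header = f"{node_src}>{node_dst}\n".encode()
    return outer_header + inner_packet

def decapsulate(outer_packet: bytes) -> bytes:
    # Strip the outer header to recover the original inner packet.
    _, inner_packet = outer_packet.split(b"\n", 1)
    return inner_packet

inner = b"10.244.1.2>10.244.2.3|hello from pod1"   # pod IPs stay untouched
wire = encapsulate(inner, "192.168.0.2", "192.168.0.3")

assert decapsulate(wire) == inner   # round-trip is lossless
assert len(wire) > len(inner)       # the overhead: extra header bytes
```

The round trip being lossless is the whole point: the pods keep talking to each other with their pod IPs, while the underlying network only ever sees node IPs.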
To understand how traffic flows in an overlay network, let’s consider an example of flannel, which is an open-source project by CoreOS.


(Cross-node pod-to-pod traffic flow with the flannel overlay network)
Here we see that it’s the same setup as before, but with a new virtual ethernet device called flannel0 added to the root netns. It’s an implementation of Virtual Extensible LAN (VXLAN), but to Linux, it’s just another network interface.
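To make VXLAN less abstract, here is a minimal sketch of the 8-byte VXLAN header defined in RFC 7348. The kernel builds this for flannel’s VXLAN backend; the VNI value below is just an assumed example (flannel’s default is reportedly 1):

```python
import struct

# Sketch of the 8-byte VXLAN header (RFC 7348). Assumed VNI for
# illustration; in practice the kernel constructs this header.

def vxlan_header(vni: int) -> bytes:
    # Byte 0: flags, with the I bit (0x08) set to mark a valid VNI.
    # Bytes 1-3: reserved. Bytes 4-6: the 24-bit VNI. Byte 7: reserved.
    return struct.pack("!II", 0x08 << 24, vni << 8)

hdr = vxlan_header(1)
assert len(hdr) == 8                           # only 8 bytes of VXLAN overhead
assert hdr[0] == 0x08                          # I flag is set
assert int.from_bytes(hdr[4:7], "big") == 1    # the VNI round-trips
```

The small fixed size is why VXLAN overhead is modest: the encapsulated frame grows by this header plus the outer UDP/IP/Ethernet headers, nothing more.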
The flow for a packet going from pod1 to pod4 (on a different node) is something like this:
1. The packet leaves pod1’s netns at eth0 and enters the root netns at vethxxx.
2. It’s passed on to cbr0, which makes an ARP request to find the destination.
3a. Since nobody on this node has the IP address for pod4, the bridge sends it to flannel0, because the node’s route table is configured with flannel0 as the target for the pod network range.
3b. As the flanneld daemon talks to the Kubernetes apiserver or the underlying etcd, it knows about all the pod IPs and which nodes they’re on. So flannel creates the mappings (in userspace) between pod IPs and node IPs.
flannel0 takes this packet and wraps it in a UDP packet with extra headers, changing the source and destination IPs to those of the respective nodes, and sends it out on a special VXLAN port (generally 8472).


(notice the packet is encapsulated from 3c to 6b in the previous diagram)
Even though the mapping is in userspace, the actual encapsulation and data flow happen in kernel space, so it’s pretty fast.
3c. The encapsulated packet is sent out via eth0 since it is involved in routing the node traffic.
4. The packet leaves the node with node IPs as source and destination.
5. The cloud provider route table already knows how to route traffic between nodes, so it sends the packet to the destination, node2.
6a. The packet arrives at eth0 of node2. Since the destination port is the special VXLAN port, the kernel sends the packet to flannel0.
6b. flannel0 decapsulates the packet and emits it back into the root network namespace.
From here on, the path is the same as in the non-overlay case we saw in Part 1.
6c. Since IP forwarding is enabled, the kernel forwards it to cbr0 as per the route table.
7. The bridge takes the packet, makes an ARP request and finds out that the IP belongs to vethyyy.
8. The packet crosses the pipe-pair and reaches pod4 🏠
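The routing decision in steps 3a and 6c boils down to a longest-prefix-match lookup in the node’s route table. Here is a toy sketch of that lookup using Python’s `ipaddress` module, with assumed example subnets (cluster pod range 10.244.0.0/16, this node’s pod subnet 10.244.1.0/24):

```python
import ipaddress

# Toy sketch of a node's route-table decision. Assumed subnets:
# the cluster-wide pod range goes to flannel0, while this node's
# own pod subnet (a longer prefix) goes to the local bridge cbr0.
routes = {
    ipaddress.ip_network("10.244.1.0/24"): "cbr0",      # pods on this node
    ipaddress.ip_network("10.244.0.0/16"): "flannel0",  # pods on other nodes
}

def pick_interface(dst: str) -> str:
    dst_ip = ipaddress.ip_address(dst)
    matches = [net for net in routes if dst_ip in net]
    # Longest matching prefix wins, like the kernel's routing lookup.
    return routes[max(matches, key=lambda net: net.prefixlen)]

assert pick_interface("10.244.1.5") == "cbr0"      # a pod on the same node
assert pick_interface("10.244.2.7") == "flannel0"  # a pod on another node
```

Because the local /24 is more specific than the cluster-wide /16, same-node traffic never touches the overlay; only cross-node pod traffic gets encapsulated.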
There could be slight differences among implementations, but this is how overlay networks in Kubernetes work. There’s a common misconception that we have to use an overlay when using Kubernetes. The truth is, it completely depends on your specific scenario. So make sure you use one only when it’s actually needed.
That’s all for now. In the previous part we studied the foundation of Kubernetes Networking. Now we know how the overlay networks work. In the next parts, we will see what networking changes happen as pods come and go, and how the outbound and inbound traffic flows.
I’m still new to networking concepts in general, so I would appreciate feedback, especially if something is unclear or erroneous 🙂
I will be speaking at KubeCon North America 2017 about other networking concepts and UDP failures I encountered with Kubernetes in production, and how I investigated and fixed them.
If you’re attending, come say “Hi” or hit me up on Twitter :)