Steps to emulate Kubernetes Pod Network

Harinderjit Singh · Published in ITNEXT · Jul 27, 2022

Networking is the backbone of Kubernetes, but it can be challenging to understand exactly how it is expected to work. There are 4 distinct networking problems to address:

Highly-coupled container-to-container communications: this is solved by Pods; containers in the same Pod share a network namespace and talk over the loopback interface (localhost/127.0.0.1).

Pod-to-Pod communications: We will emulate one implementation of this.

Pod-to-Service communications: this is covered by services.

External-to-Service communications: this is covered by services.

There are 3 networks in a Kubernetes cluster:

  1. The host network, which connects all hosts/servers/VMs/nodes of the Kubernetes cluster's node pool.
  2. The Pod network, which connects all pods, whether they run on the same node or on different nodes of the node pool.
  3. The Service network, which lets clients reach applications without worrying about the IPs of the individual pods that back them.

The objective of this post

There are many good posts by other bloggers that explain the pod network in detail. If you are interested, you may like “A Guide to the Kubernetes Networking Model” and “Understanding Kubernetes networking: pods”.

Why am I writing this post? Because when I was trying to find a post that shows how to emulate a pod network, I didn't find a comprehensive one.

There are multiple ways to meet the requirements Kubernetes lays out for pod networking. The main way to differentiate between them is whether the pod network address space is part of the node pool’s subnet or is a separate address space of its own. We will emulate the latter.

There are multiple ways to implement a pod network whose address space is separate from the node pool’s subnet. The most commonly used one is “basic networking” with a Linux bridge and user-defined routes, along with IP forwarding. GKE route-based clusters (when network policy is disabled) and Azure AKS kubenet use this technique for pod networking.

We will emulate the pod network without Kubernetes, using plain Linux commands. Essentially, we will do by hand what a “bridge” type CNI plugin is asked to do by the kubelet on a cluster node when a pod is scheduled there. I will go through each step that happens in the background and show the command that achieves it.
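For context, on a real cluster the kubelet (via the container runtime) hands this work to a CNI plugin. A bridge-plugin configuration roughly equivalent to what we will build by hand might look like the sketch below; the file name and values are illustrative assumptions, not taken from any particular cluster.

# illustrative CNI config for the "bridge" plugin with host-local IPAM
cat > /etc/cni/net.d/10-bridge.conf <<'EOF'
{
  "cniVersion": "0.3.1",
  "name": "podnet",
  "type": "bridge",
  "bridge": "bridge-if1",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.1.0/24"
  }
}
EOF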

Pod Network

Basic Requirements:

  1. Containers within a pod should be able to communicate with each other.
  2. Every Pod should have an IP address.
  3. Every Pod should be able to communicate with every other pod on the same node.
  4. Every pod should be able to communicate with every other pod on any other node of the node pool of the Kubernetes Cluster without NAT.

Test Configuration

  1. We have 2 nodes, node01 (10.128.0.2) and node02 (10.128.0.3), provisioned as VMs in GCP or Azure.
  2. Host network CIDR: 10.128.0.0/24.
  3. Pod network CIDR: 192.168.0.0/16, i.e. 192.168.1.0/24 for node01, 192.168.2.0/24 for node02, and so on for any other nodes added to the cluster.

Emulation of Pod Network

Linux namespaces (particularly network namespaces) make it easy to implement these requirements. A network namespace is created for a pod as soon as it is scheduled; the kubelet takes care of this. That means one network namespace for each pod.

Let's start with one node, node01 (10.128.0.2). Any two processes running within the same network namespace can connect to each other over the loopback interface (localhost/127.0.0.1), which solves the first requirement.

# create namespaces on node01
ip netns add redapp
ip netns add blueapp
ip netns

We can connect the two namespaces using a Linux bridge, which acts as a virtual switch. It works at Layer 2 of the OSI model, which means it uses the ARP protocol to identify the devices (the namespaces' interfaces) connected to it. To the host VM (node), the bridge appears like any other network interface, and we assign it an IP address. So there is one bridge per host node.

# create virtual switch (bridge)
ip link add bridge-if1 type bridge
ip link set dev bridge-if1 up
ip link
# create peered veth for each namespace
ip link add veth-blue1 type veth peer name veth-blue1-br1
ip link add veth-red1 type veth peer name veth-red1-br1
# attach pipe to namespace and bridge
ip link set veth-blue1-br1 master bridge-if1
ip link set veth-red1-br1 master bridge-if1
ip link set veth-red1 netns redapp
ip link set veth-blue1 netns blueapp

To attach a namespace to the bridge, we create a virtual ethernet interface (veth) with a veth peer, then move the veth into the namespace and attach the peer to the bridge. We then assign an IP address to the veth (in the same CIDR as the bridge IP); this becomes the IP of the pod using that network namespace, which fulfills the second requirement of a unique IP per pod. How these IPs are allocated depends on the type of pod networking in use, and in particular on the IPAM (IP address management) plugin. Here we assign them manually, just like the host-local IPAM plugin would. We do the same for each pod's namespace on each node.

# assign IP addr to interfaces and bridge
ip addr add 192.168.1.10/24 dev bridge-if1
ip -n redapp addr add 192.168.1.11/24 dev veth-red1
ip -n blueapp addr add 192.168.1.12/24 dev veth-blue1
# start the interfaces
ip -n redapp link set veth-red1 up
ip -n blueapp link set veth-blue1 up
ip link set dev veth-blue1-br1 up
ip link set dev veth-red1-br1 up

Once the namespaces are attached to the bridge, their interfaces can communicate with each other. This fulfills the third requirement: pods (namespaces) on the same node can talk to each other.
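To sanity-check this step, you can inspect the bridge's attached ports and the addresses inside the namespaces (not part of the original steps, just a quick verification using the names assumed above):

# list the ports attached to the bridge
bridge link show
# confirm the veth ends inside each namespace have their addresses
ip -n redapp addr show veth-red1
ip -n blueapp addr show veth-blue1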

Linux kernel routing acts as a simple router connecting the bridge to the host network. Any packet destined for 192.168.1.0/24 is routed to the bridge, and packets for any other destination, including pods on other nodes, are routed out of eth0/ens4 to the host network. To let the namespaces reach the outside world, we need to route their traffic out of eth0/ens4, and for that we create a routing rule making the bridge the default gateway of each namespace on each host.

# add default gateway route for each namespace
ip netns exec blueapp ip route add default via 192.168.1.10 dev veth-blue1
ip netns exec redapp ip route add default via 192.168.1.10 dev veth-red1
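You can also verify on the host that the kernel already has a connected route for the pod CIDR via the bridge; it was created automatically when we assigned 192.168.1.10/24 to bridge-if1:

# on node01
ip route show
# expect a line similar to:
# 192.168.1.0/24 dev bridge-if1 proto kernel scope link src 192.168.1.10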

We are left with the last requirement: every pod should be able to communicate with every other pod on any other node of the cluster's node pool. For that, let's add another node, node02. Make sure to run all the commands above on node02 with its own IP addresses and pod network CIDR (192.168.2.0/24).

This is the more interesting part, in my opinion, because the host network knows nothing about the pod network. It only knows the IP addresses of the eth0/ens4 interfaces of each node and the routes associated with them.

So if a pod on node01 wants to send traffic to a pod on node02, it has to use the host network. For example, a packet starts in the redapp namespace on node01 (192.168.1.11) and wants to reach the redapp namespace on node02 (192.168.2.11). Since the namespace's default gateway is the bridge IP (192.168.1.10), the packet goes to the bridge, and kernel routing forwards it to the eth0/ens4 interface, where it follows the host's default gateway out to the host network. But the host network has no route for this destination address and doesn't know what to do with the packet.
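You can watch this happen with tcpdump (a diagnostic sketch, assuming the host uplink is ens4):

# on node01: watch traffic for the remote pod IP on the host uplink
tcpdump -ni ens4 host 192.168.2.11
# in another terminal on node01
ip netns exec redapp ping -c 2 192.168.2.11
# until IP forwarding is enabled and the routes below exist, the ping gets no
# reply (and the requests may not even leave ens4)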

There are 2 things we have to do (AKS kubenet and GKE route-based clusters take care of this automatically):

  1. Enable IP forwarding on the eth0/ens4 interface of each host. To use a VM as the next hop for a route, the VM must accept packets whose destination is not its own address, and because it forwards those packets, their sources will differ from its own internal IP. To allow this, you must enable IP forwarding for the VM. In Azure, you can enable it in the “configuration” of the NIC attached to the node; for GCP VMs, follow the link. On all nodes, also execute “sysctl -w net.ipv4.ip_forward=1” as root.
  2. Add user-defined routes to the host subnet, one per pod CIDR, directing traffic to the corresponding node. Here are snapshots of what the routes should look like (the next-hop addresses may differ from our case; the snapshots are for illustration, and a CLI sketch follows them).
Azure routes
GCP routes
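As a rough CLI sketch of those routes and the forwarding setting (the route names, resource group, and zone are placeholders, not from the original setup):

# GCP: route node01's pod CIDR to the node01 VM
gcloud compute routes create pods-node01 \
  --network=default \
  --destination-range=192.168.1.0/24 \
  --next-hop-instance=node01 \
  --next-hop-instance-zone=us-central1-a

# Azure: equivalent user-defined route in a route table associated with the subnet
az network route-table route create \
  --resource-group my-rg \
  --route-table-name my-route-table \
  --name pods-node01 \
  --address-prefix 192.168.1.0/24 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 10.128.0.2

# on every node, enable kernel IP forwarding
sysctl -w net.ipv4.ip_forward=1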

After this is done, traffic can flow freely between pods on node01 and pods on node02 and vice versa, which fulfills the fourth requirement. We connected two pod CIDRs, 192.168.1.0/24 (node01) and 192.168.2.0/24 (node02), and we can add more nodes if needed, each with its own pod CIDR 192.168.x.0/24 (where x has to be different from what's already taken). By connecting the pod CIDRs from all nodes, we get the pod network with CIDR 192.168.0.0/16.

However, user-defined routes are subnet-specific; the broader network still doesn't know what to do with packets to and from pod IPs. That can be solved using SNAT. The Linux kernel is configured (using iptables) so that any connection from a pod to an IP address outside the cluster network is source-NAT'ed to the IP address of the node hosting the pod. As the broader network is aware of the node IPs, it can route the traffic normally. Any return traffic for the connection is automatically mapped back to the original pod IP address by the Linux kernel, so the pod is unaware any of this happened.

#on node01
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.130.0.10/32 -j MASQUERADE
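A more general form of this rule (my addition, not from the original setup) masquerades everything leaving the pod CIDR except traffic that stays inside the pod network:

# on node01: SNAT pod traffic to any destination outside the pod network
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 ! -d 192.168.0.0/16 -j MASQUERADE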

Testing the connectivity

We will launch a simple HTTP server in the redapp network namespace on node01 to emulate a container running in a pod, listening on port 8000.

#setup for test on node01
apt-get update -y
apt-get install python3 -y
ip netns exec redapp python3 -m http.server 8000

Case1: Containers within a pod should be able to communicate with each other.

ip netns exec redapp curl http://127.0.0.1:8000

Case2: Every Pod should be able to communicate with every other pod on the same node.

#on node01
ip netns exec blueapp ping 192.168.1.11
ip netns exec blueapp curl http://192.168.1.11:8000

Case3: Every pod should be able to communicate with every other pod on any other node of the node pool of the Kubernetes Cluster without NAT.

#on node02
ip netns exec blueapp ping 192.168.1.11
ip netns exec blueapp curl http://192.168.1.11:8000
ip netns exec redapp ping 192.168.1.11
ip netns exec redapp curl http://192.168.1.11:8000

Make sure the GCP firewall allows ingress traffic to 192.168.1.0/24. Azure has no such firewall enabled by default, so no action is needed in that case.
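For GCP, a rule along these lines would allow that traffic (the rule name is a placeholder and assumes the default network):

# allow traffic from the host and pod ranges to reach the pod CIDRs
gcloud compute firewall-rules create allow-pod-network \
  --network=default \
  --allow=tcp,udp,icmp \
  --source-ranges=10.128.0.0/24,192.168.0.0/16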

Case4: All cluster nodes are able to connect to pods

#on node02
ping 192.168.1.11
curl http://192.168.1.11:8000
#on node01
ping 192.168.1.11
curl http://192.168.1.11:8000

Case5: What if VM1 (10.130.0.10) from the broader network wants to connect to the webapp running in the redapp namespace? The broader network knows no route to the redapp namespace (pod IP 192.168.1.11). Either we add a route to the broader network, or we use DNAT (Destination NAT), since 10.130.0.10 can reach the host IP (10.128.0.2).

#on node01
iptables -t nat -A PREROUTING -p tcp -d 10.128.0.2 --dport 30001 -j DNAT --to-destination 192.168.1.11:8000

To test, try the following on VM1 (a VM outside the Kubernetes cluster):

#on VM1
ping 192.168.1.11
curl http://10.128.0.2:30001

The curl through the node IP and the DNAT rule should work now.

This is the simplest implementation of pod networking. Kubernetes' official documentation discusses other implementations.

To make things easy to test, I have created a script that you can use to create bridges, add namespaces, delete namespaces and delete bridges. All you need is 2 micro-sized VMs in GCP or Azure in the same subnet.
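That script is not reproduced here, but a minimal sketch of such a helper, loosely following the naming used in this post, could look like this:

#!/usr/bin/env bash
# pod-net.sh: rough helper to set up/tear down a bridge and pod namespaces on one node
set -euo pipefail

BRIDGE=bridge-if1
BR_IP=192.168.1.10/24

case "${1:-}" in
  create-bridge)
    ip link add "$BRIDGE" type bridge
    ip addr add "$BR_IP" dev "$BRIDGE"
    ip link set dev "$BRIDGE" up
    ;;
  add-ns)   # usage: pod-net.sh add-ns <name> <ip/prefix>, e.g. add-ns redapp 192.168.1.11/24
    ns=$2; addr=$3
    ip netns add "$ns"
    ip link add "veth-$ns" type veth peer name "veth-$ns-br"
    ip link set "veth-$ns-br" master "$BRIDGE"
    ip link set "veth-$ns" netns "$ns"
    ip -n "$ns" addr add "$addr" dev "veth-$ns"
    ip -n "$ns" link set "veth-$ns" up
    ip link set dev "veth-$ns-br" up
    ip netns exec "$ns" ip route add default via "${BR_IP%/*}" dev "veth-$ns"
    ;;
  del-ns)
    ip netns del "$2"
    ;;
  del-bridge)
    ip link del "$BRIDGE"
    ;;
  *)
    echo "usage: $0 {create-bridge|add-ns <name> <ip/prefix>|del-ns <name>|del-bridge}" >&2
    exit 1
    ;;
esac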

Please read my other articles as well and share your feedback. If you like the content shared please like, comment, and subscribe for new articles.
