Deep Dive into Kubernetes Networking: Part 2

Shiksha Engineering
9 min read · May 5, 2024

Author: Rishabh Dev Saini

In the previous post in this series, we explored the components required to make networking work inside a Kubernetes environment. In this part, we will look at how data packets travel between Kubernetes entities such as pods and services.

Container to Container Networking

In Kubernetes, a pod is the smallest deployable unit, and it can hold multiple containers. As covered in the previous part, all containers in a pod share that pod's networking resources, because they share its network namespace. As a result, containers in the same pod can talk to each other directly over the loopback interface (localhost).
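As a quick illustration, here is a minimal sketch of a two-container pod; the images, names, and port are placeholders. Because both containers share the pod's network namespace, the sidecar can reach the web server at localhost:80 without involving any Service or pod IP.

```yaml
# Hypothetical two-container pod; image names and port are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo
spec:
  containers:
    - name: web
      image: nginx:1.25            # listens on port 80 inside the pod
    - name: sidecar
      image: curlimages/curl:8.5.0
      # The sidecar reaches the web container over the loopback interface,
      # because both containers share the same network namespace.
      command: ["sh", "-c", "while true; do curl -s http://localhost:80 > /dev/null; sleep 5; done"]
```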

Pod to Pod networking — on the same node

Each pod gets its own virtual network interface, implemented as one end of a virtual Ethernet (veth) pair; the other end of the pair lives in the root network namespace of the node hosting the pod and is attached to a virtual bridge there. To see how this works, imagine Pod 1 sending a data packet to Pod 2 on the same node, using pod IPs as the Kubernetes networking model prescribes.

Pod 1 sends the packet out through eth0, its own virtual network interface. eth0 is one end of a veth pair; its counterpart, veth0, sits in the root namespace. From veth0 the packet reaches the virtual bridge, cbr0.

At the bridge, ARP (Address Resolution Protocol) resolution determines where the packet should be forwarded. The bridge passes it to veth1, the root-namespace end of Pod 2's veth pair, and from there the packet is delivered to the eth0 device inside Pod 2's network namespace.

In this way the packet travels from Pod 1's eth0, across the bridge, and into Pod 2's eth0, reaching the destination pod directly by its IP address, exactly as the Kubernetes networking model intends.
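If you want to see this wiring on a real node, the rough sketch below can help. It is illustrative only: interface and bridge names vary by CNI plugin (the bridge may be called cbr0, cni0, or docker0), and the last command assumes the pod's image ships iproute2.

```shell
# On the node: list the bridge and the host-side veth ends (names vary by CNI).
ip link show type bridge       # virtual bridges in the root namespace
ip link show type veth         # one host-side veth interface per pod
bridge link show               # which veth interfaces are attached to which bridge

# Inside a pod, eth0 is the pod-side end of one of those veth pairs:
kubectl exec <pod-name> -- ip addr show eth0
```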

Pod to Pod networking — on different nodes

Having seen how packets are routed between pods on the same node, we can turn to routing between nodes. The Kubernetes model requires pod IPs to be reachable across the entire cluster, but it does not specify how this must be implemented. Broadly, each node in a Kubernetes cluster is assigned a distinct CIDR block that defines the range of IP addresses available to the pods running on that node. Once traffic destined for an address in that CIDR block reaches the node, it is the node's responsibility to deliver the packet to the right pod. Apart from this inter-node hop, the rest of the packet's journey follows the same steps as the same-node case.

Routing data packets between nodes is where the CNI (Container Network Interface) plugin comes into play. With Calico, this can be done using IP-in-IP encapsulation on the source node: the original packet, whose headers carry the source and destination pod IPs, is wrapped in an outer IP header whose source and destination addresses are those of the source and destination nodes. When the encapsulated packet arrives at the destination node, the outer header is stripped off again (decapsulation).

The destination node then uses ARP, as covered earlier, to forward the inner packet to the intended recipient pod, completing the inter-node routing journey.
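You can see both halves of this picture from the command line. The sketch below is illustrative: the podCIDR values and node names are made up, and the tunl0 interface assumes a Calico installation running in IP-in-IP mode.

```shell
# Each node's pod CIDR block (values shown are hypothetical):
kubectl get nodes -o custom-columns=NAME:.metadata.name,POD_CIDR:.spec.podCIDR
# NAME     POD_CIDR
# node-a   10.244.0.0/24
# node-b   10.244.1.0/24

# On node-a, with Calico in IP-in-IP mode, routes to other nodes' pod CIDRs
# point at the tunl0 tunnel interface, which performs the encapsulation:
ip route | grep tunl0
# 10.244.1.0/24 via <node-b IP> dev tunl0 proto bird onlink
```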

Pod to Service Networking

We have explored how traffic is routed from one pod to another within the cluster, and this works well until things change. Pods are transient in Kubernetes: they are created, deleted, and replaced as applications scale, crash, or nodes reboot, so the IP addresses associated with them can change at any moment, breaking anything that relies on those addresses directly.

To tackle this issue, Kubernetes provides a key abstraction: Services.

A Service is an abstraction over a group of pods. It gives that group a single, stable virtual IP address, so any traffic directed to the Service's virtual IP reliably reaches one of the pods behind it. The set of pods backing a Service can therefore change over time; clients only need to know the Service's virtual IP, never the individual pod IPs.

When a new Service is created, it is assigned a distinct virtual IP known as the ClusterIP. Anywhere in the cluster, traffic sent to this ClusterIP is load-balanced across the pods backing the Service. This load balancing is built on iptables and the netfilter framework, which are native components of the Linux kernel.
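A minimal Service manifest looks like the sketch below; the name, label, and ports are placeholders. Creating it allocates a ClusterIP, and traffic sent to that IP on port 80 is load-balanced across all pods whose labels match the selector.

```yaml
# Hypothetical Service; name, labels, and ports are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: search-backend
spec:
  selector:
    app: search-backend      # the Service targets all pods carrying this label
  ports:
    - protocol: TCP
      port: 80               # the port exposed on the ClusterIP
      targetPort: 8080       # the port the backing pods actually listen on
```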

Netfilter and IPTables

iptables is a firewall framework that organizes its rules into tables according to the kind of decision those rules make. Rules that perform network address translation (NAT), for instance, live in the NAT table.

Presently, iptables comprises five tables:

  1. Filter Table: This table holds the rules responsible for determining whether a packet should be permitted to reach its intended destination or denied entry.
  2. NAT Table: The NAT table is dedicated to network address translation.
  3. Mangle Table: Within the Mangle table, packet headers can be subject to various alterations. For instance, this table allows for modifications to the Time-To-Live (TTL) value of a data packet.
  4. Raw Table: netfilter gives iptables connection tracking, which lets rules evaluate a packet in the context of an ongoing connection or session rather than as an isolated, discrete datagram. The Raw table is consulted before connection tracking takes place and is mainly used to mark packets that should bypass the tracking system altogether.
  5. Security Table: The Security table is instrumental in assigning internal Linux security context marks to packets.

In each iptables table, rules are further organized into separate chains. These chains correspond to the netfilter hooks that trigger them.
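On a Kubernetes node you can list the chains of a given table together with the rules attached to them. The output below is a heavily trimmed, illustrative sketch; the exact chains and comments depend on the kube-proxy and CNI versions in use.

```shell
# List the chains of the nat table (output trimmed and illustrative):
sudo iptables -t nat -L -n
# Chain PREROUTING (policy ACCEPT)
# target         prot opt source      destination
# KUBE-SERVICES  all  --  0.0.0.0/0   0.0.0.0/0    /* kubernetes service portals */
#
# Chain POSTROUTING (policy ACCEPT)
# target            prot opt source      destination
# KUBE-POSTROUTING  all  --  0.0.0.0/0   0.0.0.0/0  /* kubernetes postrouting rules */
# ...
```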

iptables works by plugging into a set of packet-filtering hooks in the Linux networking stack; this collection of kernel-level hooks is known as the netfilter framework. Every packet that traverses the stack passes these hooks, which lets programs interact with the traffic at key points along its path. The kernel modules behind iptables register themselves with these hooks so that network traffic is checked against the configured firewall rules.

There are five distinct netfilter hooks with which programs can register:

  1. NF_IP_PRE_ROUTING: This hook activates upon the arrival of incoming traffic as it enters the network stack, before any routing decisions are made regarding its destination.
  2. NF_IP_LOCAL_IN: Triggered after an incoming packet has been routed, provided that the packet is destined for the local system.
  3. NF_IP_FORWARD: Activation occurs after an incoming packet has been routed, but only if the packet is to be forwarded to another host.
  4. NF_IP_LOCAL_OUT: This hook is engaged by any locally generated outbound traffic as soon as it encounters the network stack.
  5. NF_IP_POST_ROUTING: Triggered by outgoing or forwarded traffic after routing decisions have been made but just before the data is transmitted on the network.

Kernel modules registering with these hooks must also supply a priority number, which determines the order in which they are invoked when the hook fires.
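For reference, the built-in iptables chains map one-to-one onto these hooks, which is why the chain a rule lives in determines where in a packet's path it applies:

```
NF_IP_PRE_ROUTING   →  PREROUTING
NF_IP_LOCAL_IN      →  INPUT
NF_IP_FORWARD       →  FORWARD
NF_IP_LOCAL_OUT     →  OUTPUT
NF_IP_POST_ROUTING  →  POSTROUTING
```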

In Kubernetes, the iptables rules are configured and managed by kube-proxy. Kube-proxy watches the Kubernetes API server for changes in the cluster's state; whenever a change affects the IP addresses of services or pods, it updates the iptables rules so that traffic keeps being routed correctly. Specifically, these rules match traffic addressed to a Service's IP, pick one of the Service's pods at random, and rewrite the packet's destination IP from the Service IP to that pod's IP.

As an example, consider how this looks in the NAT table. Traffic arriving on a node port is matched first in the KUBE-NODEPORTS chain; following that chain leads to a per-service chain, which in turn refers to the pods abstracted by the Service. When multiple pods are present, one of them is selected at random for each new connection destined for the Service IP, as sketched below.
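The fragment below is a hand-written, illustrative iptables-save excerpt in the shape kube-proxy's iptables mode typically generates; the chain-name hashes, Service name, IPs, and ports are placeholders.

```
# Illustrative nat-table excerpt; hashes, IPs and ports are made up.
# A NodePort first hits KUBE-NODEPORTS, a ClusterIP hits KUBE-SERVICES:
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/search-backend" -m tcp --dport 30080 -j KUBE-SVC-ABCDEF123456
-A KUBE-SERVICES -d 10.96.120.15/32 -p tcp -m comment --comment "default/search-backend cluster IP" -m tcp --dport 80 -j KUBE-SVC-ABCDEF123456

# The per-service chain picks one endpoint at random (here: two pods, 50/50 split):
-A KUBE-SVC-ABCDEF123456 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-AAAAAAAAAAAA
-A KUBE-SVC-ABCDEF123456 -j KUBE-SEP-BBBBBBBBBBBB

# Each endpoint chain DNATs the packet to a concrete pod IP and port:
-A KUBE-SEP-AAAAAAAAAAAA -p tcp -m tcp -j DNAT --to-destination 10.244.1.7:8080
-A KUBE-SEP-BBBBBBBBBBBB -p tcp -m tcp -j DNAT --to-destination 10.244.2.9:8080
```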

Routing a packet from a pod to a Service begins the same way as pod-to-pod communication. The difference appears when the packet reaches the virtual bridge in the node's root namespace: the Service IP is purely virtual, so ARP cannot resolve it, and the bridge falls back to the default route, sending the traffic towards the node's network interface.

Before the packet is accepted at eth0, it passes through iptables. Here the rules installed by kube-proxy are applied, and the packet's destination IP is rewritten to that of a randomly selected pod backing the Service. From that point on, the packet follows the same route we saw in pod-to-pod communication.
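Connection tracking remembers this rewrite, so reply packets from the chosen pod are translated back to the Service IP before they reach the client. A rough way to observe this on a node is sketched below; it assumes the conntrack tool is installed, and all addresses are illustrative.

```shell
# One conntrack entry per connection a pod opened to the Service's ClusterIP.
# The first tuple shows the original destination (the ClusterIP), the second
# shows the reply source (the backend pod the packet was DNATed to).
sudo conntrack -L -p tcp --dst 10.96.120.15
# tcp  6 86391 ESTABLISHED src=10.244.1.7 dst=10.96.120.15 sport=41832 dport=80 \
#   src=10.244.2.9 dst=10.244.1.7 sport=8080 dport=41832 [ASSURED]
```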

Internet to Service Networking

So far we have traced a data packet's journey within the cluster and the components it passes through along the way. Now it is time to bring external traffic from the internet into our application so that it can serve end users.

This endeavor can be accomplished using one of two approaches:

  • Leveraging load balancers and proxy servers: after DNS resolution, traffic arrives at a proxy server, which routes it to the appropriate Service inside the Kubernetes cluster based on URL patterns and other configured rules.

Or,

  • Utilizing Ingress: here, Ingress, a Kubernetes API object, lets us define routing rules that govern how external users access Services inside the Kubernetes cluster.

With the first approach, once DNS resolution completes, requests land on the proxy server, which matches URL patterns and other rules to decide which Service inside the cluster should receive them, as in the sketch that follows.
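The configuration below is a hypothetical, hand-maintained Nginx proxy in front of the cluster; the host names, the path, and the NodePort 30080 are placeholders.

```nginx
# Illustrative reverse proxy in front of the cluster.
# node1/node2 and NodePort 30080 are placeholders for real cluster nodes.
upstream search-backend {
    server node1.example.com:30080;
    server node2.example.com:30080;
}

server {
    listen 80;
    server_name app.example.com;

    # Route by URL pattern to the Service exposed on a NodePort.
    location /search/ {
        proxy_pass http://search-backend;
    }
}
```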

This configuration is discouraged, however, especially for cloud-based Kubernetes clusters: as the application scales and more proxy servers have to be provisioned, each additional proxy adds to the cost of the cloud environment. The more prudent alternative is to expose the application via Ingress, which lets load balancing and routing rules be managed from within the Kubernetes cluster itself, keeping cloud infrastructure expenses down.

Ingress

Traffic that flows from the outside world into the Kubernetes cluster is referred to as ingress. This function is handled by a Kubernetes API object, also named Ingress, which provides routing rules governing how internet users reach Services inside the cluster. Ingress exposes HTTP and HTTPS routes from outside the cluster to Services within it.

Ingress has two fundamental components: Ingress resources and the Ingress controller. Traffic routing is driven by the rules defined in Ingress resources, while the Ingress controller, essentially a specialized load balancer and reverse proxy, watches those resources and carries out the rules they define. One of the most widely used Ingress controllers is based on Nginx, a build of Nginx packaged specifically for this purpose.
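A minimal Ingress resource might look like the sketch below; the host name, path, and backend Service are placeholders, and an Ingress controller must already be running in the cluster for the rules to take effect.

```yaml
# Hypothetical Ingress; host, path, and backend Service are illustrative.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx            # assumes the NGINX Ingress controller is installed
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /search
            pathType: Prefix
            backend:
              service:
                name: search-backend # the Service from the earlier sketch
                port:
                  number: 80
```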
