Merbridge CNI Mode

This post explains how CNI mode works in Merbridge.

CNI mode is designed to fit service mesh features more closely. Before this mode existed, Merbridge only worked in certain scenarios. The biggest problem was that it could not honor the sidecar annotations on Pods injected by Istio, so Merbridge could not exclude traffic for certain ports and IP ranges. Furthermore, Merbridge could only handle requests originating inside the Pod, which means that external traffic sent to the Pod was not handled.

Therefore, we have implemented the Merbridge CNI to address these issues.

Why CNI mode is needed

First, Merbridge previously had a small control plane that watched Pod resources and wrote the IPs of the Pods on the current node into the local_pod_ips map for use by the connect program. However, since the connect program works only at the host kernel layer, it does not know which Pod’s traffic it is processing. Thus, configurations like excludeOutboundPorts cannot be handled. To honor the excludeOutboundPorts annotation on injected sidecars, we need to let the eBPF program know which Pod’s request is currently being processed.

To this end, we designed a method that cooperates with the CNI, through which the eBPF program can obtain the current Pod IP and apply the special configuration of that Pod.

Second, in early versions of Merbridge, only the connect program processed requests sent from the host. This was fine for communication between Pods on the same node, but it becomes problematic when traffic flows between different nodes. With the previous logic, traffic is not modified during cross-node communication, so inbound traffic ultimately falls back to iptables.

To solve this, we turned to an XDP program to process inbound traffic. An XDP program must be attached to a network card, which also requires CNI.

How does CNI mode work?

This section explores how CNI mode works and how it solves the issues mentioned above.

How to use CNI to let eBPF have the current Pod IP

When a Pod is created, we write the Pod IP into the mark_pod_ips_map map through CNI, where the key is a random value and the value is the Pod IP. Then, we listen on a special port, 39807, in the NetNS of the current Pod, and write the key into the mark of this port’s socket using setsockopt.

In eBPF, we use bpf_sk_lookup_tcp to read the mark recorded on the socket listening on port 39807, and use that mark to look up the current Pod IP (and therefore the current NetNS) in mark_pod_ips_map.
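The registration and lookup steps above can be pictured with a small user-space simulation. This is only a sketch, not Merbridge's code: plain dictionaries stand in for the eBPF map and for the socket mark, and the function names `cni_add_pod` and `ebpf_current_pod_ip` are hypothetical.

```python
import secrets

LISTEN_PORT = 39807  # the special port named in the text

# Stand-in for the eBPF map: random key -> Pod IP.
mark_pod_ips_map = {}
# Stand-in for the mark set via setsockopt on the listener in each NetNS:
# (netns, port) -> key. In the kernel this lives on the socket itself.
listener_marks = {}


def cni_add_pod(netns: str, pod_ip: str) -> int:
    """Simulate the CNI step: pick a random key, record the Pod IP under
    it, and store the key as the mark of the port-39807 listener."""
    key = secrets.randbits(32)  # a socket mark is a 32-bit value
    while key == 0 or key in mark_pod_ips_map:
        key = secrets.randbits(32)
    mark_pod_ips_map[key] = pod_ip
    listener_marks[(netns, LISTEN_PORT)] = key
    return key


def ebpf_current_pod_ip(netns: str):
    """Simulate the eBPF step: find the listener in the current NetNS,
    read its mark, and resolve the mark to a Pod IP."""
    mark = listener_marks.get((netns, LISTEN_PORT))
    if mark is None:
        return None  # no listener here, so this NetNS is not a managed Pod
    return mark_pod_ips_map.get(mark)


cni_add_pod("netns-pod-a", "10.244.1.5")
print(ebpf_current_pod_ip("netns-pod-a"))
```

The indirection through a random key matters because the mark is the only piece of state the eBPF program can read from inside the Pod's NetNS; the map then turns that opaque value back into a Pod IP.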

With the current Pod IP, we can decide how to route traffic according to the configuration of this Pod (such as excludeOutboundPorts).
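To make the per-Pod decision concrete, here is a minimal sketch of how an exclusion check could look once the Pod's configuration is known. It is an illustration in Python, not the eBPF implementation; the function name `should_redirect` is hypothetical, and the annotation keys are the standard Istio sidecar annotations for excluded outbound ports and IP ranges.

```python
import ipaddress

# Istio sidecar annotations that hold comma-separated exclusions.
EXCLUDE_PORTS = "traffic.sidecar.istio.io/excludeOutboundPorts"
EXCLUDE_RANGES = "traffic.sidecar.istio.io/excludeOutboundIPRanges"


def _split(value: str) -> list:
    """Split a comma-separated annotation value, ignoring blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def should_redirect(dst_ip: str, dst_port: int, annotations: dict) -> bool:
    """Return True if outbound traffic should go to the sidecar,
    False if the Pod's annotations exclude this destination."""
    excluded_ports = {int(p) for p in _split(annotations.get(EXCLUDE_PORTS, ""))}
    if dst_port in excluded_ports:
        return False
    addr = ipaddress.ip_address(dst_ip)
    for cidr in _split(annotations.get(EXCLUDE_RANGES, "")):
        if addr in ipaddress.ip_network(cidr, strict=False):
            return False
    return True


pod_annotations = {
    EXCLUDE_PORTS: "3306,6379",
    EXCLUDE_RANGES: "10.96.0.0/12",
}
print(should_redirect("192.0.2.10", 3306, pod_annotations))
print(should_redirect("192.0.2.10", 80, pod_annotations))
```

Without the Pod IP there is no way to pick the right set of annotations, which is why the lookup has to happen before any routing decision.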

In addition, we reduced four-tuple conflicts by using bpf_bind to bind the source IP and using 127.0.0.1 as the destination IP, which also prepares for future IPv6 support.

How to handle ingress traffic

To handle inbound traffic, we introduced an XDP program, which runs on the network card and can modify the original data packets.

We use the XDP program to rewrite the destination port to 15006 when traffic reaches the Pod, which completes the traffic forwarding.
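Rewriting a port in a raw packet also means fixing the TCP checksum, since the port is covered by it. The sketch below shows the idea in user-space Python, assuming a plain 20-byte TCP header and using the incremental update formula from RFC 1624. It is not the Merbridge XDP code (which is an eBPF program running in the kernel), and the function names are hypothetical.

```python
import struct

INBOUND_PORT = 15006  # the port named in the text


def _fold(total: int) -> int:
    """Fold carries back in to get a 16-bit one's-complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(data: bytes) -> int:
    """Full Internet checksum over data (padded to 16-bit words)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    return ~_fold(total) & 0xFFFF


def incremental_update(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')."""
    total = (~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~_fold(total) & 0xFFFF


def rewrite_dst_port(tcp_header: bytes, new_port: int = INBOUND_PORT) -> bytes:
    """Return a copy of the TCP header with the destination port replaced
    and the checksum patched incrementally."""
    old_port = struct.unpack_from("!H", tcp_header, 2)[0]   # bytes 2-3
    old_csum = struct.unpack_from("!H", tcp_header, 16)[0]  # bytes 16-17
    out = bytearray(tcp_header)
    struct.pack_into("!H", out, 2, new_port)
    struct.pack_into("!H", out, 16, incremental_update(old_csum, old_port, new_port))
    return bytes(out)
```

The incremental form only touches the changed 16-bit word, so the rest of the packet does not need to be re-read, which is the reason this style of update suits per-packet processing.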

At the same time, because the host may access the Pod directly, and to limit the scope of impact, we chose to attach the XDP program to the Pod’s network card. This requires CNI to perform additional operations when Pods are created.

How to use CNI mode?

CNI mode is disabled by default. You need to enable it manually with the following command.

curl -sSL https://raw.githubusercontent.com/merbridge/merbridge/main/deploy/all-in-one.yaml | sed 's/--cni-mode=false/--cni-mode=true/g' | kubectl apply -f -

Notes

CNI mode is in beta

The CNI mode is a new feature that may not be perfect. We welcome your feedback and suggestions to help improve Merbridge.

If you are running benchmarks with tools such as the Istio perf benchmark, we suggest enabling CNI mode. Otherwise, the test results will be inaccurate.

Check whether the host can enable the hardware-checksum capability

To ensure that CNI mode works properly, the hardware-checksum capability is disabled by default, which may affect network performance. We recommend checking whether this capability can be enabled on the host before enabling CNI mode. If it can, we suggest setting --hardware-checksum=true for best performance.

Test method: run ethtool -k <network card> | grep tx-checksum-ipv4. If the output shows on, the capability is enabled.
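If you want to run this check from a script across several hosts, a small wrapper such as the following may help. It is a sketch that assumes ethtool is installed and prints features in its usual `name: state` form. The helper names `feature_is_on` and `check_interface` are hypothetical.

```python
import subprocess

FEATURE = "tx-checksum-ipv4"


def feature_is_on(ethtool_output: str, feature: str = FEATURE) -> bool:
    """Parse `ethtool -k` output and report whether the feature is on."""
    for line in ethtool_output.splitlines():
        name, sep, state = line.strip().partition(":")
        if sep and name == feature:
            # The state may carry a suffix such as "[fixed]".
            return state.split()[:1] == ["on"]
    return False  # feature not listed


def check_interface(interface: str) -> bool:
    """Run `ethtool -k <interface>` and check the feature.
    Raises CalledProcessError if ethtool fails."""
    result = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True, text=True, check=True,
    )
    return feature_is_on(result.stdout)
```

For example, `check_interface("eth0")` returns True when the feature is reported as on for that interface; the interface name here is only a placeholder.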