# Cilium CNI and Network Fabric

* Status: accepted
* Date: 2026-02-09
* Deciders: Billy
* Technical Story: Select and configure the Container Network Interface (CNI) plugin for pod networking, load balancing, and service mesh capabilities on Talos Linux

## Context and Problem Statement

A Kubernetes cluster requires a CNI plugin to provide pod-to-pod networking, service load balancancing, and network policy enforcement. The homelab runs on Talos Linux (an immutable OS) with heterogeneous hardware (amd64 + arm64) and needs L2 LoadBalancer IP advertisement for bare-metal services.

How do we provide reliable, performant networking with bare-metal LoadBalancer support and observability integration?

## Decision Drivers

* Must work on Talos Linux (eBPF-capable; avoids iptables where possible)
* Bare-metal LoadBalancer IP assignment (no cloud provider)
* L2 IP advertisement for LAN services
* kube-proxy replacement for performance
* Future-proof for network policy enforcement
* Active community and CNCF backing

## Considered Options

1. **Cilium** — eBPF-based CNI with kube-proxy replacement
2. **Calico** — Established CNI with eBPF and BGP support
3. **Flannel + MetalLB** — Simple overlay with separate LB
4. **Antrea** — VMware-backed OVS-based CNI

## Decision Outcome

Chosen option: **Cilium**, because it provides an eBPF-native dataplane that replaces kube-proxy, L2 LoadBalancer announcements (eliminating MetalLB), and native Talos support.
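For Cilium to own both the CNI and the service-routing roles on Talos, the bundled default CNI (Flannel) and kube-proxy must be disabled in the Talos machine configuration before Cilium is installed. A minimal machine config patch sketch (field names follow the Talos machine configuration schema; this is not the full config):

```yaml
# Talos machine config patch: hand networking over to Cilium.
cluster:
  network:
    cni:
      name: none     # do not deploy the default CNI (Flannel)
  proxy:
    disabled: true   # Cilium's kubeProxyReplacement handles Services instead
```

With this patch applied, nodes come up without pod networking until the Cilium chart is installed, so Cilium should be deployed as part of cluster bootstrap.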
### Positive Consequences

* Single component handles CNI + kube-proxy + LoadBalancer (replaces three tools)
* eBPF dataplane is more efficient than iptables on large clusters
* L2 announcements provide bare-metal LoadBalancer without MetalLB
* Maglev + DSR load balancing for consistent hashing and reduced latency
* Strong Talos Linux integration and testing
* Prometheus metrics and Grafana dashboards included

### Negative Consequences

* More complex configuration than simple CNIs
* eBPF requires a compatible kernel (Talos provides this)
* `hostLegacyRouting: true` workaround needed for Talos issue #10002

## Deployment Configuration

| | |
|---|---|
| **Chart** | `cilium` from `oci://ghcr.io/home-operations/charts-mirror/cilium` |
| **Version** | 1.18.6 |
| **Namespace** | `kube-system` |

### Core Networking

| Setting | Value | Rationale |
|---------|-------|-----------|
| `kubeProxyReplacement` | `true` | Replace kube-proxy entirely — lower latency, fewer components |
| `routingMode` | `native` | Direct routing, no encapsulation overhead |
| `autoDirectNodeRoutes` | `true` | Auto-configure inter-node routes |
| `ipv4NativeRoutingCIDR` | `10.42.0.0/16` | Pod CIDR for native routing |
| `ipam.mode` | `kubernetes` | Use Kubernetes IPAM |
| `endpointRoutes.enabled` | `true` | Per-endpoint routing for better granularity |
| `bpf.masquerade` | `true` | eBPF-based masquerading |
| `bpf.hostLegacyRouting` | `true` | Workaround for Talos issue #10002 |

### Load Balancing

| Setting | Value | Rationale |
|---------|-------|-----------|
| `loadBalancer.algorithm` | `maglev` | Consistent hashing for stable backend selection |
| `loadBalancer.mode` | `dsr` | Direct Server Return — response bypasses LB for lower latency |
| `socketLB.enabled` | `true` | Socket-level load balancing for host-namespace pods |

### L2 Announcements

Cilium replaces MetalLB for bare-metal LoadBalancer IP assignment:

```
CiliumLoadBalancerIPPool: 192.168.100.0/24
CiliumL2AnnouncementPolicy: announces on all Linux nodes
```

Key VIPs assigned from this pool:

- `192.168.100.200` — k8s-gateway (internal DNS)
- `192.168.100.201` — envoy-internal gateway
- `192.168.100.210` — envoy-external gateway

### Multi-Network Support

| Setting | Value | Rationale |
|---------|-------|-----------|
| `cni.exclusive` | `false` | Paired with Multus CNI for multi-network pods |

This enables workloads like qbittorrent to use secondary network interfaces (e.g., VPN).

### Disabled Features

| Feature | Reason |
|---------|--------|
| Hubble | Not needed — tracing handled by OpenTelemetry stack |
| Gateway API | Offloaded to dedicated Envoy Gateway deployment |
| Envoy (built-in) | Using separate Envoy Gateway for more control |

## Observability

- **Prometheus:** ServiceMonitor enabled for both agent and operator
- **Grafana:** Two dashboards via `GrafanaDashboard` CRDs (Cilium agent + operator)

## Links

* Related to [ADR-0044](0044-dns-and-external-access.md) (L2 IPs feed gateway VIPs)
* Related to [ADR-0002](0002-use-talos-linux.md) (Talos eBPF compatibility)
* [Cilium Documentation](https://docs.cilium.io/)
* [Talos Cilium Guide](https://www.talos.dev/latest/kubernetes-guides/network/deploying-cilium/)
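
## Appendix: L2 Announcement Resources

The L2 announcement setup described above maps to two Cilium custom resources. A hedged sketch of what they might look like (the `metadata.name` values are assumptions, and the `apiVersion` may differ between Cilium releases):

```yaml
# Pool of addresses that Services of type LoadBalancer draw their IPs from.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: main-pool              # name is an assumption
spec:
  blocks:
    - cidr: 192.168.100.0/24   # the pool backing the VIPs listed above
---
# Policy telling Cilium agents to answer ARP for those IPs on the LAN.
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-policy              # name is an assumption
spec:
  loadBalancerIPs: true        # announce LoadBalancer IPs from the pool
```

With no `nodeSelector` set, the policy applies to all (Linux) nodes, matching the "announces on all Linux nodes" behavior above; Cilium elects one node per IP to respond to ARP requests.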