# Cilium CNI and Network Fabric
- Status: accepted
- Date: 2026-02-09
- Deciders: Billy
- Technical Story: Select and configure the Container Network Interface (CNI) plugin for pod networking, load balancing, and service mesh capabilities on Talos Linux
## Context and Problem Statement
A Kubernetes cluster requires a CNI plugin to provide pod-to-pod networking, service load balancing, and network policy enforcement. The homelab runs on Talos Linux (immutable OS) with heterogeneous hardware (amd64 + arm64) and needs L2 LoadBalancer IP advertisement for bare-metal services.
How do we provide reliable, performant networking with bare-metal LoadBalancer support and observability integration?
## Decision Drivers
- Must work on Talos Linux (eBPF-capable, no iptables preference)
- Bare-metal LoadBalancer IP assignment (no cloud provider)
- L2 IP advertisement for LAN services
- kube-proxy replacement for performance
- Future-proof for network policy enforcement
- Active community and CNCF backing
## Considered Options
- Cilium — eBPF-based CNI with kube-proxy replacement
- Calico — Established CNI with eBPF and BGP support
- Flannel + MetalLB — Simple overlay with separate LB
- Antrea — VMware-backed OVS-based CNI
## Decision Outcome
Chosen option: Cilium, because it provides an eBPF-native dataplane that replaces kube-proxy, L2 LoadBalancer announcements (eliminating MetalLB), and native Talos support.
### Positive Consequences
- Single component handles CNI + kube-proxy + LoadBalancer (replaces 3 tools)
- eBPF dataplane is more efficient than iptables on large clusters
- L2 announcements provide bare-metal LoadBalancer without MetalLB
- Maglev + DSR load balancing for consistent hashing and reduced latency
- Strong Talos Linux integration and testing
- Prometheus metrics and Grafana dashboards included
### Negative Consequences
- More complex configuration than simple CNIs
- eBPF requires compatible kernel (Talos provides this)
- `hostLegacyRouting: true` workaround needed for Talos issue #10002
## Deployment Configuration

| Parameter | Value |
|---|---|
| Chart | `cilium` from `oci://ghcr.io/home-operations/charts-mirror/cilium` |
| Version | 1.18.6 |
| Namespace | `kube-system` |
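Assuming the cluster is reconciled with Flux CD (not stated in this ADR), the chart above could be deployed with a `HelmRelease` along these lines; the resource names and the `HelmRepository` reference are illustrative:

```yaml
# Hypothetical Flux HelmRelease for the mirrored Cilium chart.
# Names and the sourceRef are illustrative, not taken from this cluster.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cilium
  namespace: kube-system
spec:
  interval: 1h
  chart:
    spec:
      chart: cilium
      version: 1.18.6
      sourceRef:
        kind: HelmRepository   # OCI-type repository pointing at the ghcr.io mirror
        name: charts-mirror
        namespace: flux-system
```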
### Core Networking

| Setting | Value | Rationale |
|---|---|---|
| `kubeProxyReplacement` | `true` | Replace kube-proxy entirely — lower latency, fewer components |
| `routingMode` | `native` | Direct routing, no encapsulation overhead |
| `autoDirectNodeRoutes` | `true` | Auto-configure inter-node routes |
| `ipv4NativeRoutingCIDR` | `10.42.0.0/16` | Pod CIDR for native routing |
| `ipam.mode` | `kubernetes` | Use Kubernetes IPAM |
| `endpointRoutes.enabled` | `true` | Per-endpoint routing for better granularity |
| `bpf.masquerade` | `true` | eBPF-based masquerading |
| `bpf.hostLegacyRouting` | `true` | Workaround for Talos issue #10002 |
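The settings in the table above map onto the Cilium Helm values roughly as follows (a sketch of the relevant keys, not the full values file):

```yaml
# Sketch: core-networking portion of the Cilium Helm values.
kubeProxyReplacement: true
routingMode: native
autoDirectNodeRoutes: true
ipv4NativeRoutingCIDR: 10.42.0.0/16
ipam:
  mode: kubernetes
endpointRoutes:
  enabled: true
bpf:
  masquerade: true
  hostLegacyRouting: true   # workaround for Talos issue #10002
```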
### Load Balancing

| Setting | Value | Rationale |
|---|---|---|
| `loadBalancer.algorithm` | `maglev` | Consistent hashing for stable backend selection |
| `loadBalancer.mode` | `dsr` | Direct Server Return — response bypasses LB for lower latency |
| `socketLB.enabled` | `true` | Socket-level load balancing for host-namespace pods |
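As Helm values, the load-balancing settings look roughly like this (a sketch of the relevant keys):

```yaml
# Sketch: load-balancing portion of the Cilium Helm values.
loadBalancer:
  algorithm: maglev   # consistent hashing across backends
  mode: dsr           # Direct Server Return
socketLB:
  enabled: true       # socket-level LB for host-namespace pods
```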
### L2 Announcements

Cilium replaces MetalLB for bare-metal LoadBalancer IP assignment:

- `CiliumLoadBalancerIPPool`: 192.168.100.0/24
- `CiliumL2AnnouncementPolicy`: announces on all Linux nodes

Key VIPs assigned from this pool:

- `192.168.100.200` — k8s-gateway (internal DNS)
- `192.168.100.201` — envoy-internal gateway
- `192.168.100.210` — envoy-external gateway
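The two resources above might look like the following sketch; resource names are illustrative, and the `cilium.io` API version of these CRDs varies between Cilium releases, so check the versions shipped with 1.18:

```yaml
# Hypothetical IP pool and L2 announcement policy (names are illustrative).
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: lan-pool
spec:
  blocks:
    - cidr: 192.168.100.0/24   # pool the gateway VIPs are drawn from
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: announce-all
spec:
  loadBalancerIPs: true   # announce LoadBalancer VIPs via ARP on matching nodes
```

With no `nodeSelector`, the policy matches all (Linux) nodes, as described above.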
### Multi-Network Support

| Setting | Value | Rationale |
|---|---|---|
| `cni.exclusive` | `false` | Paired with Multus CNI for multi-network pods |
This enables workloads like qbittorrent to use secondary network interfaces (e.g., VPN).
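With `cni.exclusive: false`, Multus can attach a secondary interface through a `NetworkAttachmentDefinition`. A hypothetical example — the `macvlan` config, interface name, and resource name are illustrative, not taken from this cluster:

```yaml
# Hypothetical secondary network for a VPN-bound workload.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vpn-net
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth0",
      "ipam": { "type": "dhcp" }
    }
```

A pod opts in with the annotation `k8s.v1.cni.cncf.io/networks: vpn-net`, receiving the Multus interface alongside its Cilium-managed primary interface.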
### Disabled Features
| Feature | Reason |
|---|---|
| Hubble | Not needed — tracing handled by OpenTelemetry stack |
| Gateway API | Offloaded to dedicated Envoy Gateway deployment |
| Envoy (built-in) | Using separate Envoy Gateway for more control |
## Observability
- Prometheus: ServiceMonitor enabled for both agent and operator
- Grafana: Two dashboards via `GrafanaDashboard` CRDs (Cilium agent + operator)
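In the Cilium chart, the agent and operator ServiceMonitors are toggled roughly like this (a sketch of the relevant Helm values):

```yaml
# Sketch: Prometheus ServiceMonitor toggles in the Cilium Helm values.
prometheus:
  enabled: true
  serviceMonitor:
    enabled: true        # scrape the Cilium agent
operator:
  prometheus:
    enabled: true
    serviceMonitor:
      enabled: true      # scrape the Cilium operator
```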
## Links
- Related to ADR-0044 (L2 IPs feed gateway VIPs)
- Related to ADR-0002 (Talos eBPF compatibility)
- Cilium Documentation
- Talos Cilium Guide