homelab-design/decisions/0035-arm64-worker-strategy.md

ARM64 Raspberry Pi Worker Node Strategy

  • Status: accepted
  • Date: 2026-02-05
  • Deciders: Billy
  • Technical Story: Integrate Raspberry Pi nodes into the Kubernetes cluster

Context and Problem Statement

The homelab cluster includes 5 Raspberry Pi 4/5 nodes (ARM64 architecture) alongside x86_64 servers. These low-power nodes provide:

  • Additional compute capacity for lightweight workloads
  • Geographic distribution within the home network
  • Learning platform for multi-architecture Kubernetes

However, ARM64 nodes have constraints:

  • No GPU acceleration
  • Lower CPU/memory than x86_64 servers
  • Some container images lack ARM64 support
  • Limited local storage

How do we effectively integrate ARM64 nodes while avoiding scheduling failures?

Decision Drivers

  • Maximize utilization of ARM64 compute
  • Prevent ARM-incompatible workloads from scheduling
  • Maintain cluster stability
  • Support multi-arch container images
  • Minimize operational overhead

Considered Options

  1. Node labels + affinity for workload placement
  2. Separate ARM64-only namespace
  3. Taints to exclude from general scheduling
  4. ARM64 nodes for specific workload types only

Decision Outcome

Chosen option: a hybrid of options 1 and 4. Node labels with affinity rules steer placement, and ARM64 nodes are designated for specific workload categories.

ARM64 nodes handle:

  • Lightweight control plane components (where multi-arch images exist)
  • Velero node-agent (backup DaemonSet)
  • Node-level monitoring (Prometheus node-exporter)
  • Future: Edge/IoT workloads
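
Workloads in these categories can be pinned to the Pi nodes with a nodeSelector on the standard architecture label; a minimal sketch (the MQTT broker Deployment and image are illustrative placeholders, not committed workloads):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mqtt-broker  # hypothetical example workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mqtt-broker
  template:
    metadata:
      labels:
        app: mqtt-broker
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64  # schedule only onto the Pi nodes
      containers:
        - name: broker
          image: eclipse-mosquitto:2  # published as a multi-arch image
```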

Positive Consequences

  • Clear workload segmentation
  • No scheduling failures from arch mismatch
  • Efficient use of low-power nodes
  • Room for future ARM-specific workloads
  • Cost-effective cluster expansion

Negative Consequences

  • Some nodes may be underutilized
  • Must maintain multi-arch image awareness
  • Additional scheduling complexity

Cluster Composition

| Node      | Architecture | Role                    | Instance Type |
|-----------|--------------|-------------------------|---------------|
| bruenor   | amd64        | control-plane           | -             |
| catti     | amd64        | control-plane           | -             |
| storm     | amd64        | control-plane           | -             |
| khelben   | amd64        | GPU worker (Strix Halo) | -             |
| elminster | amd64        | GPU worker (NVIDIA)     | -             |
| drizzt    | amd64        | GPU worker (RDNA2)      | -             |
| danilo    | amd64        | GPU worker (Intel Arc)  | -             |
| regis     | amd64        | worker                  | -             |
| wulfgar   | amd64        | worker                  | -             |
| durnan    | arm64        | worker                  | raspberry-pi  |
| elaith    | arm64        | worker                  | raspberry-pi  |
| jarlaxle  | arm64        | worker                  | raspberry-pi  |
| mirt      | arm64        | worker                  | raspberry-pi  |
| volo      | arm64        | worker                  | raspberry-pi  |

Node Labels

# Applied via Talos machine config or kubectl
labels:
  kubernetes.io/arch: arm64
  kubernetes.io/os: linux
  node.kubernetes.io/instance-type: raspberry-pi
  kubernetes.io/storage: none  # No Longhorn on Pis
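
With Talos, the custom labels can be set declaratively per node; a sketch assuming Talos's `machine.nodeLabels` field (`kubernetes.io/arch` and `kubernetes.io/os` are populated automatically by the kubelet and don't need to be listed):

```yaml
machine:
  nodeLabels:
    node.kubernetes.io/instance-type: raspberry-pi
    kubernetes.io/storage: none  # No Longhorn on Pis
```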

Workload Placement

DaemonSets (Run Everywhere)

These run on all nodes including ARM64:

| DaemonSet         | Namespace     | Multi-arch |
|-------------------|---------------|------------|
| velero-node-agent | velero        | Yes        |
| cilium-agent      | kube-system   | Yes        |
| node-exporter     | observability | Yes        |

ARM64-Excluded Workloads

These explicitly exclude ARM64 via node affinity:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - amd64

| Workload Type     | Reason for Exclusion                              |
|-------------------|---------------------------------------------------|
| GPU workloads     | No GPU on Pis                                     |
| Longhorn          | No local storage (`kubernetes.io/storage: none`)  |
| Heavy databases   | Insufficient resources                            |
| Most HelmReleases | Image compatibility not guaranteed                |

ARM64-Compatible Light Workloads

Potential future workloads for ARM64 nodes:

| Workload            | Use Case            |
|---------------------|---------------------|
| MQTT broker         | IoT message routing |
| Pi-hole             | DNS ad blocking     |
| Home Assistant      | Home automation     |
| Lightweight proxies | Traffic routing     |

Storage Exclusion

ARM64 nodes are excluded from Longhorn:

# Longhorn Helm values
defaultSettings:
  systemManagedComponentsNodeSelector: "kubernetes.io/arch:amd64"

Node label:

kubernetes.io/storage: none

Resource Constraints

| Node Type      | CPU     | Memory | Typically Available |
|----------------|---------|--------|---------------------|
| Raspberry Pi 4 | 4 cores | 4-8 GB | ~3 cores, 3 GB      |
| Raspberry Pi 5 | 4 cores | 8 GB   | ~3.5 cores, 6 GB    |
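
Given that headroom, anything scheduled onto the Pis should declare requests that fit comfortably within a single node; an illustrative sizing for a lightweight service (the numbers are assumptions, not measurements):

```yaml
resources:
  requests:
    cpu: 100m      # well under the ~3 cores available
    memory: 128Mi
  limits:
    cpu: "1"
    memory: 512Mi  # keeps several pods per 4 GB node feasible
```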

Multi-Architecture Image Strategy

For workloads that should run on ARM64:

  1. Use multi-arch base images (e.g., alpine, debian)
  2. Build with Docker buildx (multi-platform builds must be pushed; they cannot be loaded into the local Docker daemon):
    docker buildx build --platform linux/amd64,linux/arm64 -t myimage:latest --push .
  3. Verify arch support before deployment
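
Step 3 can be scripted around `docker buildx imagetools inspect`, which prints one `Platform:` line per published variant of a manifest list (the image name here is just an example):

```shell
#!/bin/sh
# Return success if the image's manifest list includes a linux/arm64 variant.
supports_arm64() {
  docker buildx imagetools inspect "$1" | grep -q 'linux/arm64'
}

if supports_arm64 "alpine:latest"; then
  echo "arm64 available"
else
  echo "arm64 missing: rebuild with buildx or choose another image"
fi
```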

Monitoring ARM64 Nodes

# Available memory by node and architecture.
# Assumes node-exporter series are relabelled to carry a `node` label and
# kube-state-metrics exposes the arch label
# (--metric-labels-allowlist=nodes=[kubernetes.io/arch]).
sum by (node, label_kubernetes_io_arch) (
  node_memory_MemAvailable_bytes
  * on (node) group_left(label_kubernetes_io_arch)
  kube_node_labels{label_kubernetes_io_arch!=""}
)
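
A companion query for CPU pressure, under the same assumption that node-exporter series carry a `node` label matching `kube_node_labels`:

# Non-idle CPU usage summed per architecture
sum by (label_kubernetes_io_arch) (
  rate(node_cpu_seconds_total{mode!="idle"}[5m])
  * on (node) group_left(label_kubernetes_io_arch)
  kube_node_labels{label_kubernetes_io_arch!=""}
)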

Future Considerations

  • Edge workloads: ARM64 nodes ideal for edge compute patterns
  • IoT integration: MQTT, sensor data collection
  • Scale-out: Add more Pis for lightweight workload capacity
  • ARM64 ML inference: Some models support ARM (TensorFlow Lite)