homelab-design/decisions/0035-arm64-worker-strategy.md

ARM64 Raspberry Pi Worker Node Strategy

  • Status: accepted
  • Date: 2026-02-05
  • Deciders: Billy
  • Technical Story: Integrate Raspberry Pi nodes into the Kubernetes cluster

Context and Problem Statement

The homelab cluster includes 5 Raspberry Pi 4/5 nodes (ARM64 architecture) alongside x86_64 servers. These low-power nodes provide:

  • Additional compute capacity for lightweight workloads
  • Geographic distribution within the home network
  • Learning platform for multi-architecture Kubernetes

However, ARM64 nodes have constraints:

  • No GPU acceleration
  • Lower CPU/memory than x86_64 servers
  • Some container images lack ARM64 support
  • Limited local storage

How do we effectively integrate ARM64 nodes while avoiding scheduling failures?

Decision Drivers

  • Maximize utilization of ARM64 compute
  • Prevent ARM-incompatible workloads from scheduling
  • Maintain cluster stability
  • Support multi-arch container images
  • Minimize operational overhead

Considered Options

  1. Node labels + affinity for workload placement
  2. Separate ARM64-only namespace
  3. Taints to exclude from general scheduling
  4. ARM64 nodes for specific workload types only

Decision Outcome

Chosen option: a hybrid of options 1 and 4. Node labels with affinity rules steer placement, and ARM64 nodes are designated for specific workload categories.

ARM64 nodes handle:

  • Lightweight control plane components (where multi-arch images exist)
  • Velero node-agent (backup DaemonSet)
  • Node-level monitoring (Prometheus node-exporter)
  • Future: Edge/IoT workloads
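
Workloads in these categories can be pinned to the Pi nodes with a nodeSelector on the standard architecture label; a minimal sketch (the MQTT broker Deployment and image are illustrative placeholders, not committed workloads):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mqtt-broker  # hypothetical example workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mqtt-broker
  template:
    metadata:
      labels:
        app: mqtt-broker
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64  # schedule only onto the Pi nodes
      containers:
        - name: broker
          image: eclipse-mosquitto:2  # published as a multi-arch image
```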

Positive Consequences

  • Clear workload segmentation
  • No scheduling failures from arch mismatch
  • Efficient use of low-power nodes
  • Room for future ARM-specific workloads
  • Cost-effective cluster expansion

Negative Consequences

  • Some nodes may be underutilized
  • Must maintain multi-arch image awareness
  • Additional scheduling complexity

Cluster Composition

| Node      | Architecture | Role                    | Instance Type |
|-----------|--------------|-------------------------|---------------|
| bruenor   | amd64        | control-plane           | -             |
| catti     | amd64        | control-plane           | -             |
| storm     | amd64        | control-plane           | -             |
| khelben   | amd64        | GPU worker (Strix Halo) | -             |
| elminster | amd64        | GPU worker (NVIDIA)     | -             |
| drizzt    | amd64        | GPU worker (RDNA2)      | -             |
| danilo    | amd64        | GPU worker (Intel Arc)  | -             |
| regis     | amd64        | worker                  | -             |
| wulfgar   | amd64        | worker                  | -             |
| durnan    | arm64        | worker                  | raspberry-pi  |
| elaith    | arm64        | worker                  | raspberry-pi  |
| jarlaxle  | arm64        | worker                  | raspberry-pi  |
| mirt      | arm64        | worker                  | raspberry-pi  |
| volo      | arm64        | worker                  | raspberry-pi  |

Node Labels

# Applied via Talos machine config or kubectl
labels:
  kubernetes.io/arch: arm64
  kubernetes.io/os: linux
  node.kubernetes.io/instance-type: raspberry-pi
  kubernetes.io/storage: none  # No Longhorn on Pis
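
With Talos, the custom labels can be set declaratively per node; a sketch assuming Talos's `machine.nodeLabels` field (`kubernetes.io/arch` and `kubernetes.io/os` are populated automatically by the kubelet and don't need to be listed):

```yaml
machine:
  nodeLabels:
    node.kubernetes.io/instance-type: raspberry-pi
    kubernetes.io/storage: none  # No Longhorn on Pis
```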

Workload Placement

DaemonSets (Run Everywhere)

These run on all nodes including ARM64:

| DaemonSet         | Namespace     | Multi-arch |
|-------------------|---------------|------------|
| velero-node-agent | velero        | Yes        |
| cilium-agent      | kube-system   | Yes        |
| node-exporter     | observability | Yes        |

ARM64-Excluded Workloads

These explicitly exclude ARM64 via node affinity:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - amd64

| Workload Type     | Reason for Exclusion                              |
|-------------------|---------------------------------------------------|
| GPU workloads     | No GPU on Pis                                     |
| Longhorn          | No local storage (`kubernetes.io/storage: none`)  |
| Heavy databases   | Insufficient resources                            |
| Most HelmReleases | Image compatibility not guaranteed                |

ARM64-Compatible Light Workloads

Potential future workloads for ARM64 nodes:

| Workload            | Use Case            |
|---------------------|---------------------|
| MQTT broker         | IoT message routing |
| Pi-hole             | DNS ad blocking     |
| Home Assistant      | Home automation     |
| Lightweight proxies | Traffic routing     |

Storage Exclusion

ARM64 nodes are excluded from Longhorn:

# Longhorn Helm values
defaultSettings:
  systemManagedComponentsNodeSelector: "kubernetes.io/arch:amd64"

Node label:

kubernetes.io/storage: none

Resource Constraints

| Node Type      | CPU     | Memory | Typically Available |
|----------------|---------|--------|---------------------|
| Raspberry Pi 4 | 4 cores | 4-8 GB | ~3 cores, 3 GB      |
| Raspberry Pi 5 | 4 cores | 8 GB   | ~3.5 cores, 6 GB    |
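
Given that headroom, anything scheduled onto the Pis should declare requests that fit comfortably within a single node; an illustrative sizing for a lightweight service (the numbers are assumptions, not measurements):

```yaml
resources:
  requests:
    cpu: 100m      # well under the ~3 cores available
    memory: 128Mi
  limits:
    cpu: "1"
    memory: 512Mi  # keeps several pods per 4 GB node feasible
```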

Multi-Architecture Image Strategy

For workloads that should run on ARM64:

  1. Use multi-arch base images (e.g., alpine, debian)
  2. Build with Docker buildx (multi-platform builds must be pushed; they cannot be loaded into the local Docker daemon):
    docker buildx build --platform linux/amd64,linux/arm64 -t myimage:latest --push .
  3. Verify arch support before deployment
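
Step 3 can be scripted around `docker buildx imagetools inspect`, which prints one `Platform:` line per published variant of a manifest list (the image name here is just an example):

```shell
#!/bin/sh
# Return success if the image's manifest list includes a linux/arm64 variant.
supports_arm64() {
  docker buildx imagetools inspect "$1" | grep -q 'linux/arm64'
}

if supports_arm64 "alpine:latest"; then
  echo "arm64 available"
else
  echo "arm64 missing: rebuild with buildx or choose another image"
fi
```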

Monitoring ARM64 Nodes

# Available memory by node and architecture.
# Assumes node-exporter series are relabelled to carry a `node` label and
# kube-state-metrics exposes the arch label
# (--metric-labels-allowlist=nodes=[kubernetes.io/arch]).
sum by (node, label_kubernetes_io_arch) (
  node_memory_MemAvailable_bytes
  * on (node) group_left(label_kubernetes_io_arch)
  kube_node_labels{label_kubernetes_io_arch!=""}
)
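
A companion query for CPU pressure, under the same assumption that node-exporter series carry a `node` label matching `kube_node_labels`:

# Non-idle CPU usage summed per architecture
sum by (label_kubernetes_io_arch) (
  rate(node_cpu_seconds_total{mode!="idle"}[5m])
  * on (node) group_left(label_kubernetes_io_arch)
  kube_node_labels{label_kubernetes_io_arch!=""}
)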

Future Considerations

  • Edge workloads: ARM64 nodes ideal for edge compute patterns
  • IoT integration: MQTT, sensor data collection
  • Scale-out: Add more Pis for lightweight workload capacity
  • ARM64 ML inference: Some models support ARM (TensorFlow Lite)