homelab-design/decisions/0002-use-talos-linux.md
Billy D. 832cda34bd feat: add comprehensive architecture documentation
2026-02-01 14:30:05 -05:00

Use Talos Linux for Kubernetes Nodes

  • Status: accepted
  • Date: 2025-11-30
  • Deciders: Billy Davies
  • Technical Story: Selecting OS for bare-metal Kubernetes cluster

Context and Problem Statement

We need a reliable, secure operating system for running Kubernetes on bare-metal homelab nodes. The OS should minimize attack surface, be easy to manage at scale, and support our GPU requirements (AMD ROCm, NVIDIA CUDA, Intel).

Decision Drivers

  • Security-first design (immutable, minimal)
  • API-driven management (no SSH)
  • Support for various GPU drivers
  • Kubernetes-native focus
  • Community support and updates
  • Ease of upgrades

Considered Options

  • Ubuntu Server with kubeadm
  • Flatcar Container Linux
  • Talos Linux
  • k3OS (discontinued)
  • Rocky Linux with RKE2

Decision Outcome

Chosen option: "Talos Linux", because it provides an immutable, API-driven, Kubernetes-focused OS that minimizes attack surface and simplifies operations.

Positive Consequences

  • Immutable root filesystem prevents drift
  • No SSH reduces attack vectors
  • API-driven management integrates well with GitOps
  • Image Factory schematics bake system extensions (GPU kernel modules and firmware) into the node image
  • Consistent configuration across all nodes
  • Image-based upgrades triggered through the API, with minimal disruption
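
The schematic and upgrade points above can be sketched in a few lines. This is a minimal illustration, not the cluster's actual tooling. It assumes the public Image Factory at factory.talos.dev; the helper names (`render_schematic`, `installer_image`, `upgrade_command`), the node address, and the version string are hypothetical. The extension names are assumptions as well, since they change between Talos releases and should be checked against the extension catalogue for the release in use.

```python
"""Sketch: describe GPU system extensions as a Talos Image Factory schematic."""

# Public Image Factory host; a self-hosted factory would use a different name.
FACTORY_HOST = "factory.talos.dev"


def render_schematic(extensions):
    """Render a schematic document (YAML) listing official system extensions."""
    lines = ["customization:", "  systemExtensions:", "    officialExtensions:"]
    # Sort and de-duplicate so the same set of extensions always renders identically.
    lines += [f"      - {name}" for name in sorted(set(extensions))]
    return "\n".join(lines) + "\n"


def installer_image(schematic_id, talos_version):
    """Build the installer image reference for a schematic ID and Talos version."""
    return f"{FACTORY_HOST}/installer/{schematic_id}:{talos_version}"


def upgrade_command(node, image):
    """Return the talosctl argv that upgrades one node to the given image."""
    return ["talosctl", "upgrade", "--nodes", node, "--image", image]


if __name__ == "__main__":
    # Assumed extension names; verify them for your Talos release.
    print(render_schematic(["siderolabs/amdgpu", "siderolabs/amd-ucode"]))
    # Placeholders: the factory returns the real ID when the schematic is uploaded.
    image = installer_image("<schematic-id>", "vX.Y.Z")
    print(" ".join(upgrade_command("10.0.0.11", image)))
```

Keeping one schematic per GPU vendor in Git means a node's driver set is determined by the image it boots, which fits the immutable, GitOps-driven model described above.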

Negative Consequences

  • Learning curve for API-driven management
  • Debugging relies on talosctl and Kubernetes tooling instead of a shell on the node (no SSH)
  • Custom extensions require building node images from Image Factory schematic IDs
  • Less flexibility for non-Kubernetes workloads
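
To make the debugging trade-off concrete, the sketch below maps a few tasks that would normally be done over SSH to the corresponding `talosctl` subcommands. It only builds the argument lists and does not run anything. The task names, the `DEBUG_TASKS` table, and the `debug_command` helper are hypothetical, and the node address is a placeholder.

```python
"""Sketch: build talosctl commands that stand in for common SSH debugging steps."""

# Hypothetical task names mapped to the talosctl subcommand that covers each one.
DEBUG_TASKS = {
    "kernel-log": ["dmesg"],            # instead of running dmesg in a shell
    "service-status": ["services"],     # instead of systemctl status
    "kubelet-log": ["logs", "kubelet"], # instead of journalctl -u kubelet
    "processes": ["processes"],         # instead of ps or top
}


def debug_command(node, task):
    """Return the talosctl argv for a debugging task against one node."""
    if task not in DEBUG_TASKS:
        raise ValueError(f"unknown debug task: {task}")
    return ["talosctl", "--nodes", node, *DEBUG_TASKS[task]]


if __name__ == "__main__":
    for task in DEBUG_TASKS:
        print(" ".join(debug_command("10.0.0.11", task)))
```

Every command goes through the authenticated Talos API, so the same access controls apply to debugging as to configuration changes.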

Pros and Cons of the Options

Ubuntu Server with kubeadm

  • Good, because of familiar tooling and workflows
  • Good, because extensive package availability
  • Good, because easy debugging via SSH
  • Bad, because mutable system leads to drift
  • Bad, because large attack surface
  • Bad, because manual package management

Flatcar Container Linux

  • Good, because immutable
  • Good, because auto-updates
  • Good, because container-focused
  • Bad, because less Kubernetes-specific
  • Bad, because smaller community than Talos
  • Bad, because GPU driver setup more complex

Talos Linux

  • Good, because purpose-built for Kubernetes
  • Good, because immutable and minimal
  • Good, because API-driven (no SSH)
  • Good, because excellent Kubernetes integration
  • Good, because active development and community
  • Good, because schematic system for GPU drivers
  • Bad, because of the learning curve for API-driven management
  • Bad, because there is no SSH or shell for traditional debugging

k3OS

  • Good, because simple
  • Bad, because discontinued

Rocky Linux with RKE2

  • Good, because enterprise-style (RHEL-compatible base)
  • Good, because familiar Linux experience
  • Bad, because mutable system
  • Bad, because more operational overhead
  • Bad, because larger attack surface