
# Use Talos Linux for Kubernetes Nodes
* Status: accepted
* Date: 2025-11-30
* Deciders: Billy Davies
* Technical Story: Selecting an OS for the bare-metal Kubernetes cluster
## Context and Problem Statement
We need a reliable, secure operating system for running Kubernetes on bare-metal homelab nodes. The OS should minimize attack surface, be easy to manage at scale, and support our GPU stack (AMD ROCm, NVIDIA CUDA, and Intel GPUs).
## Decision Drivers
* Security-first design (immutable, minimal)
* API-driven management (no SSH)
* Support for AMD, NVIDIA, and Intel GPU drivers
* Kubernetes-native focus
* Community support and updates
* Ease of upgrades
## Considered Options
* Ubuntu Server with kubeadm
* Flatcar Container Linux
* Talos Linux
* k3OS (discontinued)
* Rocky Linux with RKE2
## Decision Outcome
Chosen option: "Talos Linux", because it provides an immutable, API-driven, Kubernetes-focused OS that minimizes attack surface and simplifies operations.
### Positive Consequences
* Immutable root filesystem prevents drift
* No SSH reduces attack vectors
* API-driven management integrates well with GitOps
* Image Factory schematics allow adding system extensions such as GPU kernel modules and firmware
* Consistent configuration across all nodes
* Automatic updates with minimal disruption
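Because upgrades are requests to the Talos API rather than package-manager runs, a rollout can be scripted from a workstation. The following is a minimal sketch, assuming `talosctl` is installed and configured with a valid talosconfig. The node addresses and the installer tag are hypothetical placeholders; only the `talosctl upgrade --nodes <node> --image <image>` invocation comes from talosctl itself, and its flags should be checked against the talosctl version in use.

```python
import shlex
import subprocess

# Hypothetical example values; replace with real node IPs and a pinned Talos release.
NODES = ["10.0.0.11", "10.0.0.12", "10.0.0.21"]
IMAGE = "ghcr.io/siderolabs/installer:v1.9.0"


def upgrade_commands(nodes: list[str], image: str) -> list[list[str]]:
    """Build one `talosctl upgrade` command per node, in rollout order."""
    return [["talosctl", "upgrade", "--nodes", node, "--image", image] for node in nodes]


def rolling_upgrade(nodes: list[str], image: str, dry_run: bool = True) -> list[str]:
    """Upgrade nodes one at a time and return the command lines run (or planned).

    With dry_run=False, check=True raises CalledProcessError on the first
    failing node, so later nodes are never touched.
    """
    executed = []
    for cmd in upgrade_commands(nodes, image):
        executed.append(shlex.join(cmd))
        if not dry_run:
            # Blocks until talosctl exits before moving on to the next node.
            subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    for line in rolling_upgrade(NODES, IMAGE):
        print(line)
```

Nodes are handled strictly in sequence, so a failure stops the rollout before the next node is touched. Upgrading control-plane nodes one at a time also keeps etcd quorum intact.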
### Negative Consequences
* Learning curve for API-driven management
* Debugging requires different approaches (no SSH)
* Adding system extensions requires generating an Image Factory schematic and referencing its ID in installer images
* Less flexibility for non-Kubernetes workloads
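Day-to-day debugging moves from an SSH session to read-only API calls. A minimal sketch of a diagnostics collector, assuming `talosctl` is on the PATH and already authenticated against the cluster; the check names and the node address in the usage line are hypothetical.

```python
import subprocess

# Read-only talosctl subcommands standing in for common SSH-era checks.
CHECKS = {
    "kernel_log": ["dmesg"],
    "services": ["services"],
    "kubelet_log": ["logs", "kubelet"],
    "extensions": ["get", "extensions"],  # installed system extensions, e.g. GPU drivers
}


def diagnostic_commands(node: str) -> dict[str, list[str]]:
    """Map each check name to a talosctl command targeting one node."""
    return {name: ["talosctl", "--nodes", node, *args] for name, args in CHECKS.items()}


def collect(node: str) -> dict[str, str]:
    """Run every check and keep stdout, or stderr when the command fails."""
    results = {}
    for name, cmd in diagnostic_commands(node).items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results
```

Usage: `collect("10.0.0.21")` returns one captured output per check for that (hypothetical) node, which can be saved or diffed instead of grepping logs over SSH.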
## Pros and Cons of the Options
### Ubuntu Server with kubeadm
* Good, because familiar
* Good, because extensive package availability
* Good, because easy debugging via SSH
* Bad, because mutable system leads to drift
* Bad, because large attack surface
* Bad, because manual package management
### Flatcar Container Linux
* Good, because immutable
* Good, because auto-updates
* Good, because container-focused
* Bad, because less Kubernetes-specific
* Bad, because smaller community than Talos
* Bad, because GPU driver setup is more complex
### Talos Linux
* Good, because purpose-built for Kubernetes
* Good, because immutable and minimal
* Good, because API-driven (no SSH)
* Good, because excellent Kubernetes integration
* Good, because active development and community
* Good, because schematic system for GPU drivers
* Bad, because learning curve
* Bad, because there is no shell or SSH for traditional debugging
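The schematic workflow behind the GPU-driver point can be sketched in three steps: describe the wanted system extensions in a schematic document, submit it to Image Factory to obtain a schematic ID, and reference that ID in the installer image. This is a minimal sketch under assumptions: the extension names are illustrative and must be checked against the Image Factory catalog for the Talos version in use, and the `installer/<id>:<version>` path follows the Image Factory documentation as we understand it.

```python
import json
import urllib.request

FACTORY_URL = "https://factory.talos.dev"

# Assumed extension names; verify against the Image Factory catalog for your Talos version.
GPU_EXTENSIONS = [
    "siderolabs/amdgpu",
    "siderolabs/i915",
    "siderolabs/nvidia-open-gpu-kernel-modules-lts",
    "siderolabs/nvidia-container-toolkit-lts",
]


def schematic_yaml(extensions: list[str]) -> str:
    """Render a minimal schematic that requests official system extensions."""
    lines = ["customization:", "  systemExtensions:", "    officialExtensions:"]
    lines += [f"      - {ext}" for ext in extensions]
    return "\n".join(lines) + "\n"


def submit_schematic(yaml_text: str) -> str:
    """POST the schematic to Image Factory and return the schematic ID (network call)."""
    req = urllib.request.Request(
        f"{FACTORY_URL}/schematics", data=yaml_text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]


def installer_image(schematic_id: str, talos_version: str) -> str:
    """Installer image reference that bakes the schematic's extensions into the node OS."""
    return f"factory.talos.dev/installer/{schematic_id}:{talos_version}"


if __name__ == "__main__":
    # Offline by default: only prints the schematic, does not contact Image Factory.
    print(schematic_yaml(GPU_EXTENSIONS))
```

Because a schematic ID identifies an exact set of extensions, nodes with different GPU vendors can use different schematics while running the same Talos version.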
### k3OS
* Good, because simple
* Bad, because discontinued
### Rocky Linux with RKE2
* Good, because enterprise-like
* Good, because familiar Linux experience
* Bad, because mutable system
* Bad, because more operational overhead
* Bad, because larger attack surface
## Links
* [Talos Linux](https://talos.dev)
* [Talos Image Factory](https://factory.talos.dev)
* Related: [ADR-0005](0005-multi-gpu-strategy.md) - GPU driver integration via schematics