- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories and kubectl cluster-info dump data.
🏗️ System Architecture
Comprehensive technical overview of the DaviesTechLabs homelab infrastructure
Overview
The homelab is a production-grade Kubernetes cluster running on bare-metal hardware, designed for AI/ML workloads with multi-GPU support. It follows GitOps principles using Flux CD with SOPS-encrypted secrets.
System Layers
┌─────────────────────────────────────────────────────────────────────────────┐
│ USER LAYER │
├─────────────────────────────────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Companions WebApp│ │ Voice WebApp │ │ Kubeflow UI │ │
│ │ HTMX + Alpine │ │ Gradio UI │ │ Pipeline Mgmt │ │
│ └────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ │
│ │ WebSocket │ HTTP/WS │ HTTP │
└───────────┴─────────────────────┴─────────────────────┴─────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ INGRESS LAYER │
├─────────────────────────────────────────────────────────────────────────────┤
│ Cloudflared Tunnel ──► Envoy Gateway ──► HTTPRoute CRDs │
│ │
│ External: *.daviestechlabs.io Internal: *.lab.daviestechlabs.io │
│ • git.daviestechlabs.io • kubeflow.lab.daviestechlabs.io │
│ • auth.daviestechlabs.io • companions-chat.lab... │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ MESSAGE BUS LAYER │
├─────────────────────────────────────────────────────────────────────────────┤
│ NATS + JetStream │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Streams: │ │
│ │ • COMPANIONS_LOGINS (7d retention) - User analytics │ │
│ │ • COMPANIONS_CHAT (30d retention) - Chat history │ │
│ │ • AI_CHAT_STREAM (5min, memory) - Ephemeral streaming │ │
│ │ • AI_VOICE_STREAM (1h, file) - Voice processing │ │
│ │ • AI_PIPELINE (24h, file) - Workflow triggers │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Message Format: MessagePack (binary, not JSON) │
└─────────────────────────────────────────────────────────────────────────────┘
│
┌─────────────────────────┼─────────────────────────┐
▼ ▼ ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ Chat Handler │ │ Voice Assistant │ │ Pipeline Bridge │
├───────────────────┤ ├───────────────────┤ ├───────────────────┤
│ • RAG retrieval │ │ • STT (Whisper) │ │ • KFP triggers │
│ • LLM inference │ │ • RAG retrieval │ │ • Argo triggers │
│ • Streaming resp │ │ • LLM inference │ │ • Status updates │
│ • Session state │ │ • TTS (XTTS) │ │ • Error handling │
└───────────────────┘ └───────────────────┘ └───────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ AI SERVICES LAYER │
├─────────────────────────────────────────────────────────────────────────────┤
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Whisper │ │ XTTS │ │ vLLM │ │ Milvus │ │ BGE │ │Reranker │ │
│ │ (STT) │ │ (TTS) │ │ (LLM) │ │ (RAG) │ │(Embed) │ │ (BGE) │ │
│ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ │
│ │ KServe │ │ KServe │ │ vLLM │ │ Helm │ │ KServe │ │ KServe │ │
│ │ nvidia │ │ nvidia │ │ ROCm │ │ Minio │ │ rdna2 │ │ intel │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW ENGINE LAYER │
├─────────────────────────────────────────────────────────────────────────────┤
│ ┌────────────────────────────┐ ┌────────────────────────────┐ │
│ │ Argo Workflows │◄──►│ Kubeflow Pipelines │ │
│ ├────────────────────────────┤ ├────────────────────────────┤ │
│ │ • Complex DAG orchestration│ │ • ML pipeline caching │ │
│ │ • Training workflows │ │ • Experiment tracking │ │
│ │ • Document ingestion │ │ • Model versioning │ │
│ │ • Batch inference │ │ • Artifact lineage │ │
│ └────────────────────────────┘ └────────────────────────────┘ │
│ │
│ Trigger: Argo Events (EventSource → Sensor → Workflow/Pipeline) │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ INFRASTRUCTURE LAYER │
├─────────────────────────────────────────────────────────────────────────────┤
│ Storage: Compute: Security: │
│ ├─ Longhorn (block) ├─ Volcano Scheduler ├─ Vault (secrets) │
│ ├─ NFS CSI (shared) ├─ GPU Device Plugins ├─ Authentik (SSO) │
│ └─ MinIO (S3) │ ├─ AMD ROCm ├─ Falco (runtime) │
│ │ ├─ NVIDIA CUDA └─ SOPS (GitOps) │
│ Databases: │ └─ Intel i915/Arc │
│ ├─ CloudNative-PG └─ Node Feature Discovery │
│ ├─ Valkey (cache) │
│ └─ ClickHouse (analytics) │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ PLATFORM LAYER │
├─────────────────────────────────────────────────────────────────────────────┤
│ Talos Linux v1.12.1 │ Kubernetes v1.35.0 │ Cilium CNI │
│ │
│ Nodes: storm, bruenor, catti (control) │ elminster, khelben, drizzt, │
│ │ danilo (workers) │
└─────────────────────────────────────────────────────────────────────────────┘
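The stream list in the message bus layer can be captured as plain data, which makes the retention windows easy to reason about. The following is a minimal sketch, not the actual JetStream configuration: `StreamSpec`, `max_age_seconds`, and `is_retained` are hypothetical names, and the storage type is left as `None` for the two streams where the diagram does not state it.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class StreamSpec:
    """One JetStream stream as listed in the diagram (illustrative only)."""
    name: str
    max_age: timedelta        # retention window
    storage: Optional[str]    # "memory" or "file"; None where the diagram does not say
    purpose: str


STREAMS = {
    spec.name: spec
    for spec in [
        StreamSpec("COMPANIONS_LOGINS", timedelta(days=7), None, "User analytics"),
        StreamSpec("COMPANIONS_CHAT", timedelta(days=30), None, "Chat history"),
        StreamSpec("AI_CHAT_STREAM", timedelta(minutes=5), "memory", "Ephemeral streaming"),
        StreamSpec("AI_VOICE_STREAM", timedelta(hours=1), "file", "Voice processing"),
        StreamSpec("AI_PIPELINE", timedelta(hours=24), "file", "Workflow triggers"),
    ]
}


def max_age_seconds(spec: StreamSpec) -> int:
    """Retention window expressed in whole seconds."""
    return int(spec.max_age.total_seconds())


def is_retained(spec: StreamSpec, message_age: timedelta) -> bool:
    """True if a message of the given age is still inside the retention window."""
    return message_age <= spec.max_age


if __name__ == "__main__":
    for spec in STREAMS.values():
        print(f"{spec.name:<18} {max_age_seconds(spec):>8}s  {spec.storage or 'unspecified'}")
```

Keeping the five-minute, memory-backed `AI_CHAT_STREAM` separate from the 30-day `COMPANIONS_CHAT` stream reflects the split between ephemeral token streaming and durable chat history shown in the diagram.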
Node Topology
Control Plane (HA)
| Node | IP | CPU | Memory | Storage | Role |
|---|---|---|---|---|---|
| storm | 192.168.100.25 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
| bruenor | 192.168.100.26 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
| catti | 192.168.100.27 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
VIP: 192.168.100.20 (shared across control plane)
Worker Nodes
| Node | IP | CPU | GPU | GPU Memory | Workload |
|---|---|---|---|---|---|
| elminster | 192.168.100.31 | Intel | NVIDIA RTX 2070 | 8GB VRAM | Whisper, XTTS |
| khelben | 192.168.100.32 | AMD Ryzen | AMD Strix Halo | 64GB Unified | vLLM (dedicated) |
| drizzt | 192.168.100.40 | AMD Ryzen 7 6800H | AMD Radeon 680M | 12GB VRAM | BGE Embeddings |
| danilo | 192.168.100.41 | Intel Core Ultra 9 | Intel Arc | 16GB Shared | Reranker |
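The worker table pins each AI service to one GPU node. As a small sketch of that placement, the lookup below mirrors the table rows; `Worker`, `WORKERS`, and `node_for` are hypothetical names used for illustration only, and no Kubernetes labels or selectors are implied.

```python
from typing import NamedTuple, Tuple


class Worker(NamedTuple):
    node: str
    ip: str
    gpu: str
    workloads: Tuple[str, ...]


# Rows copied from the worker node table above.
WORKERS = [
    Worker("elminster", "192.168.100.31", "NVIDIA RTX 2070", ("Whisper", "XTTS")),
    Worker("khelben", "192.168.100.32", "AMD Strix Halo", ("vLLM",)),
    Worker("drizzt", "192.168.100.40", "AMD Radeon 680M", ("BGE Embeddings",)),
    Worker("danilo", "192.168.100.41", "Intel Arc", ("Reranker",)),
]


def node_for(workload: str) -> Worker:
    """Return the worker that the table assigns to a workload."""
    for worker in WORKERS:
        if workload in worker.workloads:
            return worker
    raise KeyError(f"no worker listed for workload {workload!r}")


if __name__ == "__main__":
    for name in ("Whisper", "vLLM", "Reranker"):
        worker = node_for(name)
        print(f"{name} -> {worker.node} ({worker.gpu})")
```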
Networking
External Access
Internet → Cloudflare → cloudflared tunnel → Envoy Gateway → Services
DNS Zones
- External: `*.daviestechlabs.io` (Cloudflare DNS)
- Internal: `*.lab.daviestechlabs.io` (internal split-horizon)
Network CIDRs
| Network | CIDR | Purpose |
|---|---|---|
| Node Network | 192.168.100.0/24 | Physical nodes |
| Pod Network | 10.42.0.0/16 | Kubernetes pods |
| Service Network | 10.43.0.0/16 | Kubernetes services |
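When debugging, it helps to tell at a glance which of the three ranges an address belongs to. The sketch below uses Python's standard `ipaddress` module with the CIDRs from the table; `classify` and `overlapping_pairs` are hypothetical helper names.

```python
import ipaddress
from itertools import combinations

# CIDRs copied from the table above.
NETWORKS = {
    "Node Network": ipaddress.ip_network("192.168.100.0/24"),
    "Pod Network": ipaddress.ip_network("10.42.0.0/16"),
    "Service Network": ipaddress.ip_network("10.43.0.0/16"),
}


def classify(address: str) -> str:
    """Name the cluster network that contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for name, network in NETWORKS.items():
        if ip in network:
            return name
    return "Outside cluster ranges"


def overlapping_pairs():
    """List any pairs of networks whose ranges overlap (none are expected)."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(NETWORKS.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    for addr in ("192.168.100.20", "10.42.3.7", "10.43.0.1"):
        print(addr, "->", classify(addr))
    print("overlaps:", overlapping_pairs())
```

The overlap check is a cheap sanity test: pod and service ranges that collide with the node network would break routing to the physical hosts.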
Data Flow: Chat Request
sequenceDiagram
participant U as User
participant W as WebApp
participant N as NATS
participant C as Chat Handler
participant M as Milvus
participant L as vLLM
participant V as Valkey
U->>W: Send message
W->>N: Publish ai.chat.user.{id}.message
N->>C: Deliver to chat-handler
C->>V: Get session history
C->>M: RAG query (if enabled)
M-->>C: Relevant documents
C->>L: LLM inference (with context)
L-->>C: Streaming tokens
C->>N: Publish ai.chat.response.stream.{id}
N-->>W: Deliver streaming chunks
W-->>U: Display tokens
C->>V: Save to session
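The two subjects in the sequence above follow a dotted, token-based naming scheme. The sketch below builds them and shows how NATS-style wildcard matching would route them (`*` matches exactly one token, `>` matches one or more trailing tokens). It is a pure-Python illustration, not the NATS client API: the helper names are hypothetical, and the subscription pattern `ai.chat.user.*.message` is an assumption about how the chat handler might subscribe.

```python
def _check_token(ident: str) -> str:
    """Reject identifiers that would break the subject into extra tokens."""
    if not ident or any(ch in ident for ch in ". \t\r\n*>"):
        raise ValueError(f"invalid subject token: {ident!r}")
    return ident


def user_message_subject(ident: str) -> str:
    return f"ai.chat.user.{_check_token(ident)}.message"


def response_stream_subject(ident: str) -> str:
    return f"ai.chat.response.stream.{_check_token(ident)}"


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' is only valid as the last token and needs at least one token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


if __name__ == "__main__":
    subject = user_message_subject("42")
    print(subject, subject_matches("ai.chat.user.*.message", subject))
```

Placing the identifier in a fixed token position lets a single wildcard subscription cover all users while each WebApp session listens only on its own response subject.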
GitOps Flow
Developer → Git Push → GitHub/Gitea
│
▼
┌─────────────┐
│ Flux CD │
│ (reconcile) │
└──────┬──────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│homelab- │ │ llm- │ │ helm │
│ k8s2 │ │workflows │ │ charts │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└──────────────┴──────────────┘
│
▼
┌─────────────┐
│ Kubernetes │
│ Cluster │
└─────────────┘
Security Architecture
Secrets Management
External Secrets Operator ──► Vault / SOPS ──► Kubernetes Secrets
Authentication
User ──► Cloudflare Access ──► Authentik ──► Application
│
└──► OIDC/SAML providers
Network Security
- Cilium: Network policies, eBPF-based security
- Falco: Runtime security monitoring
- RBAC: Fine-grained Kubernetes permissions
High Availability
Control Plane
- 3-node etcd cluster with automatic leader election
- Virtual IP (192.168.100.20) for API server access
- Automatic failover via Talos
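The choice of three control plane nodes follows from etcd's majority quorum: a cluster of n members needs floor(n/2) + 1 of them to agree, so it tolerates n minus that many failures. A short worked calculation (function names are illustrative):

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given member count."""
    if members < 1:
        raise ValueError("cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


if __name__ == "__main__":
    # 3 members -> quorum 2, tolerates 1 failure
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)}")
```

With storm, bruenor, and catti, the cluster keeps quorum when any one node is down. A fourth member would raise the quorum to three without raising fault tolerance, which is why odd member counts are the usual choice.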
Workloads
- Pod anti-affinity for critical services
- HorizontalPodAutoscaler (HPA) for auto-scaling
- PodDisruptionBudgets for controlled updates
Storage
- Longhorn with 3 replicas per volume by default
- MinIO erasure coding for S3
- Regular Velero backups
Observability
Metrics Pipeline
Applications ──► OpenTelemetry Collector ──► Prometheus ──► Grafana
Logging Pipeline
Applications ──► Grafana Alloy ──► Loki ──► Grafana
Tracing Pipeline
Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafana
Key Design Decisions
| Decision | Rationale | ADR |
|---|---|---|
| Talos Linux | Immutable, API-driven, secure | ADR-0002 |
| NATS over Kafka | Simpler ops, sufficient throughput | ADR-0003 |
| MessagePack over JSON | Binary efficiency for audio | ADR-0004 |
| Multi-GPU heterogeneous | Cost optimization, workload matching | ADR-0005 |
| GitOps with Flux | Declarative, auditable, secure | ADR-0006 |
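The "binary efficiency for audio" rationale can be illustrated with a size comparison. JSON has no raw byte type, so audio must be base64-encoded first, which adds roughly a third to the payload; MessagePack's `bin` family stores the bytes as-is behind a short header. The sketch below hand-encodes only the `bin8`/`bin16`/`bin32` formats for illustration, a real service would use a MessagePack library instead, and the 16,000-byte chunk size is an arbitrary assumption.

```python
import base64
import json
import struct


def msgpack_bin(payload: bytes) -> bytes:
    """Encode raw bytes as a MessagePack bin8, bin16, or bin32 value."""
    n = len(payload)
    if n < 2**8:
        header = struct.pack(">BB", 0xC4, n)   # bin8: 1-byte length
    elif n < 2**16:
        header = struct.pack(">BH", 0xC5, n)   # bin16: 2-byte big-endian length
    elif n < 2**32:
        header = struct.pack(">BI", 0xC6, n)   # bin32: 4-byte big-endian length
    else:
        raise ValueError("payload too large for MessagePack bin32")
    return header + payload


def json_audio(payload: bytes) -> bytes:
    """JSON cannot carry raw bytes, so the audio is base64-encoded into a string."""
    return json.dumps(base64.b64encode(payload).decode("ascii")).encode("utf-8")


chunk = bytes(16000)  # hypothetical audio chunk, all zeros
binary_size = len(msgpack_bin(chunk))
json_size = len(json_audio(chunk))

if __name__ == "__main__":
    print(f"MessagePack bin: {binary_size} bytes")
    print(f"JSON + base64:   {json_size} bytes")
    print(f"overhead ratio:  {json_size / binary_size:.2f}x")
```

The comparison covers only the audio field itself; any surrounding message envelope adds to both encodings.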
Related Documents
- TECH-STACK.md - Complete technology inventory
- DOMAIN-MODEL.md - Core entities and relationships
- decisions/ - All architecture decisions