daviestechlabs/homelab-design

Billy D. 598875c5a9 docs: add ADR-0011 (KubeRay), ADR-0012 (uv), update architecture docs

2026-02-02 07:10:47 -05:00

🏗️ System Architecture

Comprehensive technical overview of the DaviesTechLabs homelab infrastructure

Overview

The homelab is a production-grade Kubernetes cluster running on bare-metal hardware, designed for AI/ML workloads with heterogeneous multi-GPU support (NVIDIA CUDA, AMD ROCm, and Intel Arc). It follows GitOps principles: Flux CD reconciles the cluster from Git, and secrets are committed SOPS-encrypted.

System Layers

┌─────────────────────────────────────────────────────────────────────────────┐
│                              USER LAYER                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐           │
│  │ Companions WebApp│  │   Voice WebApp   │  │   Kubeflow UI    │           │
│  │  HTMX + Alpine   │  │    Gradio UI     │  │  Pipeline Mgmt   │           │
│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘           │
│           │ WebSocket           │ HTTP/WS             │ HTTP                │
└───────────┴─────────────────────┴─────────────────────┴─────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           INGRESS LAYER                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│  Cloudflared Tunnel ──► Envoy Gateway ──► HTTPRoute CRDs                    │
│                                                                              │
│  External: *.daviestechlabs.io          Internal: *.lab.daviestechlabs.io  │
│  • git.daviestechlabs.io                • kubeflow.lab.daviestechlabs.io   │
│  • auth.daviestechlabs.io               • companions-chat.lab...           │
└─────────────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          MESSAGE BUS LAYER                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                           NATS + JetStream                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Streams:                                                            │    │
│  │  • COMPANIONS_LOGINS (7d retention)  - User analytics               │    │
│  │  • COMPANIONS_CHAT (30d retention)   - Chat history                 │    │
│  │  • AI_CHAT_STREAM (5min, memory)     - Ephemeral streaming          │    │
│  │  • AI_VOICE_STREAM (1h, file)        - Voice processing             │    │
│  │  • AI_PIPELINE (24h, file)           - Workflow triggers            │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  Message Format: MessagePack (binary, not JSON)                             │
└─────────────────────────────────────────────────────────────────────────────┘
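The stream list above can be expressed as configuration. This is a minimal sketch, assuming the streams use JetStream's default `limits` retention policy (the diagram states only the age limits and, for three streams, the storage type). `StreamSpec` and `to_jetstream_config` are hypothetical helpers, and subject filters are omitted because the document does not list them. The output dicts follow the shape of JetStream's stream-configuration JSON, where `max_age` is a duration in nanoseconds.

```python
from dataclasses import dataclass
from typing import Optional

NS_PER_SECOND = 1_000_000_000
MINUTE, HOUR, DAY = 60, 3_600, 86_400


@dataclass(frozen=True)
class StreamSpec:
    name: str
    max_age_seconds: int
    storage: Optional[str]  # "file" or "memory"; None where the diagram does not say
    purpose: str

    def to_jetstream_config(self) -> dict:
        """Render a dict shaped like JetStream's stream config JSON."""
        cfg = {
            "name": self.name,
            "retention": "limits",  # assumption: limits-based, not interest or workqueue
            "max_age": self.max_age_seconds * NS_PER_SECOND,  # JetStream durations are nanoseconds
        }
        if self.storage is not None:
            cfg["storage"] = self.storage
        return cfg


STREAMS = [
    StreamSpec("COMPANIONS_LOGINS", 7 * DAY, None, "User analytics"),
    StreamSpec("COMPANIONS_CHAT", 30 * DAY, None, "Chat history"),
    StreamSpec("AI_CHAT_STREAM", 5 * MINUTE, "memory", "Ephemeral streaming"),
    StreamSpec("AI_VOICE_STREAM", 1 * HOUR, "file", "Voice processing"),
    StreamSpec("AI_PIPELINE", 24 * HOUR, "file", "Workflow triggers"),
]

if __name__ == "__main__":
    for spec in STREAMS:
        print(spec.to_jetstream_config())
```

Keeping AI_CHAT_STREAM in memory with a five-minute window suits token chunks that are worthless once displayed, while file storage lets voice and pipeline messages survive a NATS server restart.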
                                  │
        ┌─────────────────────────┼─────────────────────────┐
        ▼                         ▼                         ▼
┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│   Chat Handler    │   │  Voice Assistant  │   │  Pipeline Bridge  │
├───────────────────┤   ├───────────────────┤   ├───────────────────┤
│ • RAG retrieval   │   │ • STT (Whisper)   │   │ • KFP triggers    │
│ • LLM inference   │   │ • RAG retrieval   │   │ • Argo triggers   │
│ • Streaming resp  │   │ • LLM inference   │   │ • Status updates  │
│ • Session state   │   │ • TTS (XTTS)      │   │ • Error handling  │
└───────────────────┘   └───────────────────┘   └───────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      GPU INFERENCE LAYER (KubeRay)                           │
├─────────────────────────────────────────────────────────────────────────────┤
│  RayService: ai-inference-serve-svc:8000                                    │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    Ray Serve (Unified Endpoint)                      │    │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐  │    │
│  │  │ /whisper │ │   /tts   │ │   /llm   │ │/embeddings│ │/reranker │  │    │
│  │  │ Whisper  │ │  XTTS    │ │  vLLM    │ │  BGE-L    │ │ BGE-Rnk  │  │    │
│  │  │ (0.5 GPU)│ │(0.5 GPU) │ │(0.95 GPU)│ │ (0.8 GPU) │ │(0.8 GPU) │  │    │
│  │  ├──────────┤ ├──────────┤ ├──────────┤ ├───────────┤ ├──────────┤  │    │
│  │  │elminster │ │elminster │ │ khelben  │ │  drizzt   │ │  danilo  │  │    │
│  │  │RTX 2070  │ │RTX 2070  │ │Strix Halo│ │Radeon 680M│ │Intel Arc │  │    │
│  │  │  CUDA    │ │  CUDA    │ │  ROCm    │ │  ROCm     │ │  Intel   │  │    │
│  │  └──────────┘ └──────────┘ └──────────┘ └───────────┘ └──────────┘  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  KServe Aliases: {whisper,tts,llm,embeddings,reranker}-predictor.ai-ml     │
│  Milvus: Vector database for RAG (Helm, MinIO backend)                      │
└─────────────────────────────────────────────────────────────────────────────┘
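Fractional requests such as 0.5 or 0.95 GPU are Ray's logical resource accounting: the scheduler places a replica on a node only while the fractions requested on that node's GPU sum to at most 1, and it does not partition GPU memory. The stdlib-only sketch below does that bookkeeping for the five deployments in the diagram, assuming (hypothetically) that each worker exposes exactly one GPU. In Ray Serve the fraction would typically be set per deployment through `ray_actor_options={"num_gpus": ...}`.

```python
from collections import defaultdict

# (route, model, node, fractional GPU request), as drawn in the diagram above.
DEPLOYMENTS = [
    ("/whisper", "Whisper", "elminster", 0.5),
    ("/tts", "XTTS", "elminster", 0.5),
    ("/llm", "vLLM", "khelben", 0.95),
    ("/embeddings", "BGE-L", "drizzt", 0.8),
    ("/reranker", "BGE-Rnk", "danilo", 0.8),
]

# Assumption: each worker advertises exactly one GPU to Ray.
GPUS_PER_NODE = {"elminster": 1, "khelben": 1, "drizzt": 1, "danilo": 1}


def gpu_usage(deployments):
    """Sum the logical GPU fractions requested on each node."""
    usage = defaultdict(float)
    for _route, _model, node, fraction in deployments:
        usage[node] += fraction
    return dict(usage)


def fits(deployments, capacity=GPUS_PER_NODE, eps=1e-9):
    """True if no node's requests exceed its GPU count (eps absorbs float rounding)."""
    return all(used <= capacity[node] + eps for node, used in gpu_usage(deployments).items())
```

`gpu_usage(DEPLOYMENTS)` shows elminster fully booked at 1.0 (Whisper plus XTTS), so any further GPU deployment pinned there would stay pending; khelben has 0.05 of its GPU left, and drizzt and danilo have 0.2 each. Because the accounting is logical, Whisper and XTTS must still fit together in the RTX 2070's 8GB of VRAM.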
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       WORKFLOW ENGINE LAYER                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────┐    ┌────────────────────────────┐          │
│  │     Argo Workflows         │◄──►│    Kubeflow Pipelines      │          │
│  ├────────────────────────────┤    ├────────────────────────────┤          │
│  │ • Complex DAG orchestration│    │ • ML pipeline caching      │          │
│  │ • Training workflows       │    │ • Experiment tracking      │          │
│  │ • Document ingestion       │    │ • Model versioning         │          │
│  │ • Batch inference          │    │ • Artifact lineage         │          │
│  └────────────────────────────┘    └────────────────────────────┘          │
│                                                                              │
│  Trigger: Argo Events (EventSource → Sensor → Workflow/Pipeline)           │
└─────────────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        INFRASTRUCTURE LAYER                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│  Storage:                     Compute:                 Security:            │
│  ├─ Longhorn (block)          ├─ Volcano Scheduler     ├─ Vault (secrets)  │
│  ├─ NFS CSI (shared)          ├─ GPU Device Plugins    ├─ Authentik (SSO)  │
│  └─ MinIO (S3)                │   ├─ AMD ROCm          ├─ Falco (runtime)  │
│                               │   ├─ NVIDIA CUDA       └─ SOPS (GitOps)    │
│  Databases:                   │   └─ Intel i915/Arc                        │
│  ├─ CloudNative-PG            └─ Node Feature Discovery                    │
│  ├─ Valkey (cache)                                                          │
│  └─ ClickHouse (analytics)                                                  │
└─────────────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          PLATFORM LAYER                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│  Talos Linux v1.12.1  │  Kubernetes v1.35.0  │  Cilium CNI                 │
│                                                                              │
│  Nodes: storm, bruenor, catti (control) │ elminster, khelben, drizzt,      │
│                                          │ danilo (workers)                 │
└─────────────────────────────────────────────────────────────────────────────┘

Node Topology

Control Plane (HA)

Node	IP	CPU	Memory	Storage	Role
storm	192.168.100.25	Intel 13th Gen (4c)	16GB	500GB NVMe	etcd, API server
bruenor	192.168.100.26	Intel 13th Gen (4c)	16GB	500GB NVMe	etcd, API server
catti	192.168.100.27	Intel 13th Gen (4c)	16GB	500GB NVMe	etcd, API server

VIP: 192.168.100.20 (shared across control plane)

Worker Nodes

Node	IP	CPU	GPU	GPU Memory	Workload
elminster	192.168.100.31	Intel	NVIDIA RTX 2070	8GB VRAM	Whisper, XTTS
khelben	192.168.100.32	AMD Ryzen	AMD Strix Halo	64GB Unified	vLLM (dedicated)
drizzt	192.168.100.40	AMD Ryzen 7 6800H	AMD Radeon 680M	12GB VRAM	BGE Embeddings
danilo	192.168.100.41	Intel Core Ultra 9	Intel Arc	16GB Shared	Reranker

Networking

External Access

Internet → Cloudflare → cloudflared tunnel → Envoy Gateway → Services

DNS Zones

External: *.daviestechlabs.io (Cloudflare DNS)
Internal: *.lab.daviestechlabs.io (split-horizon DNS)
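Because lab.daviestechlabs.io is itself a subdomain of daviestechlabs.io, any logic that decides which zone a hostname belongs to (for example, which resolver or DNS provider should answer it) has to test the more specific zone first. A minimal longest-suffix-match sketch follows; `zone_for` and the `ZONES` table are illustrative only, not part of the cluster's configuration.

```python
ZONES = {
    "lab.daviestechlabs.io": "internal (split-horizon)",
    "daviestechlabs.io": "external (Cloudflare DNS)",
}


def zone_for(hostname: str):
    """Return (zone, description) for the most specific zone containing hostname, else None."""
    host = hostname.rstrip(".").lower()
    # Try longer (more specific) zones first so lab.* is not claimed by the apex zone.
    for zone in sorted(ZONES, key=len, reverse=True):
        # Require a label boundary so "evildaviestechlabs.io" does not match.
        if host == zone or host.endswith("." + zone):
            return zone, ZONES[zone]
    return None
```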

Network CIDRs

Network	CIDR	Purpose
Node Network	192.168.100.0/24	Physical nodes
Pod Network	10.42.0.0/16	Kubernetes pods
Service Network	10.43.0.0/16	Kubernetes services
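The three ranges must not overlap, and every node address (including the control-plane VIP) must fall inside the node network. A quick check with Python's standard `ipaddress` module, using the addresses from the tables above; the helper names are hypothetical.

```python
import ipaddress

NETWORKS = {
    "node": ipaddress.ip_network("192.168.100.0/24"),
    "pod": ipaddress.ip_network("10.42.0.0/16"),
    "service": ipaddress.ip_network("10.43.0.0/16"),
}

NODE_IPS = {
    "vip": "192.168.100.20",
    "storm": "192.168.100.25", "bruenor": "192.168.100.26", "catti": "192.168.100.27",
    "elminster": "192.168.100.31", "khelben": "192.168.100.32",
    "drizzt": "192.168.100.40", "danilo": "192.168.100.41",
}


def overlapping_pairs(networks):
    """Every pair of named networks whose address ranges intersect."""
    names = sorted(networks)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if networks[a].overlaps(networks[b])]


def nodes_outside(network, ips):
    """Names whose address is not inside the given network."""
    return [name for name, ip in ips.items() if ipaddress.ip_address(ip) not in network]
```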

Data Flow: Chat Request

sequenceDiagram
    participant U as User
    participant W as WebApp
    participant N as NATS
    participant C as Chat Handler
    participant M as Milvus
    participant L as vLLM
    participant V as Valkey

    U->>W: Send message
    W->>N: Publish ai.chat.user.{id}.message
    N->>C: Deliver to chat-handler
    C->>V: Get session history
    C->>M: RAG query (if enabled)
    M-->>C: Relevant documents
    C->>L: LLM inference (with context)
    L-->>C: Streaming tokens
    C->>N: Publish ai.chat.response.stream.{id}
    N-->>W: Deliver streaming chunks
    W-->>U: Display tokens
    C->>V: Save to session
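NATS subjects are dot-separated tokens, and subscribers can use wildcards: `*` matches exactly one token and `>` matches one or more trailing tokens. A chat handler serving every user could therefore subscribe to a pattern such as `ai.chat.user.*.message` (a hypothetical pattern; the document shows only concrete subjects). The NATS server does this matching itself; the function below only sketches the semantics.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' is exactly one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, tok in enumerate(p_tokens):
        if tok == ">":
            # '>' must be the final pattern token and needs at least one subject token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if tok != "*" and tok != s_tokens[i]:
            return False
    # Without '>', the token counts must match exactly.
    return len(p_tokens) == len(s_tokens)
```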

GitOps Flow

Developer → Git Push → GitHub/Gitea
                           │
                           ▼
                    ┌─────────────┐
                    │   Flux CD   │
                    │ (reconcile) │
                    └──────┬──────┘
                           │
            ┌──────────────┼──────────────┐
            ▼              ▼              ▼
     ┌──────────┐   ┌──────────┐   ┌──────────┐
     │homelab-  │   │  llm-    │   │  helm    │
     │  k8s2    │   │workflows │   │ charts   │
     └──────────┘   └──────────┘   └──────────┘
            │              │              │
            └──────────────┴──────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │  Kubernetes │
                    │   Cluster   │
                    └─────────────┘
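Flux's reconcile step is a control loop: it renders the manifests from each Git source, compares them with what is running, and applies the difference. With `prune: true` on a Flux `Kustomization`, objects removed from Git are also deleted from the cluster. The sketch below is a toy model of that diff, with manifests reduced to a key-to-spec mapping; it is not how Flux is implemented, and the object keys in the test data are hypothetical.

```python
from typing import Dict, List, Tuple

Manifest = Dict[str, str]  # simplified: "namespace/Kind/name" -> rendered spec


def reconcile(desired: Manifest, live: Manifest, prune: bool = True) -> List[Tuple[str, str]]:
    """Compute the actions that move `live` toward `desired` (a toy GitOps diff)."""
    actions = []
    for key in sorted(desired):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != desired[key]:
            actions.append(("update", key))
    if prune:
        # Objects no longer in Git are garbage-collected only when pruning is enabled.
        for key in sorted(set(live) - set(desired)):
            actions.append(("delete", key))
    return actions
```

Running `reconcile` against its own output state yields no actions, which is the property that makes repeated reconciliation safe.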

Security Architecture

Secrets Management

Vault ──► External Secrets Operator ──► Kubernetes Secrets
SOPS-encrypted manifests (Git) ──► Flux (decrypts at apply time) ──► Kubernetes Secrets

Authentication

User ──► Cloudflare Access ──► Authentik ──► Application
                                   │
                                   └──► OIDC/SAML providers

Network Security

Cilium: Network policies, eBPF-based security
Falco: Runtime security monitoring
RBAC: Fine-grained Kubernetes permissions

High Availability

Control Plane

3-node etcd cluster with automatic leader election (Raft)
Virtual IP (192.168.100.20) for API server access
VIP failover handled by Talos's built-in shared IP, which moves the address to a healthy control-plane node
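etcd uses Raft, which needs a majority (quorum) of members to elect a leader and commit writes. The arithmetic below shows why three control-plane nodes tolerate exactly one failure.

```python
def quorum(members: int) -> int:
    """Votes etcd's Raft consensus needs to elect a leader or commit a write."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Members that can fail while a quorum remains available."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

For this cluster, `quorum(3)` is 2 and `fault_tolerance(3)` is 1, so one control-plane node can be down (for example, during a Talos upgrade) while the API stays writable. A fourth member would raise the quorum to 3 but still tolerate only one failure, which is why etcd clusters are usually sized at odd numbers.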

Workloads

Pod anti-affinity for critical services
HPA for auto-scaling
PodDisruptionBudgets for controlled updates
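The HorizontalPodAutoscaler derives its target from the documented formula `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)` and skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default). A sketch of the core calculation follows; the `min_replicas` and `max_replicas` defaults are placeholders, and the real controller additionally handles unready pods, missing metrics, and scale-down stabilization.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core HPA calculation: proportional scaling with a dead band, clamped to bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: leave the replica count alone
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, two pods averaging 90% CPU against a 60% target scale to `ceil(2 * 1.5) = 3`, while 62% against 60% (a ratio of about 1.03) stays put.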

Storage

Longhorn volumes default to 3 replicas
MinIO erasure coding for S3 object storage
Scheduled Velero backups

Observability

Metrics Pipeline

Applications ──► OpenTelemetry Collector ──► Prometheus ──► Grafana

Logging Pipeline

Applications ──► Grafana Alloy ──► Loki ──► Grafana

Tracing Pipeline

Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafana
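OpenTelemetry SDKs propagate context by default with the W3C Trace Context `traceparent` header, formatted as `version-traceid-parentid-flags` in lowercase hex. For one trace to span the WebApp, NATS, the chat handler, and vLLM, that header would have to travel with each message, for example as a NATS message header; the document does not say how this is done, so treat that as an assumption. A stdlib sketch for generating and validating the header:

```python
import re
import secrets

TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)  # 32 hex chars
    span_id = secrets.token_hex(8)    # 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str):
    """Return the trace fields, or None if the header is malformed or invalid."""
    m = TRACEPARENT_RE.match(header.strip())
    if not m:
        return None
    version, trace_id, span_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid per the W3C spec.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {"trace_id": trace_id, "span_id": span_id, "sampled": int(flags, 16) & 1 == 1}
```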

Key Design Decisions

Decision	Rationale	ADR
Talos Linux	Immutable, API-driven, secure	ADR-0002
NATS over Kafka	Simpler ops, sufficient throughput	ADR-0003
MessagePack over JSON	Binary efficiency for audio	ADR-0004
Multi-GPU heterogeneous	Cost optimization, workload matching	ADR-0005
GitOps with Flux	Declarative, auditable, secure	ADR-0006
KServe for inference	Standardized API, autoscaling	ADR-0007
KubeRay unified backend	Fractional GPU, single endpoint	ADR-0011
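The rationale of ADR-0004 shows up directly in message sizes: JSON has no binary type, so audio must be base64-encoded, which inflates it by about 4/3, while MessagePack has a native `bin` type. The services would normally use the `msgpack` package; to stay dependency-free, the sketch below hand-encodes a small subset of the MessagePack format (nil, bool, int, float, str, bin, array, map) and compares sizes for a hypothetical audio message.

```python
import base64
import json
import struct


def _len_header(n, fix_base, fix_max, code8, code16, code32):
    """Length prefix: the fix* form if it fits, else the 8/16/32-bit variant."""
    if fix_base is not None and n <= fix_max:
        return bytes([fix_base | n])
    if code8 is not None and n <= 0xFF:
        return bytes([code8, n])
    if n <= 0xFFFF:
        return bytes([code16]) + struct.pack(">H", n)
    return bytes([code32]) + struct.pack(">I", n)


def packb(obj) -> bytes:
    """Encode a subset of MessagePack (big-endian, per the format spec)."""
    if obj is None:
        return b"\xc0"
    if isinstance(obj, bool):  # test bool before int: bool subclasses int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                      # positive fixint
        if -32 <= obj < 0:
            return struct.pack(">b", obj)            # negative fixint
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)  # uint32
        return b"\xd3" + struct.pack(">q", obj)      # int64 (struct raises if out of range)
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)      # float64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _len_header(len(data), 0xA0, 31, 0xD9, 0xDA, 0xDB) + data
    if isinstance(obj, (bytes, bytearray)):
        return _len_header(len(obj), None, 0, 0xC4, 0xC5, 0xC6) + bytes(obj)
    if isinstance(obj, (list, tuple)):
        return _len_header(len(obj), 0x90, 15, None, 0xDC, 0xDD) + b"".join(map(packb, obj))
    if isinstance(obj, dict):
        body = b"".join(packb(k) + packb(v) for k, v in obj.items())
        return _len_header(len(obj), 0x80, 15, None, 0xDE, 0xDF) + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


if __name__ == "__main__":
    # Hypothetical voice message: one second of 16 kHz, 16-bit mono PCM silence.
    msg = {"session_id": "abc123", "sample_rate": 16000, "audio": bytes(32000)}
    as_json = json.dumps({**msg, "audio": base64.b64encode(msg["audio"]).decode("ascii")})
    print("msgpack:", len(packb(msg)), "bytes")
    print("json+base64:", len(as_json.encode("utf-8")), "bytes")
```

For those 32,000 bytes of audio, the MessagePack payload is 32,045 bytes, while the JSON equivalent exceeds 42,000 bytes because of base64.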

Related Documents

TECH-STACK.md - Complete technology inventory
DOMAIN-MODEL.md - Core entities and relationships
decisions/ - All architecture decisions
