# 🏗️ System Architecture

Comprehensive technical overview of the DaviesTechLabs homelab infrastructure.

## Overview

The homelab is a production-grade Kubernetes cluster running on bare-metal hardware, designed for AI/ML workloads with multi-GPU support. It follows GitOps principles, using Flux CD with SOPS-encrypted secrets.

## System Layers
```
┌───────────────────────────────────────────────────────────────────────────┐
│                                USER LAYER                                 │
├───────────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐       │
│  │ Companions WebApp│   │   Voice WebApp   │   │   Kubeflow UI    │       │
│  │  HTMX + Alpine   │   │    Gradio UI     │   │  Pipeline Mgmt   │       │
│  └────────┬─────────┘   └────────┬─────────┘   └────────┬─────────┘       │
│           │ WebSocket            │ HTTP/WS              │ HTTP            │
└───────────┴──────────────────────┴──────────────────────┴─────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                               INGRESS LAYER                               │
├───────────────────────────────────────────────────────────────────────────┤
│  Cloudflared Tunnel ──► Envoy Gateway ──► HTTPRoute CRDs                  │
│                                                                           │
│  External: *.daviestechlabs.io        Internal: *.lab.daviestechlabs.io   │
│   • git.daviestechlabs.io             • kubeflow.lab.daviestechlabs.io    │
│   • auth.daviestechlabs.io            • companions-chat.lab...            │
└───────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                             MESSAGE BUS LAYER                             │
├───────────────────────────────────────────────────────────────────────────┤
│                              NATS + JetStream                             │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │ Streams:                                                            │  │
│  │  • COMPANIONS_LOGINS (7d retention)  - User analytics               │  │
│  │  • COMPANIONS_CHAT   (30d retention) - Chat history                 │  │
│  │  • AI_CHAT_STREAM    (5min, memory)  - Ephemeral streaming          │  │
│  │  • AI_VOICE_STREAM   (1h, file)      - Voice processing             │  │
│  │  • AI_PIPELINE       (24h, file)     - Workflow triggers            │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                                                                           │
│  Message Format: MessagePack (binary, not JSON)                           │
└───────────────────────────────────────────────────────────────────────────┘
                                      │
            ┌─────────────────────────┼─────────────────────────┐
            ▼                         ▼                         ▼
  ┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
  │   Chat Handler    │     │  Voice Assistant  │     │  Pipeline Bridge  │
  ├───────────────────┤     ├───────────────────┤     ├───────────────────┤
  │ • RAG retrieval   │     │ • STT (Whisper)   │     │ • KFP triggers    │
  │ • LLM inference   │     │ • RAG retrieval   │     │ • Argo triggers   │
  │ • Streaming resp  │     │ • LLM inference   │     │ • Status updates  │
  │ • Session state   │     │ • TTS (XTTS)      │     │ • Error handling  │
  └───────────────────┘     └───────────────────┘     └───────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                      GPU INFERENCE LAYER (KubeRay)                        │
├───────────────────────────────────────────────────────────────────────────┤
│  RayService: ai-inference-serve-svc:8000                                  │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │                    Ray Serve (Unified Endpoint)                     │  │
│  │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌─────────┐ │  │
│  │ │ /whisper  │ │   /tts    │ │   /llm    │ │/embeddings│ │/reranker│ │  │
│  │ │  Whisper  │ │   XTTS    │ │   vLLM    │ │   BGE-L   │ │ BGE-Rnk │ │  │
│  │ │ (0.5 GPU) │ │ (0.5 GPU) │ │(0.95 GPU) │ │ (0.8 GPU) │ │(0.8 GPU)│ │  │
│  │ ├───────────┤ ├───────────┤ ├───────────┤ ├───────────┤ ├─────────┤ │  │
│  │ │ elminster │ │ elminster │ │  khelben  │ │  drizzt   │ │ danilo  │ │  │
│  │ │ RTX 2070  │ │ RTX 2070  │ │Strix Halo │ │Radeon 680 │ │Intel Arc│ │  │
│  │ │   CUDA    │ │   CUDA    │ │   ROCm    │ │   ROCm    │ │  Intel  │ │  │
│  │ └───────────┘ └───────────┘ └───────────┘ └───────────┘ └─────────┘ │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                                                                           │
│  KServe Aliases: {whisper,tts,llm,embeddings,reranker}-predictor.ai-ml    │
│  Milvus: Vector database for RAG (Helm, MinIO backend)                    │
└───────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                           WORKFLOW ENGINE LAYER                           │
├───────────────────────────────────────────────────────────────────────────┤
│     ┌────────────────────────────┐      ┌────────────────────────────┐    │
│     │       Argo Workflows       │◄────►│     Kubeflow Pipelines     │    │
│     ├────────────────────────────┤      ├────────────────────────────┤    │
│     │ • Complex DAG orchestration│      │ • ML pipeline caching      │    │
│     │ • Training workflows       │      │ • Experiment tracking      │    │
│     │ • Document ingestion       │      │ • Model versioning         │    │
│     │ • Batch inference          │      │ • Artifact lineage         │    │
│     └────────────────────────────┘      └────────────────────────────┘    │
│                                                                           │
│  Trigger: Argo Events (EventSource → Sensor → Workflow/Pipeline)          │
└───────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                           INFRASTRUCTURE LAYER                            │
├───────────────────────────────────────────────────────────────────────────┤
│  Storage:                  Compute:                   Security:           │
│  ├─ Longhorn (block)       ├─ Volcano Scheduler       ├─ Vault (secrets)  │
│  ├─ NFS CSI (shared)       ├─ GPU Device Plugins      ├─ Authentik (SSO)  │
│  └─ MinIO (S3)             │  ├─ AMD ROCm             ├─ Falco (runtime)  │
│                            │  ├─ NVIDIA CUDA          └─ SOPS (GitOps)    │
│  Databases:                │  └─ Intel i915/Arc                           │
│  ├─ CloudNative-PG         └─ Node Feature Discovery                      │
│  ├─ Valkey (cache)                                                        │
│  └─ ClickHouse (analytics)                                                │
└───────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                               PLATFORM LAYER                              │
├───────────────────────────────────────────────────────────────────────────┤
│   Talos Linux v1.12.x   │   Kubernetes v1.35.0   │   Cilium CNI           │
│                                                                           │
│   14 nodes: 3 control plane │ 4 GPU workers │ 2 CPU-only x86 workers      │
│                             │ 5 Raspberry Pi (arm64) workers              │
└───────────────────────────────────────────────────────────────────────────┘
```
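The fractional allocations in the inference layer (two 0.5-GPU deployments sharing elminster, for example) rely on Ray Serve's logical GPU accounting: `num_gpus` is a bookkeeping share, not a hardware partition. A minimal first-fit sketch of the idea, using the diagram's deployments and nodes as data (an illustration only, not Ray's actual scheduler):

```python
# Illustrative first-fit packing of fractional GPU requests onto devices.
# This mimics the idea behind Ray's logical `num_gpus` accounting; it is
# NOT Ray's real scheduler. Fractions and node names mirror the diagram.
def pack(requests: dict[str, float], gpus: list[str]) -> dict[str, str]:
    free = {g: 1.0 for g in gpus}           # each device exposes 1.0 logical GPU
    placement = {}
    for name, frac in requests.items():
        for gpu, avail in free.items():
            if avail + 1e-9 >= frac:        # first device with enough share left
                free[gpu] = avail - frac
                placement[name] = gpu
                break
        else:
            raise RuntimeError(f"no GPU can fit {name} ({frac})")
    return placement

requests = {"whisper": 0.5, "tts": 0.5, "llm": 0.95, "embeddings": 0.8, "reranker": 0.8}
gpus = ["elminster", "khelben", "drizzt", "danilo"]
print(pack(requests, gpus))
# whisper and tts share elminster (0.5 + 0.5); the other three deployments
# each land on their own device, matching the diagram above.
```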
## Node Topology

### Control Plane (HA)
| Node | IP | CPU | Memory | Storage | Role |
|---|---|---|---|---|---|
| storm | 192.168.100.25 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
| bruenor | 192.168.100.26 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
| catti | 192.168.100.27 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
VIP: 192.168.100.20 (shared across control plane)
### Worker Nodes — GPU
| Node | IP | CPU | RAM | GPU | GPU Memory | Workload |
|---|---|---|---|---|---|---|
| elminster | 192.168.100.31 | Intel (16c) | 62 GB | NVIDIA RTX 2070 | 8 GB VRAM | Whisper, XTTS |
| khelben | 192.168.100.32 | AMD Ryzen (32c) | 94 GB | AMD Strix Halo | 32 GB Unified | vLLM (dedicated) |
| drizzt | 192.168.100.40 | AMD Ryzen 7 6800H (16c) | 27 GB | AMD Radeon 680M | 12 GB VRAM | BGE Embeddings |
| danilo | 192.168.100.41 | Intel Core Ultra 9 (22c) | 62 GB | Intel Arc | 16 GB Shared | Reranker |
### Worker Nodes — CPU-only (x86_64)
| Node | IP | CPU | RAM | Workload |
|---|---|---|---|---|
| regis | 192.168.100.43 | Intel (4c) | 16 GB | General workloads |
| wulfgar | 192.168.100.42 | Intel (4c) | 31 GB | General workloads |
### Worker Nodes — Raspberry Pi (arm64)
| Node | IP | CPU | RAM | Workload |
|---|---|---|---|---|
| durnan | 192.168.100.54 | Cortex-A72 (4c) | 4 GB | Lightweight services |
| jarlaxle | 192.168.100.53 | Cortex-A72 (4c) | 4 GB | Lightweight services |
| mirt | 192.168.100.52 | Cortex-A72 (4c) | 4 GB | Lightweight services |
| volo | 192.168.100.51 | Cortex-A72 (4c) | 4 GB | Lightweight services |
| elaith | 192.168.100.55 | Cortex-A72 (4c) | 8 GB | Lightweight services |
### Cluster Totals
| Resource | Total |
|---|---|
| Nodes | 14 (3 control + 11 worker) |
| CPU cores | ~126 |
| System RAM | ~364 GB |
| Architectures | amd64, arm64 |
| GPUs | 4 (NVIDIA, AMD, Intel) |
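The totals can be cross-checked against the node tables. A quick derivation (per-node `(cores, GB)` figures copied from the tables above; the RAM column sums the usable amounts each node reports, so it lands a little under the nominal installed total):

```python
# Sanity-check the cluster totals against the per-node tables above.
# Each entry is (cpu_cores, ram_gb) as listed in the tables.
control = {"storm": (4, 16), "bruenor": (4, 16), "catti": (4, 16)}
gpu     = {"elminster": (16, 62), "khelben": (32, 94), "drizzt": (16, 27), "danilo": (22, 62)}
cpu     = {"regis": (4, 16), "wulfgar": (4, 31)}
rpi     = {"durnan": (4, 4), "jarlaxle": (4, 4), "mirt": (4, 4), "volo": (4, 4), "elaith": (4, 8)}

nodes = {**control, **gpu, **cpu, **rpi}
cores = sum(c for c, _ in nodes.values())
ram   = sum(r for _, r in nodes.values())
print(len(nodes), cores, ram)   # 14 nodes, 126 cores, 364 GB
```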
## Networking

### External Access

```
Internet → Cloudflare → cloudflared tunnel → Envoy Gateway → Services
```
### DNS Zones

- External: `*.daviestechlabs.io` (Cloudflare DNS)
- Internal: `*.lab.daviestechlabs.io` (internal split-horizon)
### Network CIDRs
| Network | CIDR | Purpose |
|---|---|---|
| Node Network | 192.168.100.0/24 | Physical nodes |
| Pod Network | 10.42.0.0/16 | Kubernetes pods |
| Service Network | 10.43.0.0/16 | Kubernetes services |
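Because the three CIDRs are disjoint, any bare IP can be attributed to exactly one network. A stdlib check of that property (and of the VIP sitting inside the node network):

```python
# Verify the three CIDRs above are disjoint, so a bare IP unambiguously
# identifies which network it belongs to. Stdlib only.
import ipaddress

cidrs = {
    "nodes":    ipaddress.ip_network("192.168.100.0/24"),
    "pods":     ipaddress.ip_network("10.42.0.0/16"),
    "services": ipaddress.ip_network("10.43.0.0/16"),
}
for a in cidrs:
    for b in cidrs:
        if a != b:
            assert not cidrs[a].overlaps(cidrs[b]), (a, b)

print(ipaddress.ip_address("192.168.100.20") in cidrs["nodes"])  # True (the control-plane VIP)
```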
## Data Flow: Chat Request
```mermaid
sequenceDiagram
    participant U as User
    participant W as WebApp
    participant N as NATS
    participant C as Chat Handler
    participant M as Milvus
    participant L as vLLM
    participant V as Valkey
    U->>W: Send message
    W->>N: Publish ai.chat.user.{id}.message
    N->>C: Deliver to chat-handler
    C->>V: Get session history
    C->>M: RAG query (if enabled)
    M-->>C: Relevant documents
    C->>L: LLM inference (with context)
    L-->>C: Streaming tokens
    C->>N: Publish ai.chat.response.stream.{id}
    N-->>W: Deliver streaming chunks
    W-->>U: Display tokens
    C->>V: Save to session
```
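The `ai.chat.user.{id}.message` subjects in the sequence above follow NATS's dot-separated subject hierarchy, where subscribers can use `*` (exactly one token) or `>` (one or more trailing tokens) as wildcards. A small matcher showing those semantics (pure illustration of the rules; real services would use a NATS client library):

```python
# NATS-style subject matching: '*' matches exactly one token,
# '>' matches one or more trailing tokens and must come last.
def matches(pattern: str, subject: str) -> bool:
    pat, sub = pattern.split("."), subject.split(".")
    for i, p in enumerate(pat):
        if p == ">":               # '>' consumes the rest of the subject
            return i == len(pat) - 1 and len(sub) > i
        if i >= len(sub) or (p != "*" and p != sub[i]):
            return False
    return len(pat) == len(sub)

# The chat-handler can subscribe to every user's messages with one pattern:
assert matches("ai.chat.user.*.message", "ai.chat.user.42.message")
assert not matches("ai.chat.user.*.message", "ai.chat.user.42.typing")
assert matches("ai.chat.>", "ai.chat.response.stream.42")
```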
## GitOps Flow

```
Developer → Git Push → GitHub/Gitea
                  │
                  ▼
           ┌─────────────┐
           │   Flux CD   │
           │ (reconcile) │
           └──────┬──────┘
                  │
     ┌────────────┼────────────┐
     ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ homelab- │ │   llm-   │ │   helm   │
│   k8s2   │ │workflows │ │  charts  │
└──────────┘ └──────────┘ └──────────┘
     │            │            │
     └────────────┴────────────┘
                  │
                  ▼
           ┌─────────────┐
           │ Kubernetes  │
           │   Cluster   │
           └─────────────┘
```
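The reconcile step is the heart of GitOps: controllers repeatedly compare the state declared in Git against the live cluster, then apply or prune the difference. A toy model of that loop (conceptual only; Flux's real controllers operate on Kubernetes objects, and the service names here are made up):

```python
# Conceptual GitOps reconciliation: the cluster is repeatedly driven toward
# the state declared in Git. A toy model, not Flux's actual implementation.
def reconcile(desired: dict, actual: dict) -> list[str]:
    actions = []
    for name, spec in desired.items():
        if actual.get(name) != spec:
            actions.append(f"apply {name}")
            actual[name] = spec               # converge toward Git
    for name in set(actual) - set(desired):
        actions.append(f"prune {name}")       # Git is the source of truth
        del actual[name]
    return actions

desired = {"chat-handler": "v2", "voice-assistant": "v1"}
actual  = {"chat-handler": "v1", "old-service": "v1"}
print(reconcile(desired, actual))
print(actual == desired)   # True: drift corrected, orphan pruned
```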
## Security Architecture

### Secrets Management

```
External Secrets Operator ──► Vault / SOPS ──► Kubernetes Secrets
```

### Authentication

```
User ──► Cloudflare Access ──► Authentik ──► Application
                                   │
                                   └──► OIDC/SAML providers
```

### Network Security
- Cilium: Network policies, eBPF-based security
- Falco: Runtime security monitoring
- RBAC: Fine-grained Kubernetes permissions
## High Availability

### Control Plane
- 3-node etcd cluster with automatic leader election
- Virtual IP (192.168.100.20) for API server access
- Automatic failover via Talos
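Three control-plane nodes is the smallest size at which etcd survives a member failure: writes need a majority (quorum) of members. The arithmetic behind that choice:

```python
# etcd commits a write only when a majority (quorum) of members ack it,
# so an n-member cluster tolerates n - quorum(n) failed members.
def quorum(n: int) -> int:
    return n // 2 + 1

for n in (1, 3, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {n - quorum(n)} failure(s)")
# A 4th member would raise quorum to 3 without improving fault tolerance,
# which is why etcd clusters are run at odd sizes.
```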
### Workloads
- Pod anti-affinity for critical services
- HPA (Horizontal Pod Autoscaler) for auto-scaling
- PodDisruptionBudgets for controlled updates
### Storage
- Longhorn 3-replica default
- MinIO erasure coding for S3
- Regular Velero backups
## Observability

### Metrics Pipeline

```
Applications ──► OpenTelemetry Collector ──► Prometheus ──► Grafana
```

### Logging Pipeline

```
Applications ──► Grafana Alloy ──► Loki ──► Grafana
```

### Tracing Pipeline

```
Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafana
```
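At the scrape end of the metrics pipeline, Prometheus pulls targets in its text exposition format. A sketch of that wire format (the metric name and labels here are invented for illustration; real exporters use a client library rather than hand-formatting lines):

```python
# Sketch of a single sample in the Prometheus text exposition format:
#   metric_name{label="value",...} value
# Labels are sorted for a deterministic output; names here are made up.
def prom_line(name: str, labels: dict[str, str], value: float) -> str:
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}} {value}"

line = prom_line("http_requests_total", {"service": "chat-handler", "code": "200"}, 1027)
print(line)  # http_requests_total{code="200",service="chat-handler"} 1027
```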
## Key Design Decisions
| Decision | Rationale | ADR |
|---|---|---|
| Talos Linux | Immutable, API-driven, secure | ADR-0002 |
| NATS over Kafka | Simpler ops, sufficient throughput | ADR-0003 |
| MessagePack over JSON | Binary efficiency for audio | ADR-0004 |
| Multi-GPU heterogeneous | Cost optimization, workload matching | ADR-0005 |
| GitOps with Flux | Declarative, auditable, secure | ADR-0006 |
| KServe for inference | Standardized API, autoscaling | ADR-0007 |
| KubeRay unified backend | Fractional GPU, single endpoint | ADR-0011 |
| Go handler refactor | Slim images for non-ML services | ADR-0061 |
## Related Documents
- `TECH-STACK.md` - Complete technology inventory
- `DOMAIN-MODEL.md` - Core entities and relationships
- `decisions/` - All architecture decisions