feat: add comprehensive architecture documentation

- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories
and kubectl cluster-info dump data.
2026-02-01 14:30:05 -05:00
parent 4d4f6f464c
commit 832cda34bd
26 changed files with 3805 additions and 2 deletions

ARCHITECTURE.md (287 lines added)

@@ -0,0 +1,287 @@
# 🏗️ System Architecture
> **Comprehensive technical overview of the DaviesTechLabs homelab infrastructure**
## Overview
The homelab is a production-grade Kubernetes cluster running on bare-metal hardware, designed for AI/ML workloads with multi-GPU support. It follows GitOps principles using Flux CD with SOPS-encrypted secrets.
## System Layers
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 USER LAYER                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐           │
│  │ Companions WebApp│  │  Voice WebApp    │  │   Kubeflow UI    │           │
│  │  HTMX + Alpine   │  │    Gradio UI     │  │  Pipeline Mgmt   │           │
│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘           │
│           │ WebSocket           │ HTTP/WS             │ HTTP                │
└───────────┴─────────────────────┴─────────────────────┴─────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                                INGRESS LAYER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│  Cloudflared Tunnel ──► Envoy Gateway ──► HTTPRoute CRDs                    │
│                                                                             │
│  External: *.daviestechlabs.io         Internal: *.lab.daviestechlabs.io    │
│    • git.daviestechlabs.io               • kubeflow.lab.daviestechlabs.io   │
│    • auth.daviestechlabs.io              • companions-chat.lab...           │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                              MESSAGE BUS LAYER                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                              NATS + JetStream                               │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │ Streams:                                                            │   │
│   │   • COMPANIONS_LOGINS (7d retention)  - User analytics              │   │
│   │   • COMPANIONS_CHAT   (30d retention) - Chat history                │   │
│   │   • AI_CHAT_STREAM    (5min, memory)  - Ephemeral streaming         │   │
│   │   • AI_VOICE_STREAM   (1h, file)      - Voice processing            │   │
│   │   • AI_PIPELINE       (24h, file)     - Workflow triggers           │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│   Message Format: MessagePack (binary, not JSON)                            │
└─────────────────────────────────────────────────────────────────────────────┘
             ┌─────────────────────────┼─────────────────────────┐
             ▼                         ▼                         ▼
   ┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
   │   Chat Handler    │     │  Voice Assistant  │     │  Pipeline Bridge  │
   ├───────────────────┤     ├───────────────────┤     ├───────────────────┤
   │ • RAG retrieval   │     │ • STT (Whisper)   │     │ • KFP triggers    │
   │ • LLM inference   │     │ • RAG retrieval   │     │ • Argo triggers   │
   │ • Streaming resp  │     │ • LLM inference   │     │ • Status updates  │
   │ • Session state   │     │ • TTS (XTTS)      │     │ • Error handling  │
   └───────────────────┘     └───────────────────┘     └───────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                              AI SERVICES LAYER                              │
├─────────────────────────────────────────────────────────────────────────────┤
│   ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐   │
│   │ Whisper │ │  XTTS   │ │  vLLM   │ │ Milvus  │ │   BGE   │ │Reranker │   │
│   │  (STT)  │ │  (TTS)  │ │  (LLM)  │ │  (RAG)  │ │ (Embed) │ │  (BGE)  │   │
│   ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤   │
│   │ KServe  │ │ KServe  │ │  vLLM   │ │  Helm   │ │ KServe  │ │ KServe  │   │
│   │ nvidia  │ │ nvidia  │ │  ROCm   │ │  Minio  │ │  rdna2  │ │  intel  │   │
│   └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                            WORKFLOW ENGINE LAYER                            │
├─────────────────────────────────────────────────────────────────────────────┤
│      ┌────────────────────────────┐    ┌────────────────────────────┐       │
│      │       Argo Workflows       │◄──►│     Kubeflow Pipelines     │       │
│      ├────────────────────────────┤    ├────────────────────────────┤       │
│      │ • Complex DAG orchestration│    │ • ML pipeline caching      │       │
│      │ • Training workflows       │    │ • Experiment tracking      │       │
│      │ • Document ingestion       │    │ • Model versioning         │       │
│      │ • Batch inference          │    │ • Artifact lineage         │       │
│      └────────────────────────────┘    └────────────────────────────┘       │
│                                                                             │
│      Trigger: Argo Events (EventSource → Sensor → Workflow/Pipeline)        │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                            INFRASTRUCTURE LAYER                             │
├─────────────────────────────────────────────────────────────────────────────┤
│  Storage:                 Compute:                  Security:               │
│  ├─ Longhorn (block)      ├─ Volcano Scheduler      ├─ Vault (secrets)      │
│  ├─ NFS CSI (shared)      ├─ GPU Device Plugins     ├─ Authentik (SSO)      │
│  └─ MinIO (S3)            │   ├─ AMD ROCm           ├─ Falco (runtime)      │
│                           │   ├─ NVIDIA CUDA        └─ SOPS (GitOps)        │
│  Databases:               │   └─ Intel i915/Arc                             │
│  ├─ CloudNative-PG        └─ Node Feature Discovery                         │
│  ├─ Valkey (cache)                                                          │
│  └─ ClickHouse (analytics)                                                  │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                               PLATFORM LAYER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│  Talos Linux v1.12.1      │      Kubernetes v1.35.0      │      Cilium CNI  │
│                                                                             │
│  Nodes: storm, bruenor, catti (control)  │  elminster, khelben, drizzt,     │
│                                          │  danilo (workers)                │
└─────────────────────────────────────────────────────────────────────────────┘
```
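The stream retention settings in the Message Bus layer can be written down as data. The following is a minimal Python sketch, not the repository's actual JetStream spec: the `StreamSpec` type and helper names are hypothetical, storage is left as `None` where the diagram does not state it, and the nanosecond conversion assumes the JSON form of a JetStream stream config, which expresses `max_age` in nanoseconds.

```python
from dataclasses import dataclass
from typing import Optional

NANOS_PER_SECOND = 1_000_000_000


@dataclass(frozen=True)
class StreamSpec:
    """Hypothetical record mirroring one row of the Streams box."""
    name: str
    max_age_seconds: int
    storage: Optional[str]  # "memory", "file", or None when the diagram does not say
    purpose: str


# Values copied from the diagram; nothing here is read from the cluster.
STREAMS = [
    StreamSpec("COMPANIONS_LOGINS", 7 * 86_400, None, "User analytics"),
    StreamSpec("COMPANIONS_CHAT", 30 * 86_400, None, "Chat history"),
    StreamSpec("AI_CHAT_STREAM", 5 * 60, "memory", "Ephemeral streaming"),
    StreamSpec("AI_VOICE_STREAM", 60 * 60, "file", "Voice processing"),
    StreamSpec("AI_PIPELINE", 24 * 3_600, "file", "Workflow triggers"),
]


def by_name(name: str) -> StreamSpec:
    """Look up a stream by its name, raising KeyError if it is not listed."""
    for spec in STREAMS:
        if spec.name == name:
            return spec
    raise KeyError(name)


def max_age_nanos(spec: StreamSpec) -> int:
    """Retention converted to nanoseconds (assumed unit of the JSON stream config)."""
    return spec.max_age_seconds * NANOS_PER_SECOND
```

Keeping retention in seconds and converting at the edge avoids mixing units between client libraries that accept seconds and raw config that may expect nanoseconds.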
## Node Topology
### Control Plane (HA)
| Node | IP | CPU | Memory | Storage | Role |
|------|-------|-----|--------|---------|------|
| storm | 192.168.100.25 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
| bruenor | 192.168.100.26 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
| catti | 192.168.100.27 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
**VIP**: 192.168.100.20 (shared across control plane)
### Worker Nodes
| Node | IP | CPU | GPU | GPU Memory | Workload |
|------|-------|-----|-----|------------|----------|
| elminster | 192.168.100.31 | Intel | NVIDIA RTX 2070 | 8GB VRAM | Whisper, XTTS |
| khelben | 192.168.100.32 | AMD Ryzen | AMD Strix Halo | 64GB Unified | vLLM (dedicated) |
| drizzt | 192.168.100.40 | AMD Ryzen 7 6800H | AMD Radeon 680M | 12GB VRAM | BGE Embeddings |
| danilo | 192.168.100.41 | Intel Core Ultra 9 | Intel Arc | 16GB Shared | Reranker |
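The workload column above implies a fixed workload-to-node placement. As a rough illustration only, the Python sketch below turns that table into a `nodeSelector` fragment using the well-known `kubernetes.io/hostname` label. It assumes each node's hostname label equals the node name shown in the table; the real manifests may instead schedule through GPU resource requests or Node Feature Discovery labels, and `node_selector` is a hypothetical helper.

```python
# Workload -> node, copied from the worker node table.
WORKLOAD_NODES = {
    "Whisper": "elminster",
    "XTTS": "elminster",
    "vLLM": "khelben",
    "BGE Embeddings": "drizzt",
    "Reranker": "danilo",
}


def node_selector(workload: str) -> dict:
    """Return a nodeSelector fragment pinning a workload to its node.

    Assumption: the kubernetes.io/hostname label matches the node name.
    """
    node = WORKLOAD_NODES[workload]  # KeyError for workloads not in the table
    return {"kubernetes.io/hostname": node}


def workloads_on(node: str) -> list:
    """List the workloads that the table assigns to a given node."""
    return sorted(w for w, n in WORKLOAD_NODES.items() if n == node)
```

For example, `workloads_on("elminster")` shows that Whisper and XTTS share the single NVIDIA GPU, which is worth keeping in mind when sizing VRAM.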
## Networking
### External Access
```
Internet → Cloudflare → cloudflared tunnel → Envoy Gateway → Services
```
### DNS Zones
- **External**: `*.daviestechlabs.io` (Cloudflare DNS)
- **Internal**: `*.lab.daviestechlabs.io` (internal split-horizon)
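Because the internal zone is a subdomain of the external one, any code that classifies hostnames has to test the more specific suffix first. A small Python sketch of that ordering follows; `zone_for` is a hypothetical helper, not something taken from the repositories.

```python
INTERNAL_SUFFIX = ".lab.daviestechlabs.io"
EXTERNAL_SUFFIX = ".daviestechlabs.io"


def zone_for(hostname: str) -> str:
    """Classify a hostname as 'internal', 'external', or 'unknown'.

    The internal suffix is checked first because every internal name
    also ends with the external suffix.
    """
    host = hostname.lower().rstrip(".")  # normalise case and a trailing root dot
    if host.endswith(INTERNAL_SUFFIX):
        return "internal"
    if host.endswith(EXTERNAL_SUFFIX):
        return "external"
    return "unknown"
```

Swapping the two checks would silently report every `*.lab.daviestechlabs.io` name as external.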
### Network CIDRs
| Network | CIDR | Purpose |
|---------|------|---------|
| Node Network | 192.168.100.0/24 | Physical nodes |
| Pod Network | 10.42.0.0/16 | Kubernetes pods |
| Service Network | 10.43.0.0/16 | Kubernetes services |
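When reading logs or `kubectl` output it helps to know which of these ranges an address belongs to. The sketch below uses Python's standard `ipaddress` module with the CIDRs from the table; the `network_of` helper name is an illustration, not part of the cluster tooling.

```python
import ipaddress

# CIDRs copied from the table above.
NETWORKS = {
    "Node Network": ipaddress.ip_network("192.168.100.0/24"),
    "Pod Network": ipaddress.ip_network("10.42.0.0/16"),
    "Service Network": ipaddress.ip_network("10.43.0.0/16"),
}


def network_of(address: str):
    """Return the name of the network containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, net in NETWORKS.items():
        if ip in net:
            return name
    return None


def any_overlap() -> bool:
    """True if any two of the listed networks overlap."""
    nets = list(NETWORKS.values())
    return any(
        a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:]
    )
```

For instance, `network_of("192.168.100.20")` places the control-plane VIP in the node network, and `any_overlap()` confirms the three ranges are disjoint.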
## Data Flow: Chat Request
```mermaid
sequenceDiagram
participant U as User
participant W as WebApp
participant N as NATS
participant C as Chat Handler
participant M as Milvus
participant L as vLLM
participant V as Valkey
U->>W: Send message
W->>N: Publish ai.chat.user.{id}.message
N->>C: Deliver to chat-handler
C->>V: Get session history
C->>M: RAG query (if enabled)
M-->>C: Relevant documents
C->>L: LLM inference (with context)
L-->>C: Streaming tokens
C->>N: Publish ai.chat.response.stream.{id}
N-->>W: Deliver streaming chunks
W-->>U: Display tokens
C->>V: Save to session
```
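The two subject patterns in the sequence diagram, `ai.chat.user.{id}.message` and `ai.chat.response.stream.{id}`, can be built and parsed with a few lines of Python. This is a sketch under assumptions: the diagram does not say whether both `{id}` placeholders carry the same identifier, so the helpers below take a generic `ident`, and the function names are hypothetical.

```python
def _check_token(ident: str) -> str:
    """Reject values that would not be a single literal NATS subject token."""
    if not ident or any(ch in ident for ch in ". *>\t\r\n"):
        raise ValueError(f"invalid subject token: {ident!r}")
    return ident


def request_subject(ident: str) -> str:
    """Subject the WebApp publishes a user message on."""
    return f"ai.chat.user.{_check_token(ident)}.message"


def response_subject(ident: str) -> str:
    """Subject the chat handler streams response chunks on."""
    return f"ai.chat.response.stream.{_check_token(ident)}"


def ident_from_request(subject: str) -> str:
    """Extract the {id} token from a request subject."""
    parts = subject.split(".")
    if (
        len(parts) != 5
        or parts[:3] != ["ai", "chat", "user"]
        or parts[4] != "message"
    ):
        raise ValueError(f"not a chat request subject: {subject!r}")
    return _check_token(parts[3])
```

Validating the token matters because `.` separates subject tokens and `*` and `>` are NATS wildcards, so an unchecked identifier could change which subscribers receive the message.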
## GitOps Flow
```
Developer → Git Push → GitHub/Gitea
             ┌─────────────┐
             │   Flux CD   │
             │ (reconcile) │
             └──────┬──────┘
     ┌──────────────┼──────────────┐
     ▼              ▼              ▼
┌──────────┐   ┌──────────┐   ┌──────────┐
│homelab-  │   │  llm-    │   │  helm    │
│  k8s2    │   │workflows │   │  charts  │
└──────────┘   └──────────┘   └──────────┘
     │              │              │
     └──────────────┴──────────────┘
             ┌─────────────┐
             │ Kubernetes  │
             │   Cluster   │
             └─────────────┘
```
## Security Architecture
### Secrets Management
```
External Secrets Operator ──► Vault / SOPS ──► Kubernetes Secrets
```
### Authentication
```
User ──► Cloudflare Access ──► Authentik ──► Application
                                   └──► OIDC/SAML providers
```
### Network Security
- **Cilium**: Network policies, eBPF-based security
- **Falco**: Runtime security monitoring
- **RBAC**: Fine-grained Kubernetes permissions
## High Availability
### Control Plane
- 3-node etcd cluster with automatic leader election
- Virtual IP (192.168.100.20) for API server access
- Automatic failover via Talos
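The fault tolerance of the 3-node etcd cluster follows from majority quorum: a cluster of `n` members needs `floor(n/2) + 1` of them to agree, so it survives `n - quorum` failures. A short Python sketch of that arithmetic (function names are illustrative):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given member count."""
    if members < 1:
        raise ValueError("cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - quorum(members)
```

With three control-plane nodes, `quorum(3)` is 2 and `tolerated_failures(3)` is 1, so the cluster stays writable with one of storm, bruenor, or catti down. A fourth member would raise the quorum to 3 without tolerating any additional failure, which is why odd member counts are the usual choice.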
### Workloads
- Pod anti-affinity for critical services
- HPA for auto-scaling
- PodDisruptionBudgets for controlled updates
### Storage
- Longhorn 3-replica default
- MinIO erasure coding for S3
- Regular Velero backups
## Observability
### Metrics Pipeline
```
Applications ──► OpenTelemetry Collector ──► Prometheus ──► Grafana
```
### Logging Pipeline
```
Applications ──► Grafana Alloy ──► Loki ──► Grafana
```
### Tracing Pipeline
```
Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafana
```
## Key Design Decisions
| Decision | Rationale | ADR |
|----------|-----------|-----|
| Talos Linux | Immutable, API-driven, secure | [ADR-0002](decisions/0002-use-talos-linux.md) |
| NATS over Kafka | Simpler ops, sufficient throughput | [ADR-0003](decisions/0003-use-nats-for-messaging.md) |
| MessagePack over JSON | Binary efficiency for audio | [ADR-0004](decisions/0004-use-messagepack-for-nats.md) |
| Multi-GPU heterogeneous | Cost optimization, workload matching | [ADR-0005](decisions/0005-multi-gpu-strategy.md) |
| GitOps with Flux | Declarative, auditable, secure | [ADR-0006](decisions/0006-gitops-with-flux.md) |
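The "binary efficiency for audio" rationale can be made concrete. JSON has no byte-string type, so audio is typically base64-encoded into a string, while MessagePack's `bin` family stores the raw bytes behind a short length header. The Python sketch below hand-encodes only the `bin 8/16/32` formats using the standard library, to compare sizes; it is an illustration, not the serializer the services use, and a real deployment would rely on a MessagePack library.

```python
import base64
import json
import struct


def msgpack_bin(payload: bytes) -> bytes:
    """Encode raw bytes as a MessagePack bin 8, bin 16, or bin 32 value."""
    n = len(payload)
    if n < 2**8:
        header = struct.pack(">BB", 0xC4, n)   # bin 8: type byte + 1-byte length
    elif n < 2**16:
        header = struct.pack(">BH", 0xC5, n)   # bin 16: type byte + 2-byte length
    elif n < 2**32:
        header = struct.pack(">BI", 0xC6, n)   # bin 32: type byte + 4-byte length
    else:
        raise ValueError("payload too large for a single bin value")
    return header + payload


def json_string(payload: bytes) -> bytes:
    """Encode raw bytes the usual JSON way: a base64 text string."""
    text = base64.b64encode(payload).decode("ascii")
    return json.dumps(text).encode("ascii")


def overhead_ratio(payload: bytes) -> float:
    """Size of the JSON encoding relative to the MessagePack encoding."""
    return len(json_string(payload)) / len(msgpack_bin(payload))
```

For a 3000-byte audio chunk the MessagePack value is 3003 bytes and the JSON string is 4002 bytes, roughly a one-third increase before any surrounding message fields are counted.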
## Related Documents
- [TECH-STACK.md](TECH-STACK.md) - Complete technology inventory
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities and relationships
- [decisions/](decisions/) - All architecture decisions