feat: add comprehensive architecture documentation

- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories
and kubectl cluster-info dump data.
commit 832cda34bd (parent 4d4f6f464c)
Date: 2026-02-01 14:30:05 -05:00
26 changed files with 3805 additions and 2 deletions

AGENT-ONBOARDING.md (new file, 191 lines)
# 🤖 Agent Onboarding
> **This is the most important file for AI agents working on this codebase.**
## TL;DR
You are working on a **homelab Kubernetes cluster** running:
- **Talos Linux v1.12.1** on bare-metal nodes
- **Kubernetes v1.35.0** with Flux CD GitOps
- **AI/ML platform** with KServe, Kubeflow, Milvus, NATS
- **Multi-GPU** (AMD ROCm, NVIDIA CUDA, Intel Arc)
## 🗺️ Repository Map
| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| `llm-workflows` | NATS handlers, Argo/KFP workflows | Workflow/handler changes |
| `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |
| `homelab-design` (this) | Architecture docs, ADRs | Design decisions |
## 🏗️ System Architecture (30-Second Version)
```
┌─────────────────────────────────────────────────────────────────┐
│ USER INTERFACES │
│ Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI │
└───────────────────────────┬─────────────────────────────────────┘
│ WebSocket/HTTP
┌─────────────────────────────────────────────────────────────────┐
│ NATS MESSAGE BUS │
│ Subjects: ai.chat.*, ai.voice.*, ai.pipeline.* │
│ Format: MessagePack (binary) │
└───────────────────────────┬─────────────────────────────────────┘
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Chat Handler │ │Voice Assistant│ │Pipeline Bridge│
│ (RAG+LLM) │ │ (STT→LLM→TTS) │ │ (KFP/Argo) │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└───────────────────┼───────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ AI SERVICES │
│ Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker │
│ STT │ TTS │ LLM │ RAG │ Embed │ Rank │
└─────────────────────────────────────────────────────────────────┘
```
## 📁 Key File Locations
### Infrastructure (`homelab-k8s2`)
```
kubernetes/apps/
├── ai-ml/ # 🧠 AI/ML services
│ ├── kserve/ # InferenceServices
│ ├── kubeflow/ # Pipelines, Training Operator
│ ├── milvus/ # Vector database
│ ├── nats/ # Message bus
│ ├── vllm/ # LLM inference
│ └── llm-workflows/ # GitRepo sync to llm-workflows
├── analytics/ # 📊 Spark, Flink, ClickHouse
├── observability/ # 📈 Grafana, Alloy, OpenTelemetry
└── security/ # 🔒 Vault, Authentik, Falco
talos/
├── talconfig.yaml           # Node definitions
└── patches/                 # GPU-specific patches
    ├── amd/amdgpu.yaml
    └── nvidia/nvidia-runtime.yaml
```
### Workflows (`llm-workflows`)
```
workflows/ # NATS handler deployments
├── chat-handler.yaml
├── voice-assistant.yaml
└── pipeline-bridge.yaml
argo/ # Argo WorkflowTemplates
├── document-ingestion.yaml
├── batch-inference.yaml
└── qlora-training.yaml
pipelines/ # Kubeflow Pipeline Python
├── voice_pipeline.py
└── document_ingestion_pipeline.py
```
## 🔌 Service Endpoints (Internal)
```python
# Copy-paste ready for Python code
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```
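The vLLM endpoint speaks the OpenAI-compatible API, so a plain HTTP call works as a smoke test. A minimal sketch, assuming `requests` is installed; the served model ID is discovered via `/v1/models` rather than hardcoded:
```python
# Smoke test for the vLLM OpenAI-compatible endpoint (run in-cluster).
import requests

VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"

# Ask the server which model(s) it actually serves.
models = requests.get(f"{VLLM_URL}/models", timeout=10).json()
model_id = models["data"][0]["id"]

resp = requests.post(
    f"{VLLM_URL}/chat/completions",
    json={
        "model": model_id,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```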
## 📨 NATS Subject Patterns
```python
# Chat
f"ai.chat.user.{user_id}.message" # User sends message
f"ai.chat.response.{request_id}" # Response back
f"ai.chat.response.stream.{request_id}" # Streaming tokens
# Voice
f"ai.voice.user.{user_id}.request" # Voice input
f"ai.voice.response.{request_id}" # Voice output
# Pipelines
"ai.pipeline.trigger" # Trigger any pipeline
f"ai.pipeline.status.{request_id}" # Status updates
```
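Putting the endpoints and subjects together, here is a minimal client-side sketch using `nats-py` and `msgpack`. The payload fields (`user_id`, `request_id`, `text`) are illustrative assumptions, not a documented schema; check the handler code in `llm-workflows` for the real one:
```python
# Sketch: send a chat message over NATS and await the MessagePack reply.
import asyncio
import uuid

import msgpack
import nats

NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"

async def send_chat(user_id: str, text: str) -> dict:
    nc = await nats.connect(NATS_URL)
    request_id = str(uuid.uuid4())
    # Subscribe to the response subject before publishing to avoid a race.
    sub = await nc.subscribe(f"ai.chat.response.{request_id}")
    payload = {"user_id": user_id, "request_id": request_id, "text": text}
    await nc.publish(f"ai.chat.user.{user_id}.message", msgpack.packb(payload))
    msg = await sub.next_msg(timeout=30)
    await nc.close()
    return msgpack.unpackb(msg.data, raw=False)

if __name__ == "__main__":
    print(asyncio.run(send_chat("demo-user", "hello")))
```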
## 🎮 GPU Allocation
| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |
## ⚡ Common Tasks
### Deploy a New AI Service
1. Create InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
2. Add endpoint to `llm-workflows/config/ai-services-config.yaml`
3. Push to main → Flux deploys automatically
### Add a New Workflow
1. Create handler in `llm-workflows/chat-handler/` or `llm-workflows/voice-assistant/` (see the skeleton sketch after this list)
2. Add Kubernetes Deployment in `llm-workflows/workflows/`
3. Push to main → Flux deploys automatically
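A minimal sketch of what the handler in step 1 might look like, again assuming `nats-py` and `msgpack`; the wildcard subject follows the patterns above, and the payload fields are illustrative:
```python
# Skeleton NATS handler: subscribe, decode MessagePack, reply.
import asyncio

import msgpack
import nats

NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"

async def main() -> None:
    nc = await nats.connect(NATS_URL)

    async def handle(msg) -> None:
        req = msgpack.unpackb(msg.data, raw=False)
        # ... call Whisper / vLLM / Milvus here ...
        reply = {"request_id": req["request_id"], "text": "stub reply"}
        await nc.publish(
            f"ai.chat.response.{req['request_id']}", msgpack.packb(reply)
        )

    # One wildcard subscription covers every user's chat subject.
    await nc.subscribe("ai.chat.user.*.message", cb=handle)
    await asyncio.Event().wait()  # run until the pod is stopped

if __name__ == "__main__":
    asyncio.run(main())
```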
### Create Architecture Decision
1. Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
2. Fill in context, decision, consequences
3. Submit PR
## ❌ Antipatterns to Avoid
1. **Don't hardcode secrets** - Use External Secrets Operator
2. **Don't use `latest` tags** - Pin versions for reproducibility
3. **Don't skip ADRs** - Document significant decisions
4. **Don't bypass Flux** - All changes via Git, never `kubectl apply` directly
## 📚 Where to Learn More
- [ARCHITECTURE.md](ARCHITECTURE.md) - Full system design
- [TECH-STACK.md](TECH-STACK.md) - All technologies used
- [decisions/](decisions/) - Why we made certain choices
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities
## 🆘 Quick Debugging
```bash
# Check Flux sync status
flux get all -A
# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls
# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"
# View KServe inference services
kubectl get inferenceservices -n ai-ml
# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```
---
*This document is the canonical starting point for AI agents. When in doubt, check the ADRs.*