feat: add comprehensive architecture documentation
- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of the homelab-k8s2 and llm-workflows repositories and `kubectl cluster-info dump` data.
# 🤖 Agent Onboarding

> **This is the most important file for AI agents working on this codebase.**

## TL;DR

You are working on a **homelab Kubernetes cluster** running:

- **Talos Linux v1.12.1** on bare-metal nodes
- **Kubernetes v1.35.0** with Flux CD GitOps
- **AI/ML platform** with KServe, Kubeflow, Milvus, NATS
- **Multi-GPU** (AMD ROCm, NVIDIA CUDA, Intel Arc)

## 🗺️ Repository Map

| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| `llm-workflows` | NATS handlers, Argo/KFP workflows | Workflow/handler changes |
| `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |
| `homelab-design` (this) | Architecture docs, ADRs | Design decisions |

## 🏗️ System Architecture (30-Second Version)

```
┌─────────────────────────────────────────────────────────────────┐
│                        USER INTERFACES                          │
│   Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI          │
└───────────────────────────┬─────────────────────────────────────┘
                            │ WebSocket/HTTP
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                       NATS MESSAGE BUS                          │
│   Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*                │
│   Format: MessagePack (binary)                                  │
└───────────────────────────┬─────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│ Chat Handler  │   │Voice Assistant│   │Pipeline Bridge│
│   (RAG+LLM)   │   │ (STT→LLM→TTS) │   │  (KFP/Argo)   │
└───────┬───────┘   └───────┬───────┘   └───────┬───────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                         AI SERVICES                             │
│  Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker          │
│   STT    │ TTS  │ LLM  │  RAG   │   Embed   │   Rank            │
└─────────────────────────────────────────────────────────────────┘
```
## 📁 Key File Locations

### Infrastructure (`homelab-k8s2`)

```
kubernetes/apps/
├── ai-ml/                    # 🧠 AI/ML services
│   ├── kserve/               # InferenceServices
│   ├── kubeflow/             # Pipelines, Training Operator
│   ├── milvus/               # Vector database
│   ├── nats/                 # Message bus
│   ├── vllm/                 # LLM inference
│   └── llm-workflows/        # GitRepo sync to llm-workflows
├── analytics/                # 📊 Spark, Flink, ClickHouse
├── observability/            # 📈 Grafana, Alloy, OpenTelemetry
└── security/                 # 🔒 Vault, Authentik, Falco

talos/
├── talconfig.yaml            # Node definitions
├── patches/                  # GPU-specific patches
│   ├── amd/amdgpu.yaml
│   └── nvidia/nvidia-runtime.yaml
```

### Workflows (`llm-workflows`)

```
workflows/                    # NATS handler deployments
├── chat-handler.yaml
├── voice-assistant.yaml
└── pipeline-bridge.yaml

argo/                         # Argo WorkflowTemplates
├── document-ingestion.yaml
├── batch-inference.yaml
└── qlora-training.yaml

pipelines/                    # Kubeflow Pipeline Python
├── voice_pipeline.py
└── document_ingestion_pipeline.py
```
## 🔌 Service Endpoints (Internal)

```python
# Copy-paste ready for Python code
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```
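As a sketch of using these endpoints, here is how a Python client could call the vLLM service through its OpenAI-compatible API (stdlib only, no extra dependencies). The model name is a placeholder, not a value from this repo; list the real ones with `GET {VLLM_URL}/models`.

```python
# Minimal sketch: chat completion against the in-cluster vLLM endpoint.
# Assumes vLLM's OpenAI-compatible server; "placeholder-model" is NOT a
# real model name from this cluster.
import json
from urllib import request

VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"

def chat_payload(prompt: str, model: str = "placeholder-model") -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def ask(prompt: str) -> str:
    """POST the payload and return the first choice's text.

    Only resolves from inside the cluster.
    """
    req = request.Request(
        f"{VLLM_URL}/chat/completions",
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```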
## 📨 NATS Subject Patterns

```python
# Chat
f"ai.chat.user.{user_id}.message"         # User sends message
f"ai.chat.response.{request_id}"          # Response back
f"ai.chat.response.stream.{request_id}"   # Streaming tokens

# Voice
f"ai.voice.user.{user_id}.request"        # Voice input
f"ai.voice.response.{request_id}"         # Voice output

# Pipelines
"ai.pipeline.trigger"                     # Trigger any pipeline
f"ai.pipeline.status.{request_id}"        # Status updates
```
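To keep publishers and handlers agreeing on these names, it can help to centralise them as tiny builder functions. A sketch (the helper names are illustrative, not existing functions in `llm-workflows`; payloads on these subjects are MessagePack-encoded):

```python
# Subject builders mirroring the patterns above.
def chat_message_subject(user_id: str) -> str:
    """Subject a user publishes chat messages to."""
    return f"ai.chat.user.{user_id}.message"

def chat_response_subject(request_id: str, stream: bool = False) -> str:
    """Subject the chat handler replies on (optionally token stream)."""
    if stream:
        return f"ai.chat.response.stream.{request_id}"
    return f"ai.chat.response.{request_id}"

def voice_request_subject(user_id: str) -> str:
    return f"ai.voice.user.{user_id}.request"

def pipeline_status_subject(request_id: str) -> str:
    return f"ai.pipeline.status.{request_id}"
```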
## 🎮 GPU Allocation

| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |
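A workload lands on the right GPU via a node selector plus a device-plugin resource request. A hedged sketch for pinning something to elminster's RTX 2070 (assumes the NVIDIA device plugin is active; `nodeSelector` by hostname is one way to pin, the real manifests may use affinity or taints instead):

```yaml
# Sketch only: pod pinned to elminster's NVIDIA GPU.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-debug          # placeholder name
  namespace: ai-ml
spec:
  nodeSelector:
    kubernetes.io/hostname: elminster
  containers:
    - name: workload
      image: example/workload:v0.1.0   # pinned tag, never `latest`
      resources:
        limits:
          nvidia.com/gpu: "1"          # amd.com/gpu on the AMD nodes
```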
## ⚡ Common Tasks

### Deploy a New AI Service

1. Create an InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
2. Add the endpoint to `llm-workflows/config/ai-services-config.yaml`
3. Push to main → Flux deploys automatically
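Step 1 can be sketched as a minimal custom-container InferenceService. The name, image, and GPU resource below are placeholders; copy a neighbouring manifest under `kubernetes/apps/ai-ml/kserve/` as the real starting point.

```yaml
# Sketch: minimal KServe InferenceService with a custom container.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model           # placeholder
  namespace: ai-ml
spec:
  predictor:
    containers:
      - name: kserve-container
        image: example/my-model:v0.1.0   # pinned tag, never `latest`
        resources:
          limits:
            nvidia.com/gpu: "1"
```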
### Add a New Workflow

1. Create the handler in `llm-workflows/chat-handler/` or `llm-workflows/voice-assistant/`
2. Add a Kubernetes Deployment in `llm-workflows/workflows/`
3. Push to main → Flux deploys automatically
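The core of a handler (step 1) is a function from a decoded request to a response envelope; the NATS wiring and MessagePack serialisation wrap around it. A sketch with illustrative field names (not the actual schema from `llm-workflows`):

```python
# Sketch of a handler core. The real handlers decode MessagePack from
# NATS, call RAG + vLLM, and publish the reply on the response subject;
# the field names here are illustrative.
def handle_chat(request: dict) -> dict:
    """Turn an incoming chat request into a response envelope."""
    text = request.get("text", "")
    return {
        "request_id": request.get("request_id"),
        "reply": f"echo: {text}",   # placeholder for the LLM call
    }
```

Keeping the core pure like this makes it unit-testable without a NATS connection.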
### Create Architecture Decision

1. Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
2. Fill in context, decision, consequences
3. Submit a PR
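The skeleton below shows the common ADR shape matching step 2; it is an assumption about `decisions/0000-template.md`, so defer to the actual template file.

```markdown
# NNNN. Short decision title

## Status
Proposed

## Context
What forces are at play? Link related ADRs.

## Decision
What we are doing, and why.

## Consequences
What becomes easier or harder as a result.
```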
## ❌ Antipatterns to Avoid

1. **Don't hardcode secrets** - Use External Secrets Operator
2. **Don't use `latest` tags** - Pin versions for reproducibility
3. **Don't skip ADRs** - Document significant decisions
4. **Don't bypass Flux** - All changes via Git, never `kubectl apply` directly
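For antipattern 1, a secret is referenced rather than committed. A sketch of an ExternalSecret (the `vault` ClusterSecretStore name and key paths are assumptions; check the manifests under `security/` for the real store):

```yaml
# Sketch: Vault-backed secret via External Secrets Operator.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: chat-handler-secrets   # placeholder
  namespace: ai-ml
spec:
  secretStoreRef:
    name: vault                # assumed store name
    kind: ClusterSecretStore
  target:
    name: chat-handler-secrets # resulting k8s Secret
  data:
    - secretKey: api-token
      remoteRef:
        key: ai-ml/chat-handler   # assumed Vault path
        property: api-token
```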
## 📚 Where to Learn More

- [ARCHITECTURE.md](ARCHITECTURE.md) - Full system design
- [TECH-STACK.md](TECH-STACK.md) - All technologies used
- [decisions/](decisions/) - Why we made certain choices
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities

## 🆘 Quick Debugging

```bash
# Check Flux sync status
flux get all -A

# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls

# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"

# View KServe inference services
kubectl get inferenceservices -n ai-ml

# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```
---

*This document is the canonical starting point for AI agents. When in doubt, check the ADRs.*