feat: add comprehensive architecture documentation
- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of the homelab-k8s2 and llm-workflows repositories and `kubectl cluster-info dump` data.

---

**AGENT-ONBOARDING.md** (new file, 191 lines)

# 🤖 Agent Onboarding

> **This is the most important file for AI agents working on this codebase.**

## TL;DR

You are working on a **homelab Kubernetes cluster** running:

- **Talos Linux v1.12.1** on bare-metal nodes
- **Kubernetes v1.35.0** with Flux CD GitOps
- **AI/ML platform** with KServe, Kubeflow, Milvus, NATS
- **Multi-GPU** (AMD ROCm, NVIDIA CUDA, Intel Arc)

## 🗺️ Repository Map

| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| `llm-workflows` | NATS handlers, Argo/KFP workflows | Workflow/handler changes |
| `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |
| `homelab-design` (this) | Architecture docs, ADRs | Design decisions |

## 🏗️ System Architecture (30-Second Version)

```
┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACES                         │
│   Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI          │
└───────────────────────────┬─────────────────────────────────────┘
                            │ WebSocket/HTTP
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                        NATS MESSAGE BUS                         │
│   Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*                │
│   Format: MessagePack (binary)                                  │
└───────────────────────────┬─────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│ Chat Handler  │   │Voice Assistant│   │Pipeline Bridge│
│  (RAG+LLM)    │   │ (STT→LLM→TTS) │   │  (KFP/Argo)   │
└───────┬───────┘   └───────┬───────┘   └───────┬───────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                          AI SERVICES                            │
│  Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker          │
│   STT    │ TTS  │ LLM  │  RAG   │   Embed   │   Rank            │
└─────────────────────────────────────────────────────────────────┘
```

## 📁 Key File Locations

### Infrastructure (`homelab-k8s2`)

```
kubernetes/apps/
├── ai-ml/                 # 🧠 AI/ML services
│   ├── kserve/            # InferenceServices
│   ├── kubeflow/          # Pipelines, Training Operator
│   ├── milvus/            # Vector database
│   ├── nats/              # Message bus
│   ├── vllm/              # LLM inference
│   └── llm-workflows/     # GitRepo sync to llm-workflows
├── analytics/             # 📊 Spark, Flink, ClickHouse
├── observability/         # 📈 Grafana, Alloy, OpenTelemetry
└── security/              # 🔒 Vault, Authentik, Falco

talos/
├── talconfig.yaml         # Node definitions
├── patches/               # GPU-specific patches
│   ├── amd/amdgpu.yaml
│   └── nvidia/nvidia-runtime.yaml
```

### Workflows (`llm-workflows`)

```
workflows/                 # NATS handler deployments
├── chat-handler.yaml
├── voice-assistant.yaml
└── pipeline-bridge.yaml

argo/                      # Argo WorkflowTemplates
├── document-ingestion.yaml
├── batch-inference.yaml
└── qlora-training.yaml

pipelines/                 # Kubeflow Pipeline Python
├── voice_pipeline.py
└── document_ingestion_pipeline.py
```

## 🔌 Service Endpoints (Internal)

```python
# Copy-paste ready for Python code
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```
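These hostnames only resolve inside the cluster. For local development it can help to make them overridable via the environment; a minimal sketch (the port-forward command is just one way to expose the service locally, and nothing here is mandated by the codebase):

```python
import os

# In-cluster defaults; override locally, e.g. after:
#   kubectl port-forward -n ai-ml svc/nats 4222:4222
NATS_URL = os.environ.get("NATS_URL", "nats://nats.ai-ml.svc.cluster.local:4222")
MILVUS_HOST = os.environ.get("MILVUS_HOST", "milvus.ai-ml.svc.cluster.local")
MILVUS_PORT = int(os.environ.get("MILVUS_PORT", "19530"))

print(NATS_URL, MILVUS_HOST, MILVUS_PORT)
```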

## 📨 NATS Subject Patterns

```python
# Chat
f"ai.chat.user.{user_id}.message"        # User sends message
f"ai.chat.response.{request_id}"         # Response back
f"ai.chat.response.stream.{request_id}"  # Streaming tokens

# Voice
f"ai.voice.user.{user_id}.request"       # Voice input
f"ai.voice.response.{request_id}"        # Voice output

# Pipelines
"ai.pipeline.trigger"                    # Trigger any pipeline
f"ai.pipeline.status.{request_id}"       # Status updates
```
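The f-string patterns above can be wrapped in small helper functions so publishers and subscribers agree on subject names. A minimal sketch; the helper names are illustrative, not part of the codebase:

```python
import uuid

def chat_message_subject(user_id: str) -> str:
    """Subject the frontend publishes a user's chat message to."""
    return f"ai.chat.user.{user_id}.message"

def chat_stream_subject(request_id: str) -> str:
    """Subject the chat handler streams response tokens on."""
    return f"ai.chat.response.stream.{request_id}"

request_id = str(uuid.uuid4())
print(chat_message_subject("alice"))   # → ai.chat.user.alice.message
print(chat_stream_subject(request_id))
```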

## 🎮 GPU Allocation

| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |

## ⚡ Common Tasks

### Deploy a New AI Service

1. Create an InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
2. Add the endpoint to `llm-workflows/config/ai-services-config.yaml`
3. Push to main → Flux deploys automatically

### Add a New Workflow

1. Create the handler in `llm-workflows/chat-handler/` or `llm-workflows/voice-assistant/`
2. Add a Kubernetes Deployment in `llm-workflows/workflows/`
3. Push to main → Flux deploys automatically

### Create an Architecture Decision

1. Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
2. Fill in context, decision, consequences
3. Submit a PR

## ❌ Antipatterns to Avoid

1. **Don't hardcode secrets** - Use External Secrets Operator
2. **Don't use `latest` tags** - Pin versions for reproducibility
3. **Don't skip ADRs** - Document significant decisions
4. **Don't bypass Flux** - All changes via Git, never `kubectl apply` directly

## 📚 Where to Learn More

- [ARCHITECTURE.md](ARCHITECTURE.md) - Full system design
- [TECH-STACK.md](TECH-STACK.md) - All technologies used
- [decisions/](decisions/) - Why we made certain choices
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities

## 🆘 Quick Debugging

```bash
# Check Flux sync status
flux get all -A

# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls

# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"

# View KServe inference services
kubectl get inferenceservices -n ai-ml

# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```

---

*This document is the canonical starting point for AI agents. When in doubt, check the ADRs.*

---

**ARCHITECTURE.md** (new file, 287 lines)

# 🏗️ System Architecture

> **Comprehensive technical overview of the DaviesTechLabs homelab infrastructure**

## Overview

The homelab is a production-grade Kubernetes cluster running on bare-metal hardware, designed for AI/ML workloads with multi-GPU support. It follows GitOps principles using Flux CD with SOPS-encrypted secrets.

## System Layers

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 USER LAYER                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐           │
│  │ Companions WebApp│  │   Voice WebApp   │  │   Kubeflow UI    │           │
│  │  HTMX + Alpine   │  │    Gradio UI     │  │  Pipeline Mgmt   │           │
│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘           │
│           │ WebSocket           │ HTTP/WS             │ HTTP                │
└───────────┴─────────────────────┴─────────────────────┴─────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                INGRESS LAYER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│  Cloudflared Tunnel ──► Envoy Gateway ──► HTTPRoute CRDs                    │
│                                                                             │
│  External: *.daviestechlabs.io        Internal: *.lab.daviestechlabs.io     │
│  • git.daviestechlabs.io              • kubeflow.lab.daviestechlabs.io      │
│  • auth.daviestechlabs.io             • companions-chat.lab...              │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              MESSAGE BUS LAYER                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                               NATS + JetStream                              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Streams:                                                            │   │
│  │  • COMPANIONS_LOGINS (7d retention)  - User analytics               │   │
│  │  • COMPANIONS_CHAT   (30d retention) - Chat history                 │   │
│  │  • AI_CHAT_STREAM    (5min, memory)  - Ephemeral streaming          │   │
│  │  • AI_VOICE_STREAM   (1h, file)      - Voice processing             │   │
│  │  • AI_PIPELINE       (24h, file)     - Workflow triggers            │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  Message Format: MessagePack (binary, not JSON)                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                ┌─────────────────────┼─────────────────────┐
                ▼                     ▼                     ▼
     ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
     │   Chat Handler    │ │  Voice Assistant  │ │  Pipeline Bridge  │
     ├───────────────────┤ ├───────────────────┤ ├───────────────────┤
     │ • RAG retrieval   │ │ • STT (Whisper)   │ │ • KFP triggers    │
     │ • LLM inference   │ │ • RAG retrieval   │ │ • Argo triggers   │
     │ • Streaming resp  │ │ • LLM inference   │ │ • Status updates  │
     │ • Session state   │ │ • TTS (XTTS)      │ │ • Error handling  │
     └───────────────────┘ └───────────────────┘ └───────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              AI SERVICES LAYER                              │
├─────────────────────────────────────────────────────────────────────────────┤
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐     │
│ │ Whisper │ │  XTTS   │ │  vLLM   │ │ Milvus  │ │   BGE   │ │Reranker │     │
│ │  (STT)  │ │  (TTS)  │ │  (LLM)  │ │  (RAG)  │ │ (Embed) │ │  (BGE)  │     │
│ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤     │
│ │ KServe  │ │ KServe  │ │  vLLM   │ │  Helm   │ │ KServe  │ │ KServe  │     │
│ │ nvidia  │ │ nvidia  │ │  ROCm   │ │  Minio  │ │  rdna2  │ │  intel  │     │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘     │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            WORKFLOW ENGINE LAYER                            │
├─────────────────────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────┐      ┌────────────────────────────┐         │
│  │       Argo Workflows       │ ◄──► │    Kubeflow Pipelines      │         │
│  ├────────────────────────────┤      ├────────────────────────────┤         │
│  │ • Complex DAG orchestration│      │ • ML pipeline caching      │         │
│  │ • Training workflows       │      │ • Experiment tracking      │         │
│  │ • Document ingestion       │      │ • Model versioning         │         │
│  │ • Batch inference          │      │ • Artifact lineage         │         │
│  └────────────────────────────┘      └────────────────────────────┘         │
│                                                                             │
│  Trigger: Argo Events (EventSource → Sensor → Workflow/Pipeline)            │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            INFRASTRUCTURE LAYER                             │
├─────────────────────────────────────────────────────────────────────────────┤
│  Storage:               Compute:                   Security:                │
│  ├─ Longhorn (block)    ├─ Volcano Scheduler       ├─ Vault (secrets)       │
│  ├─ NFS CSI (shared)    ├─ GPU Device Plugins      ├─ Authentik (SSO)       │
│  └─ MinIO (S3)          │  ├─ AMD ROCm             ├─ Falco (runtime)       │
│                         │  ├─ NVIDIA CUDA          └─ SOPS (GitOps)         │
│  Databases:             │  └─ Intel i915/Arc                                │
│  ├─ CloudNative-PG      └─ Node Feature Discovery                           │
│  ├─ Valkey (cache)                                                          │
│  └─ ClickHouse (analytics)                                                  │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               PLATFORM LAYER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│  Talos Linux v1.12.1  │  Kubernetes v1.35.0  │  Cilium CNI                  │
│                                                                             │
│  Nodes: storm, bruenor, catti (control)  │  elminster, khelben, drizzt,     │
│                                          │  danilo (workers)                │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Node Topology

### Control Plane (HA)

| Node | IP | CPU | Memory | Storage | Role |
|------|-------|-----|--------|---------|------|
| storm | 192.168.100.25 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
| bruenor | 192.168.100.26 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |
| catti | 192.168.100.27 | Intel 13th Gen (4c) | 16GB | 500GB NVMe | etcd, API server |

**VIP**: 192.168.100.20 (shared across control plane)

### Worker Nodes

| Node | IP | CPU | GPU | GPU Memory | Workload |
|------|-------|-----|-----|------------|----------|
| elminster | 192.168.100.31 | Intel | NVIDIA RTX 2070 | 8GB VRAM | Whisper, XTTS |
| khelben | 192.168.100.32 | AMD Ryzen | AMD Strix Halo | 64GB Unified | vLLM (dedicated) |
| drizzt | 192.168.100.40 | AMD Ryzen 7 6800H | AMD Radeon 680M | 12GB VRAM | BGE Embeddings |
| danilo | 192.168.100.41 | Intel Core Ultra 9 | Intel Arc | 16GB Shared | Reranker |

## Networking

### External Access

```
Internet → Cloudflare → cloudflared tunnel → Envoy Gateway → Services
```

### DNS Zones

- **External**: `*.daviestechlabs.io` (Cloudflare DNS)
- **Internal**: `*.lab.daviestechlabs.io` (internal split-horizon)

### Network CIDRs

| Network | CIDR | Purpose |
|---------|------|---------|
| Node Network | 192.168.100.0/24 | Physical nodes |
| Pod Network | 10.42.0.0/16 | Kubernetes pods |
| Service Network | 10.43.0.0/16 | Kubernetes services |

## Data Flow: Chat Request

```mermaid
sequenceDiagram
    participant U as User
    participant W as WebApp
    participant N as NATS
    participant C as Chat Handler
    participant M as Milvus
    participant L as vLLM
    participant V as Valkey

    U->>W: Send message
    W->>N: Publish ai.chat.user.{id}.message
    N->>C: Deliver to chat-handler
    C->>V: Get session history
    C->>M: RAG query (if enabled)
    M-->>C: Relevant documents
    C->>L: LLM inference (with context)
    L-->>C: Streaming tokens
    C->>N: Publish ai.chat.response.stream.{id}
    N-->>W: Deliver streaming chunks
    W-->>U: Display tokens
    C->>V: Save to session
```
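The sequence above maps onto a straightforward async loop inside the chat handler. A sketch with the external calls (Valkey, Milvus, vLLM) stubbed out; the function names are illustrative, not the handler's actual API:

```python
import asyncio

async def get_session(user_id: str) -> list[str]:
    """Stub for the Valkey session-history lookup."""
    return []

async def rag_query(text: str) -> list[str]:
    """Stub for the Milvus similarity search."""
    return ["doc snippet"]

async def llm_stream(prompt: str):
    """Stub for the vLLM streaming completion."""
    for tok in ["Hello", " world"]:
        yield tok

async def handle_chat(user_id: str, message: str) -> str:
    history = await get_session(user_id)
    context = await rag_query(message)
    prompt = "\n".join(context + history + [message])
    chunks = []
    async for tok in llm_stream(prompt):
        chunks.append(tok)  # in production: publish ai.chat.response.stream.{id}
    return "".join(chunks)

print(asyncio.run(handle_chat("alice", "hi")))  # → Hello world
```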

## GitOps Flow

```
Developer → Git Push → GitHub/Gitea
                │
                ▼
         ┌─────────────┐
         │   Flux CD   │
         │ (reconcile) │
         └──────┬──────┘
                │
 ┌──────────────┼──────────────┐
 ▼              ▼              ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ homelab- │ │   llm-   │ │   helm   │
│   k8s2   │ │workflows │ │  charts  │
└──────────┘ └──────────┘ └──────────┘
      │            │            │
      └────────────┴────────────┘
                │
                ▼
         ┌─────────────┐
         │ Kubernetes  │
         │   Cluster   │
         └─────────────┘
```

## Security Architecture

### Secrets Management

```
External Secrets Operator ──► Vault / SOPS ──► Kubernetes Secrets
```

### Authentication

```
User ──► Cloudflare Access ──► Authentik ──► Application
                                   │
                                   └──► OIDC/SAML providers
```

### Network Security

- **Cilium**: Network policies, eBPF-based security
- **Falco**: Runtime security monitoring
- **RBAC**: Fine-grained Kubernetes permissions

## High Availability

### Control Plane

- 3-node etcd cluster with automatic leader election
- Virtual IP (192.168.100.20) for API server access
- Automatic failover via Talos

### Workloads

- Pod anti-affinity for critical services
- HPA for auto-scaling
- PodDisruptionBudgets for controlled updates

### Storage

- Longhorn 3-replica default
- MinIO erasure coding for S3
- Regular Velero backups

## Observability

### Metrics Pipeline

```
Applications ──► OpenTelemetry Collector ──► Prometheus ──► Grafana
```

### Logging Pipeline

```
Applications ──► Grafana Alloy ──► Loki ──► Grafana
```

### Tracing Pipeline

```
Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafana
```

## Key Design Decisions

| Decision | Rationale | ADR |
|----------|-----------|-----|
| Talos Linux | Immutable, API-driven, secure | [ADR-0002](decisions/0002-use-talos-linux.md) |
| NATS over Kafka | Simpler ops, sufficient throughput | [ADR-0003](decisions/0003-use-nats-for-messaging.md) |
| MessagePack over JSON | Binary efficiency for audio | [ADR-0004](decisions/0004-use-messagepack-for-nats.md) |
| Multi-GPU heterogeneous | Cost optimization, workload matching | [ADR-0005](decisions/0005-multi-gpu-strategy.md) |
| GitOps with Flux | Declarative, auditable, secure | [ADR-0006](decisions/0006-gitops-with-flux.md) |

## Related Documents

- [TECH-STACK.md](TECH-STACK.md) - Complete technology inventory
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities and relationships
- [decisions/](decisions/) - All architecture decisions

---

**CODING-CONVENTIONS.md** (new file, 424 lines)

# 📐 Coding Conventions

> **Patterns, practices, and folder structure conventions for DaviesTechLabs repositories**

## Repository Conventions

### homelab-k8s2 (Infrastructure)

```
kubernetes/
├── apps/                      # Application deployments
│   └── {namespace}/           # One folder per namespace
│       └── {app}/             # One folder per application
│           ├── app/           # Kubernetes manifests
│           │   ├── kustomization.yaml
│           │   ├── helmrelease.yaml   # OR individual manifests
│           │   └── ...
│           └── ks.yaml        # Flux Kustomization
├── components/                # Reusable Kustomize components
└── flux/                      # Flux system configuration
```

**Naming Conventions:**
- Namespaces: lowercase with hyphens (`ai-ml`, `cert-manager`)
- Apps: lowercase with hyphens (`chat-handler`, `voice-assistant`)
- Secrets: `{app}-{type}` (e.g., `milvus-credentials`)

### llm-workflows (Orchestration)

```
workflows/                     # Kubernetes Deployments for NATS handlers
├── {handler}.yaml             # One file per handler

argo/                          # Argo WorkflowTemplates
├── {workflow-name}.yaml       # One file per workflow

pipelines/                     # Kubeflow Pipeline Python files
├── {pipeline}_pipeline.py     # Pipeline definition
└── kfp-sync-job.yaml          # Upload job

{handler}/                     # Python source code
├── __init__.py
├── {handler}.py               # Main entry point
├── requirements.txt
└── Dockerfile
```

---

## Python Conventions

### Project Structure

```python
from dataclasses import dataclass

import msgpack
from nats.aio.msg import Msg

# Use async/await for I/O
async def handle_message(msg: Msg) -> None:
    ...

# Use dataclasses for structured data
@dataclass
class ChatRequest:
    user_id: str
    message: str
    enable_rag: bool = True

# Use msgpack for NATS messages
data = msgpack.packb({"key": "value"})
```

### Naming

| Element | Convention | Example |
|---------|------------|---------|
| Files | snake_case | `chat_handler.py` |
| Classes | PascalCase | `ChatHandler` |
| Functions | snake_case | `process_message` |
| Constants | UPPER_SNAKE | `NATS_URL` |
| Private | Leading underscore | `_internal_method` |

### Type Hints

```python
# Always use type hints
from typing import Any, Dict, List

async def query_rag(
    query: str,
    collection: str = "knowledge_base",
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    ...
```

### Error Handling

```python
# Use specific exceptions
class RAGQueryError(Exception):
    """Raised when a RAG query fails."""

# Log errors with context
import logging

logger = logging.getLogger(__name__)

try:
    result = await milvus.search(...)
except Exception as e:
    logger.error(f"RAG query failed: {e}", extra={"query": query})
    raise RAGQueryError(f"Failed to query collection {collection}") from e
```

### NATS Message Handling

```python
import msgpack
from nats.aio.msg import Msg

async def message_handler(msg: Msg) -> None:
    try:
        # Decode MessagePack
        data = msgpack.unpackb(msg.data, raw=False)

        # Process
        result = await process(data)

        # Reply if request-reply pattern
        if msg.reply:
            await msg.respond(msgpack.packb(result))

        # Acknowledge for JetStream
        await msg.ack()

    except Exception as e:
        logger.error(f"Handler error: {e}")
        # NAK for retry (JetStream)
        await msg.nak()
```

---

## Kubernetes Manifest Conventions

### Labels

```yaml
metadata:
  labels:
    # Required
    app.kubernetes.io/name: chat-handler
    app.kubernetes.io/instance: chat-handler
    app.kubernetes.io/component: handler
    app.kubernetes.io/part-of: ai-platform

    # Optional
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/managed-by: flux
```

### Annotations

```yaml
metadata:
  annotations:
    # Reloader for config changes
    reloader.stakater.com/auto: "true"

    # Documentation
    description: "Handles chat messages via NATS"
```

### Resource Requests

```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

# GPU workloads
resources:
  limits:
    amd.com/gpu: 1     # AMD
    nvidia.com/gpu: 1  # NVIDIA
```

### Health Checks

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```

---

## Flux/GitOps Conventions

### Kustomization Structure

```yaml
# ks.yaml - Flux Kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: &app chat-handler
  namespace: flux-system
spec:
  targetNamespace: ai-ml
  commonMetadata:
    labels:
      app.kubernetes.io/name: *app
  path: ./kubernetes/apps/ai-ml/chat-handler/app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true
  interval: 30m
  retryInterval: 1m
  timeout: 5m
```

### HelmRelease Structure

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: milvus
spec:
  interval: 30m
  chart:
    spec:
      chart: milvus
      version: 4.x.x
      sourceRef:
        kind: HelmRepository
        name: milvus
        namespace: flux-system
  values:
    # Values here
```

### Secret References

```yaml
# Never hardcode secrets
env:
  - name: DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
```

---

## NATS Subject Conventions

### Hierarchy

```
ai.{domain}.{scope}.{action}

Examples:
ai.chat.user.{userId}.message    # User chat message
ai.chat.response.{requestId}     # Chat response
ai.voice.user.{userId}.request   # Voice request
ai.pipeline.trigger              # Pipeline trigger
```

### Wildcards

```
ai.chat.>               # All chat events
ai.chat.user.*.message  # All user messages
ai.*.response.{id}      # Any response type
```
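For reference, `*` matches exactly one token and `>` matches one or more trailing tokens. A minimal matcher sketch makes the semantics concrete; it is illustrative only, not part of the codebase and no substitute for the NATS server's own matching:

```python
def matches(pattern: str, subject: str) -> bool:
    """NATS-style subject matching: '*' = one token, '>' = one or more trailing tokens."""
    pt, st = pattern.split("."), subject.split(".")
    for i, tok in enumerate(pt):
        if tok == ">":
            return len(st) > i  # '>' must match at least one token
        if i >= len(st):
            return False
        if tok != "*" and tok != st[i]:
            return False
    return len(pt) == len(st)

print(matches("ai.chat.>", "ai.chat.user.42.message"))  # → True
print(matches("ai.chat.>", "ai.chat"))                  # → False
```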

---

## Git Conventions

### Commit Messages

```
type(scope): subject

body (optional)

footer (optional)
```

**Types:**
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation
- `style`: Formatting
- `refactor`: Code restructuring
- `test`: Tests
- `chore`: Maintenance

**Examples:**

```
feat(chat-handler): add streaming response support
fix(voice): handle empty audio gracefully
docs(adr): add decision for MessagePack format
```

### Branch Naming

```
feature/short-description
fix/issue-number-description
docs/what-changed
```

---

## Configuration Conventions

### Environment Variables

```python
# Use pydantic-settings or similar
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    nats_url: str = "nats://localhost:4222"
    vllm_url: str = "http://localhost:8000"
    milvus_host: str = "localhost"
    milvus_port: int = 19530
    log_level: str = "INFO"

    class Config:
        env_prefix = ""  # No prefix
```

### ConfigMaps

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ai-services-config
data:
  NATS_URL: "nats://nats.ai-ml.svc.cluster.local:4222"
  VLLM_URL: "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
  # ... other non-sensitive config
```
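A minimal sketch of how a handler might consume these ConfigMap keys at startup, falling back to local defaults when a variable is unset. The `load_config` function and its fallback values are illustrative assumptions, not the project's actual loader; only the key names mirror the ConfigMap above:

```python
import os

def load_config(environ=os.environ) -> dict:
    """Read non-sensitive settings from the environment (ConfigMap keys)."""
    return {
        # Keys match the ConfigMap data above; defaults suit local dev.
        "nats_url": environ.get("NATS_URL", "nats://localhost:4222"),
        "vllm_url": environ.get("VLLM_URL", "http://localhost:8000/v1"),
    }
```

Passing `environ` explicitly keeps the loader testable without mutating the process environment.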

---

## Documentation Conventions

### ADR Format

See [decisions/0000-template.md](decisions/0000-template.md)

### Code Comments

```python
from typing import Dict, List

# Use docstrings for public functions
async def query_rag(query: str) -> List[Dict]:
    """
    Query the RAG system for relevant documents.

    Args:
        query: The search query string

    Returns:
        List of document chunks with scores

    Raises:
        RAGQueryError: If the query fails
    """
    ...
```

### README Files

Each application should have a README with:

1. Purpose
2. Configuration
3. Deployment
4. Local development
5. API documentation (if applicable)

---

## Anti-Patterns to Avoid

| Don't | Do Instead |
|-------|------------|
| `kubectl apply` directly | Commit to Git, let Flux deploy |
| Hardcode secrets | Use External Secrets Operator |
| Use `latest` image tags | Pin to specific versions |
| Skip health checks | Always define liveness/readiness |
| Ignore resource limits | Set appropriate requests/limits |
| Use JSON for NATS messages | Use MessagePack (binary) |
| Synchronous I/O in handlers | Use async/await |
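The last row deserves a sketch: a handler that awaits its downstream calls lets one event loop serve many messages concurrently, whereas synchronous I/O stalls everything behind it. The function names here are illustrative stand-ins, not the project's actual handler API:

```python
import asyncio

async def process_message(payload: bytes) -> bytes:
    # Stand-in for a non-blocking downstream call (e.g. an inference
    # request); a synchronous HTTP call here would block the whole loop.
    await asyncio.sleep(0)
    return payload.upper()

async def handle_batch(payloads: list[bytes]) -> list[bytes]:
    # All messages are processed concurrently on one event loop.
    return await asyncio.gather(*(process_message(p) for p in payloads))

if __name__ == "__main__":
    print(asyncio.run(handle_batch([b"ping", b"pong"])))
```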

---

## Related Documents

- [TECH-STACK.md](TECH-STACK.md) - Technologies used
- [ARCHITECTURE.md](ARCHITECTURE.md) - System design
- [decisions/](decisions/) - Why we made certain choices

123 CONTAINER-DIAGRAM.mmd Normal file
@@ -0,0 +1,123 @@
%% C4 Container Diagram - Level 2
%% DaviesTechLabs Homelab AI/ML Platform
%%
%% To render: Use Mermaid Live Editor or VS Code Mermaid extension

graph TB
    subgraph users["Users"]
        user["👤 User"]
    end

    subgraph ingress["Ingress Layer"]
        cloudflared["cloudflared<br/>(Tunnel)"]
        envoy["Envoy Gateway<br/>(HTTPRoute)"]
    end

    subgraph frontends["Frontend Applications"]
        companions["Companions WebApp<br/>[Go + HTMX]<br/>AI Chat Interface"]
        voice["Voice WebApp<br/>[Gradio]<br/>Voice Assistant UI"]
        kubeflow_ui["Kubeflow UI<br/>[React]<br/>Pipeline Management"]
    end

    subgraph messaging["Message Bus"]
        nats["NATS<br/>[JetStream]<br/>Event Streaming"]
    end

    subgraph handlers["NATS Handlers"]
        chat_handler["Chat Handler<br/>[Python]<br/>RAG + LLM Orchestration"]
        voice_handler["Voice Assistant<br/>[Python]<br/>STT → LLM → TTS"]
        pipeline_bridge["Pipeline Bridge<br/>[Python]<br/>Workflow Triggers"]
    end

    subgraph ai_services["AI Services (KServe)"]
        whisper["Whisper<br/>[faster-whisper]<br/>Speech-to-Text"]
        xtts["XTTS<br/>[Coqui]<br/>Text-to-Speech"]
        vllm["vLLM<br/>[ROCm]<br/>LLM Inference"]
        embeddings["BGE Embeddings<br/>[sentence-transformers]<br/>Vector Encoding"]
        reranker["BGE Reranker<br/>[sentence-transformers]<br/>Document Ranking"]
    end

    subgraph storage["Data Stores"]
        milvus["Milvus<br/>[Vector DB]<br/>RAG Storage"]
        valkey["Valkey<br/>[Redis API]<br/>Session Cache"]
        postgres["CloudNative-PG<br/>[PostgreSQL]<br/>Metadata"]
        minio["MinIO<br/>[S3 API]<br/>Object Storage"]
    end

    subgraph workflows["Workflow Engines"]
        argo["Argo Workflows<br/>[DAG Engine]<br/>Complex Pipelines"]
        kfp["Kubeflow Pipelines<br/>[ML Platform]<br/>Training + Inference"]
        argo_events["Argo Events<br/>[Event Source]<br/>NATS → Workflow"]
    end

    subgraph mlops["MLOps"]
        mlflow["MLflow<br/>[Tracking Server]<br/>Experiment Tracking"]
        volcano["Volcano<br/>[Scheduler]<br/>GPU Scheduling"]
    end

    %% User flow
    user --> cloudflared
    cloudflared --> envoy
    envoy --> companions
    envoy --> voice
    envoy --> kubeflow_ui

    %% Frontend to NATS
    companions --> |WebSocket| nats
    voice --> |HTTP/WS| nats

    %% NATS to handlers
    nats --> chat_handler
    nats --> voice_handler
    nats --> pipeline_bridge

    %% Handlers to AI services
    chat_handler --> embeddings
    chat_handler --> reranker
    chat_handler --> vllm
    chat_handler --> milvus
    chat_handler --> valkey

    voice_handler --> whisper
    voice_handler --> embeddings
    voice_handler --> reranker
    voice_handler --> vllm
    voice_handler --> xtts

    %% Pipeline flow
    pipeline_bridge --> argo_events
    argo_events --> argo
    argo_events --> kfp
    kubeflow_ui --> kfp

    %% Workflow to AI
    argo --> ai_services
    kfp --> ai_services
    kfp --> mlflow

    %% Storage connections
    ai_services --> minio
    milvus --> minio
    kfp --> postgres
    mlflow --> postgres
    mlflow --> minio

    %% GPU scheduling
    volcano -.-> vllm
    volcano -.-> whisper
    volcano -.-> xtts

    %% Styling
    classDef frontend fill:#90EE90,stroke:#333
    classDef handler fill:#87CEEB,stroke:#333
    classDef ai fill:#FFB6C1,stroke:#333
    classDef storage fill:#DDA0DD,stroke:#333
    classDef workflow fill:#F0E68C,stroke:#333
    classDef messaging fill:#FFA500,stroke:#333

    class companions,voice,kubeflow_ui frontend
    class chat_handler,voice_handler,pipeline_bridge handler
    class whisper,xtts,vllm,embeddings,reranker ai
    class milvus,valkey,postgres,minio storage
    class argo,kfp,argo_events,mlflow,volcano workflow
    class nats messaging
69 CONTEXT-DIAGRAM.mmd Normal file
@@ -0,0 +1,69 @@
%% C4 Context Diagram - Level 1
%% DaviesTechLabs Homelab System Context
%%
%% To render: Use Mermaid Live Editor or VS Code Mermaid extension

graph TB
    subgraph users["External Users"]
        dev["👤 Developer<br/>(Billy)"]
        family["👥 Family Members"]
        agents["🤖 AI Agents"]
    end

    subgraph external["External Systems"]
        cf["☁️ Cloudflare<br/>DNS + Tunnel"]
        gh["🐙 GitHub<br/>Source Code"]
        ghcr["📦 GHCR<br/>Container Registry"]
        hf["🤗 Hugging Face<br/>Model Registry"]
    end

    subgraph homelab["🏠 DaviesTechLabs Homelab"]
        direction TB

        subgraph apps["Application Layer"]
            companions["💬 Companions<br/>AI Chat"]
            voice["🎤 Voice Assistant"]
            media["🎬 Media Services<br/>(Jellyfin, *arr)"]
            productivity["📝 Productivity<br/>(Nextcloud, Gitea)"]
        end

        subgraph platform["Platform Layer"]
            k8s["☸️ Kubernetes Cluster<br/>Talos Linux"]
        end

        subgraph ai["AI/ML Layer"]
            inference["🧠 Inference Services<br/>(vLLM, Whisper, XTTS)"]
            workflows["⚙️ Workflow Engines<br/>(Kubeflow, Argo)"]
            vectordb["📚 Vector Store<br/>(Milvus)"]
        end
    end

    %% User interactions
    dev --> |manages| productivity
    dev --> |develops| k8s
    family --> |uses| media
    family --> |chats| companions
    agents --> |queries| inference

    %% External integrations
    cf --> |routes traffic| apps
    gh --> |GitOps sync| k8s
    ghcr --> |pulls images| k8s
    hf --> |downloads models| inference

    %% Internal relationships
    apps --> platform
    ai --> platform
    companions --> inference
    voice --> inference
    workflows --> inference
    inference --> vectordb

    %% Styling
    classDef external fill:#f9f,stroke:#333,stroke-width:2px
    classDef homelab fill:#bbf,stroke:#333,stroke-width:2px
    classDef user fill:#bfb,stroke:#333,stroke-width:2px

    class cf,gh,ghcr,hf external
    class companions,voice,media,productivity,k8s,inference,workflows,vectordb homelab
    class dev,family,agents user
345 DOMAIN-MODEL.md Normal file
@@ -0,0 +1,345 @@
# 📊 Domain Model

> **Core entities, bounded contexts, and relationships in the DaviesTechLabs homelab**

## Bounded Contexts

```
┌─────────────────────────────────────────────────────────────────────────┐
│                            BOUNDED CONTEXTS                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐    │
│  │   CHAT CONTEXT    │  │   VOICE CONTEXT   │  │ WORKFLOW CONTEXT  │    │
│  ├───────────────────┤  ├───────────────────┤  ├───────────────────┤    │
│  │ • ChatSession     │  │ • VoiceSession    │  │ • Pipeline        │    │
│  │ • ChatMessage     │  │ • AudioChunk      │  │ • PipelineRun     │    │
│  │ • Conversation    │  │ • Transcription   │  │ • Artifact        │    │
│  │ • User            │  │ • SynthesizedAudio│  │ • Experiment      │    │
│  └─────────┬─────────┘  └─────────┬─────────┘  └─────────┬─────────┘    │
│            │                      │                      │              │
│            └──────────────────────┼──────────────────────┘              │
│                                   │                                     │
│                                   ▼                                     │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                         INFERENCE CONTEXT                         │  │
│  ├───────────────────────────────────────────────────────────────────┤  │
│  │ • InferenceRequest  • Model  • Embedding  • Document  • Chunk     │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

---

## Core Entities

### User Context

```yaml
User:
  id: string (UUID)
  username: string
  premium: boolean
  preferences:
    voice_id: string
    model_preference: string
    enable_rag: boolean
  created_at: timestamp

Session:
  id: string (UUID)
  user_id: string
  type: "chat" | "voice"
  started_at: timestamp
  last_activity: timestamp
  metadata: object
```

### Chat Context

```yaml
ChatMessage:
  id: string (UUID)
  session_id: string
  user_id: string
  role: "user" | "assistant" | "system"
  content: string
  created_at: timestamp
  metadata:
    tokens_used: integer
    latency_ms: float
    rag_sources: string[]
    model_used: string

Conversation:
  id: string (UUID)
  user_id: string
  messages: ChatMessage[]
  title: string (auto-generated)
  created_at: timestamp
  updated_at: timestamp
```

### Voice Context

```yaml
VoiceRequest:
  id: string (UUID)
  user_id: string
  audio_b64: string (base64)
  format: "wav" | "webm" | "mp3"
  language: string
  premium: boolean
  enable_rag: boolean

VoiceResponse:
  id: string (UUID)
  request_id: string
  transcription: string
  response_text: string
  audio_b64: string (base64)
  audio_format: string
  latency_ms: float
  rag_docs_used: integer
```

### Inference Context

```yaml
InferenceRequest:
  id: string (UUID)
  service: "llm" | "stt" | "tts" | "embeddings" | "reranker"
  input: string | bytes
  parameters: object
  priority: "standard" | "premium"

InferenceResponse:
  id: string (UUID)
  request_id: string
  output: string | bytes | float[]
  metadata:
    model: string
    latency_ms: float
    tokens: integer (if applicable)
```

### RAG Context

```yaml
Document:
  id: string (UUID)
  collection: string
  title: string
  content: string
  source_url: string
  ingested_at: timestamp

Chunk:
  id: string (UUID)
  document_id: string
  content: string
  embedding: float[1024]  # BGE-large dimensions
  metadata:
    position: integer
    page: integer

RAGQuery:
  query: string
  collection: string
  top_k: integer (default: 5)
  rerank: boolean (default: true)
  rerank_top_k: integer (default: 3)

RAGResult:
  chunks: Chunk[]
  scores: float[]
  reranked: boolean
```
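The `RAGQuery` fields above imply a two-stage selection: retrieve `top_k` candidates by vector score, then keep `rerank_top_k` after rescoring. A hedged sketch of that logic, where `select_chunks` and the placeholder `rerank_fn` are illustrative assumptions rather than the actual handler code:

```python
def select_chunks(candidates, top_k=5, rerank=True, rerank_top_k=3,
                  rerank_fn=None):
    """Two-stage retrieval: top_k by vector score, then rerank_top_k.

    candidates: list of (chunk_text, vector_score) pairs.
    rerank_fn:  stand-in scorer for the BGE reranker (text -> score).
    """
    # Stage 1: take the top_k candidates by raw vector similarity.
    retrieved = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    if not rerank:
        return retrieved
    # Stage 2: rescore the survivors and keep the rerank_top_k best.
    rerank_fn = rerank_fn or (lambda text: len(text))  # placeholder scorer
    rescored = [(text, rerank_fn(text)) for text, _ in retrieved]
    return sorted(rescored, key=lambda c: c[1], reverse=True)[:rerank_top_k]
```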

### Workflow Context

```yaml
Pipeline:
  id: string
  name: string
  version: string
  engine: "kubeflow" | "argo"
  definition: object (YAML)

PipelineRun:
  id: string (UUID)
  pipeline_id: string
  status: "pending" | "running" | "succeeded" | "failed"
  started_at: timestamp
  completed_at: timestamp
  parameters: object
  artifacts: Artifact[]

Artifact:
  id: string (UUID)
  run_id: string
  name: string
  type: "model" | "dataset" | "metrics" | "logs"
  uri: string (s3://)
  metadata: object

Experiment:
  id: string (UUID)
  name: string
  runs: PipelineRun[]
  metrics: object
  created_at: timestamp
```

---

## Entity Relationships

```mermaid
erDiagram
    USER ||--o{ SESSION : has
    USER ||--o{ CONVERSATION : owns
    SESSION ||--o{ CHAT_MESSAGE : contains
    CONVERSATION ||--o{ CHAT_MESSAGE : contains

    USER ||--o{ VOICE_REQUEST : makes
    VOICE_REQUEST ||--|| VOICE_RESPONSE : produces

    DOCUMENT ||--o{ CHUNK : contains
    CHUNK }|--|| EMBEDDING : has

    PIPELINE ||--o{ PIPELINE_RUN : executed_as
    PIPELINE_RUN ||--o{ ARTIFACT : produces
    EXPERIMENT ||--o{ PIPELINE_RUN : tracks

    INFERENCE_REQUEST }|--|| INFERENCE_RESPONSE : produces
```

---

## Aggregate Roots

| Aggregate | Root Entity | Child Entities |
|-----------|-------------|----------------|
| Chat | Conversation | ChatMessage |
| Voice | VoiceRequest | VoiceResponse |
| RAG | Document | Chunk, Embedding |
| Workflow | PipelineRun | Artifact |
| User | User | Session, Preferences |

---

## Event Flow

### Chat Event Stream

```
UserLogin
 └─► SessionCreated
      └─► MessageReceived
           ├─► RAGQueryExecuted (optional)
           ├─► InferenceRequested
           └─► ResponseGenerated
                └─► MessageStored
```

### Voice Event Stream

```
VoiceRequestReceived
 └─► TranscriptionStarted
      └─► TranscriptionCompleted
           └─► RAGQueryExecuted (optional)
                └─► LLMInferenceStarted
                     └─► LLMResponseGenerated
                          └─► TTSSynthesisStarted
                               └─► AudioResponseReady
```

### Workflow Event Stream

```
PipelineTriggerReceived
 └─► PipelineRunCreated
      └─► StepStarted (repeated)
           └─► StepCompleted (repeated)
                └─► ArtifactProduced (repeated)
                     └─► PipelineRunCompleted
```

---

## Data Retention

| Entity | Retention | Storage |
|--------|-----------|---------|
| ChatMessage | 30 days | JetStream → PostgreSQL |
| VoiceRequest/Response | 1 hour (audio), 30 days (text) | JetStream → PostgreSQL |
| Chunk/Embedding | Permanent | Milvus |
| PipelineRun | Permanent | PostgreSQL |
| Artifact | Permanent | MinIO |
| Session | 7 days | Valkey |

---

## Invariants

### Chat Context
- A ChatMessage must belong to exactly one Conversation
- A Conversation must have at least one ChatMessage
- Messages are immutable once created

### Voice Context
- A VoiceResponse must have a corresponding VoiceRequest
- Audio format must be one of: wav, webm, mp3
- Transcription cannot be empty for valid audio

### RAG Context
- A Chunk must belong to exactly one Document
- Embedding dimensions must match the model (1024 for BGE-large)
- A Document must have at least one Chunk

### Workflow Context
- A PipelineRun must reference a valid Pipeline
- Artifacts must have valid S3 URIs
- Run status transitions: pending → running → (succeeded | failed)
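The status-transition invariant can be enforced with a small allowed-transitions table. This is a sketch of the rule as stated, not the project's actual implementation; the `transition` helper is illustrative:

```python
# Legal PipelineRun status transitions, per the invariant above:
# pending -> running -> (succeeded | failed); terminal states go nowhere.
ALLOWED = {
    "pending": {"running"},
    "running": {"succeeded", "failed"},
    "succeeded": set(),
    "failed": set(),
}

def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the transition is illegal."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```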

---

## Value Objects

```python
from dataclasses import dataclass

# Immutable value objects
@dataclass(frozen=True)
class MessageContent:
    text: str
    tokens: int

@dataclass(frozen=True)
class AudioData:
    data: bytes
    format: str
    duration_ms: int
    sample_rate: int

@dataclass(frozen=True)
class EmbeddingVector:
    values: tuple[float, ...]
    model: str
    dimensions: int

@dataclass(frozen=True)
class RAGContext:
    chunks: tuple[str, ...]
    scores: tuple[float, ...]
    query: str
```
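What `frozen=True` buys: attribute assignment raises, so a value object can never drift after construction. A self-contained demonstration (restating `MessageContent` so the snippet runs on its own):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class MessageContent:
    text: str
    tokens: int

msg = MessageContent(text="hello", tokens=1)
try:
    msg.text = "changed"  # frozen dataclasses reject mutation
except FrozenInstanceError:
    # Construct a new value instead of mutating the old one.
    msg = MessageContent(text="changed", tokens=msg.tokens)
```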

---

## Related Documents

- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [GLOSSARY.md](GLOSSARY.md) - Term definitions
- [decisions/0004-use-messagepack-for-nats.md](decisions/0004-use-messagepack-for-nats.md) - Message format decision
242 GLOSSARY.md Normal file
@@ -0,0 +1,242 @@
# 📖 Glossary

> **Terminology and abbreviations used in the DaviesTechLabs homelab**

## A

**ADR (Architecture Decision Record)**
: A document that captures an important architectural decision, including context, decision, and consequences.

**Argo Events**
: Event-driven automation for Kubernetes that triggers workflows based on events from various sources.

**Argo Workflows**
: A container-native workflow engine for orchestrating parallel jobs on Kubernetes.

**Authentik**
: Self-hosted identity provider supporting SAML, OIDC, and other protocols.

## B

**BGE (BAAI General Embedding)**
: A family of embedding models from BAAI used for semantic search and RAG.

**Bounded Context**
: A DDD concept defining a boundary within which a particular domain model applies.

## C

**C4 Model**
: A hierarchical approach to software architecture diagrams: Context, Container, Component, Code.

**Cilium**
: eBPF-based networking, security, and observability for Kubernetes.

**CloudNative-PG**
: Kubernetes operator for PostgreSQL databases.

**CNI (Container Network Interface)**
: Standard for configuring network interfaces in Linux containers.

## D

**DDD (Domain-Driven Design)**
: Software design approach focusing on the core domain and domain logic.

## E

**Embedding**
: A vector representation of text, used for semantic similarity and search.

**Envoy Gateway**
: Kubernetes Gateway API implementation using Envoy proxy.

**External Secrets Operator (ESO)**
: Kubernetes operator that syncs secrets from external stores (Vault, etc.).

## F

**Falco**
: Runtime security tool that detects anomalous activity in containers.

**Flux CD**
: GitOps toolkit for Kubernetes, continuously reconciling cluster state with Git.

## G

**GitOps**
: Operational practice using Git as the single source of truth for declarative infrastructure.

**GPU Device Plugin**
: Kubernetes plugin that exposes GPU resources to containers.

## H

**HelmRelease**
: Flux CRD for managing Helm chart releases declaratively.

**HTTPRoute**
: Kubernetes Gateway API resource for HTTP routing rules.

## I

**InferenceService**
: KServe CRD for deploying ML models with autoscaling and traffic management.

## J

**JetStream**
: NATS persistence layer providing streaming, key-value, and object stores.

## K

**KServe**
: Kubernetes-native platform for deploying and serving ML models.

**Kubeflow**
: ML toolkit for Kubernetes, including pipelines, training operators, and more.

**Kustomization**
: Flux CRD for applying Kustomize overlays from Git sources.

## L

**LLM (Large Language Model)**
: AI model trained on vast text data, capable of generating human-like text.

**Longhorn**
: Cloud-native distributed storage for Kubernetes.

## M

**MessagePack (msgpack)**
: Binary serialization format, more compact than JSON.

**Milvus**
: Open-source vector database for similarity search and AI applications.

**MLflow**
: Platform for managing the ML lifecycle: experiments, models, deployment.

**MinIO**
: S3-compatible object storage.

## N

**NATS**
: Cloud-native messaging system for microservices, IoT, and serverless.

**Node Feature Discovery (NFD)**
: Kubernetes add-on for detecting hardware features on nodes.

## P

**Pipeline**
: In an ML context, a DAG of components that process data and train/serve models.

**Premium User**
: User tier with enhanced features (more RAG docs, priority routing).

## R

**RAG (Retrieval-Augmented Generation)**
: AI technique combining document retrieval with LLM generation for grounded responses.

**Reranker**
: Model that rescores retrieved documents based on relevance to a query.

**ROCm**
: AMD's open-source GPU computing platform (alternative to CUDA).

## S

**Schematic**
: Talos Linux concept for defining system extensions and configurations.

**SOPS (Secrets OPerationS)**
: Tool for encrypting secrets in Git repositories.

**STT (Speech-to-Text)**
: Converting spoken audio to text (e.g., Whisper).

**Strix Halo**
: AMD APU with a unified memory architecture that gives the GPU access to a large shared memory pool.

## T

**Talos Linux**
: Minimal, immutable Linux distribution designed specifically for Kubernetes.

**TTS (Text-to-Speech)**
: Converting text to spoken audio (e.g., XTTS/Coqui).

## V

**Valkey**
: Redis-compatible in-memory data store (Redis fork).

**vLLM**
: High-throughput LLM serving engine with PagedAttention.

**VIP (Virtual IP)**
: IP address shared among multiple hosts for high availability.

**Volcano**
: Kubernetes batch scheduler for high-performance workloads (ML, HPC).

**VRM**
: File format for 3D humanoid avatars.

## W

**Whisper**
: OpenAI's speech recognition model.

## X

**XTTS**
: Coqui's multi-language text-to-speech model with voice cloning.

---

## Acronyms Quick Reference

| Acronym | Full Form |
|---------|-----------|
| ADR | Architecture Decision Record |
| API | Application Programming Interface |
| BGE | BAAI General Embedding |
| CI/CD | Continuous Integration/Continuous Deployment |
| CRD | Custom Resource Definition |
| DAG | Directed Acyclic Graph |
| DDD | Domain-Driven Design |
| ESO | External Secrets Operator |
| GPU | Graphics Processing Unit |
| HA | High Availability |
| HPA | Horizontal Pod Autoscaler |
| LLM | Large Language Model |
| ML | Machine Learning |
| NATS | Neural Autonomic Transport System |
| NFD | Node Feature Discovery |
| OIDC | OpenID Connect |
| RAG | Retrieval-Augmented Generation |
| RBAC | Role-Based Access Control |
| ROCm | Radeon Open Compute |
| S3 | Simple Storage Service |
| SAML | Security Assertion Markup Language |
| SOPS | Secrets OPerationS |
| SSO | Single Sign-On |
| STT | Speech-to-Text |
| TLS | Transport Layer Security |
| TTS | Text-to-Speech |
| UUID | Universally Unique Identifier |
| VIP | Virtual IP |
| VRAM | Video Random Access Memory |

---

## Related Documents

- [ARCHITECTURE.md](ARCHITECTURE.md) - System overview
- [TECH-STACK.md](TECH-STACK.md) - Technology details
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Entity definitions
106 README.md
@@ -1,3 +1,105 @@
-# homelab-design
-homelab design process goes here.

# 🏠 DaviesTechLabs Homelab Architecture

> **Production-grade AI/ML platform running on bare-metal Kubernetes**
|
||||||
|
|
||||||
|
[](https://talos.dev)
|
||||||
|
[](https://kubernetes.io)
|
||||||
|
[](https://fluxcd.io)
|
||||||
|
[](LICENSE)
|
||||||
|
|
||||||
|
## 📖 Quick Navigation
|
||||||
|
|
||||||
|
| Document | Purpose |
|
||||||
|
|----------|---------|
|
||||||
|
| [AGENT-ONBOARDING.md](AGENT-ONBOARDING.md) | **Start here if you're an AI agent** |
|
||||||
|
| [ARCHITECTURE.md](ARCHITECTURE.md) | High-level system overview |
|
||||||
|
| [TECH-STACK.md](TECH-STACK.md) | Complete technology stack |
|
||||||
|
| [DOMAIN-MODEL.md](DOMAIN-MODEL.md) | Core entities and bounded contexts |
|
||||||
|
| [GLOSSARY.md](GLOSSARY.md) | Terminology reference |
|
||||||
|
| [decisions/](decisions/) | Architecture Decision Records (ADRs) |

## 🎯 What This Is

A comprehensive architecture documentation repository for the DaviesTechLabs homelab Kubernetes cluster, featuring:

- **AI/ML Platform**: KServe inference services, RAG pipelines, voice assistants
- **Multi-GPU Support**: AMD ROCm (RDNA3/Strix Halo), NVIDIA CUDA, Intel Arc
- **GitOps**: Flux CD with SOPS encryption
- **Event-Driven**: NATS JetStream for real-time messaging
- **ML Workflows**: Kubeflow Pipelines + Argo Workflows

## 🖥️ Cluster Overview

| Node | Role | Hardware | GPU |
|------|------|----------|-----|
| storm | Control Plane | Intel 13th Gen | Integrated |
| bruenor | Control Plane | Intel 13th Gen | Integrated |
| catti | Control Plane | Intel 13th Gen | Integrated |
| elminster | Worker | NVIDIA RTX 2070 | 8GB CUDA |
| khelben | Worker (vLLM) | AMD Strix Halo | 64GB Unified |
| drizzt | Worker | AMD Radeon 680M | 12GB RDNA2 |
| danilo | Worker | Intel Core Ultra 9 | Intel Arc |

## 🚀 Quick Start

### View Current Cluster State

```bash
# Get node status
kubectl get nodes -o wide

# View AI/ML workloads
kubectl get pods -n ai-ml

# Check KServe inference services
kubectl get inferenceservices -n ai-ml
```

### Key Endpoints

| Service | URL | Purpose |
|---------|-----|---------|
| Kubeflow | `kubeflow.lab.daviestechlabs.io` | ML Pipeline UI |
| Companions | `companions-chat.lab.daviestechlabs.io` | AI Chat Interface |
| Voice | `voice.lab.daviestechlabs.io` | Voice Assistant |
| Gitea | `git.daviestechlabs.io` | Self-hosted Git |

## 📂 Repository Structure

```
homelab-design/
├── README.md                 # This file
├── AGENT-ONBOARDING.md       # AI agent quick-start
├── ARCHITECTURE.md           # High-level system overview
├── CONTEXT-DIAGRAM.mmd       # C4 Level 1 (Mermaid)
├── CONTAINER-DIAGRAM.mmd     # C4 Level 2
├── TECH-STACK.md             # Complete tech stack
├── DOMAIN-MODEL.md           # Core entities
├── CODING-CONVENTIONS.md     # Patterns & practices
├── GLOSSARY.md               # Terminology
├── decisions/                # ADRs
│   ├── 0000-template.md
│   ├── 0001-record-architecture-decisions.md
│   ├── 0002-use-talos-linux.md
│   └── ...
├── specs/                    # Feature specifications
└── diagrams/                 # Additional diagrams
```

## 🔗 Related Repositories

| Repository | Purpose |
|------------|---------|
| [homelab-k8s2](https://github.com/Billy-Davies-2/homelab-k8s2) | Kubernetes manifests, Flux GitOps |
| [llm-workflows](https://github.com/Billy-Davies-2/llm-workflows) | NATS handlers, Argo/KFP workflows |
| [companions-frontend](https://github.com/Billy-Davies-2/companions-frontend) | Go web server, HTMX frontend |

## 📝 Contributing

1. For architecture changes, create an ADR in `decisions/`
2. Update relevant documentation
3. Submit a PR with context

---

*Last updated: 2026-02-01*
271
TECH-STACK.md
Normal file
@@ -0,0 +1,271 @@
# 🛠️ Technology Stack

> **Complete inventory of technologies used in the DaviesTechLabs homelab**

## Platform Layer

### Operating System

| Component | Version | Purpose |
|-----------|---------|---------|
| [Talos Linux](https://talos.dev) | v1.12.1 | Immutable, API-driven Kubernetes OS |
| Kernel | 6.18.2-talos | Linux kernel with GPU drivers |

### Container Orchestration

| Component | Version | Purpose |
|-----------|---------|---------|
| [Kubernetes](https://kubernetes.io) | v1.35.0 | Container orchestration |
| [containerd](https://containerd.io) | 2.1.6 | Container runtime |
| [Cilium](https://cilium.io) | Latest | CNI, network policies, eBPF |

### GitOps

| Component | Version | Purpose |
|-----------|---------|---------|
| [Flux CD](https://fluxcd.io) | v2 | GitOps continuous delivery |
| [SOPS](https://github.com/getsops/sops) | Latest | Secret encryption |
| [Age](https://github.com/FiloSottile/age) | Latest | Encryption key management |
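Together, these three pieces keep secrets encrypted in Git and let Flux decrypt them at apply time. A minimal sketch of a Flux `Kustomization` wired for SOPS decryption — the name `apps`, the path, and the `sops-age` secret are illustrative, not taken from the actual repo:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps                  # illustrative name
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: homelab-k8s2
  path: ./kubernetes/apps     # hypothetical path
  prune: true
  decryption:
    provider: sops
    secretRef:
      name: sops-age          # Secret holding the Age private key
```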

---

## AI/ML Layer

### Inference Engines

| Service | Framework | GPU | Model Type |
|---------|-----------|-----|------------|
| [vLLM](https://vllm.ai) | ROCm | AMD Strix Halo | Large Language Models |
| [faster-whisper](https://github.com/guillaumekln/faster-whisper) | CUDA | NVIDIA RTX 2070 | Speech-to-Text |
| [XTTS](https://github.com/coqui-ai/TTS) | CUDA | NVIDIA RTX 2070 | Text-to-Speech |
| [BGE Embeddings](https://huggingface.co/BAAI/bge-large-en-v1.5) | ROCm | AMD Radeon 680M | Text Embeddings |
| [BGE Reranker](https://huggingface.co/BAAI/bge-reranker-large) | Intel | Intel Arc | Document Reranking |

### ML Serving

| Component | Version | Purpose |
|-----------|---------|---------|
| [KServe](https://kserve.github.io) | v0.12+ | Model serving framework |
| [Ray Serve](https://ray.io/serve) | 2.53.0 | Unified inference endpoints |
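KServe declares each model as an `InferenceService` resource. A hedged sketch of what one might look like here — the service name, image, and namespace are illustrative, not copied from the cluster:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: bge-embeddings                        # illustrative name
  namespace: ai-ml
spec:
  predictor:
    containers:
      - name: kserve-container
        image: ghcr.io/example/bge-embed:latest   # hypothetical image
        resources:
          limits:
            amd.com/gpu: "1"                  # ROCm GPU via the AMD device plugin
```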

### ML Workflows

| Component | Version | Purpose |
|-----------|---------|---------|
| [Kubeflow Pipelines](https://kubeflow.org) | 2.15.0 | ML pipeline orchestration |
| [Argo Workflows](https://argoproj.github.io/workflows) | v3.7.8 | DAG-based workflows |
| [Argo Events](https://argoproj.github.io/events) | Latest | Event-driven triggers |
| [MLflow](https://mlflow.org) | 3.7.0 | Experiment tracking, model registry |

### GPU Scheduling

| Component | Version | Purpose |
|-----------|---------|---------|
| [Volcano](https://volcano.sh) | Latest | GPU-aware scheduling |
| AMD GPU Device Plugin | v1.4.1 | ROCm GPU allocation |
| NVIDIA Device Plugin | Latest | CUDA GPU allocation |
| [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) | v0.18.2 | Hardware detection |

---

## Data Layer

### Databases

| Component | Version | Purpose |
|-----------|---------|---------|
| [CloudNative-PG](https://cloudnative-pg.io) | 16.11 | PostgreSQL for metadata |
| [Milvus](https://milvus.io) | Latest | Vector database for RAG |
| [ClickHouse](https://clickhouse.com) | Latest | Analytics, access logs |
| [Valkey](https://valkey.io) | Latest | Redis-compatible cache |

### Object Storage

| Component | Version | Purpose |
|-----------|---------|---------|
| [MinIO](https://min.io) | Latest | S3-compatible storage |
| [Longhorn](https://longhorn.io) | v1.10.1 | Distributed block storage |
| NFS CSI Driver | Latest | Shared filesystem |

### Messaging

| Component | Version | Purpose |
|-----------|---------|---------|
| [NATS](https://nats.io) | Latest | Message bus |
| NATS JetStream | Built-in | Persistent streaming |
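JetStream persistence is configured per stream, each capturing a subject space for replay. A hedged sketch of a stream definition (stream name, subjects, and limits are illustrative — the real configuration lives in `specs/`):

```json
{
  "name": "CHAT",
  "subjects": ["chat.>"],
  "retention": "limits",
  "storage": "file",
  "max_age": 86400000000000,
  "max_msgs": -1
}
```

Here `max_age` is in nanoseconds (24 hours), and `chat.>` uses the NATS wildcard that matches all remaining subject tokens.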

### Data Processing

| Component | Version | Purpose |
|-----------|---------|---------|
| [Apache Spark](https://spark.apache.org) | Latest | Batch analytics |
| [Apache Flink](https://flink.apache.org) | Latest | Stream processing |
| [Apache Iceberg](https://iceberg.apache.org) | Latest | Table format |
| [Nessie](https://projectnessie.org) | Latest | Data catalog |
| [Trino](https://trino.io) | 479 | SQL query engine |

---

## Application Layer

### Web Frameworks

| Application | Language | Framework | Purpose |
|-------------|----------|-----------|---------|
| Companions | Go | net/http + HTMX | AI chat interface |
| Voice WebApp | Python | Gradio | Voice assistant UI |
| Various handlers | Python | asyncio + nats.py | NATS event handlers |

### Frontend

| Technology | Purpose |
|------------|---------|
| [HTMX](https://htmx.org) | Dynamic HTML updates |
| [Alpine.js](https://alpinejs.dev) | Lightweight reactivity |
| [VRM](https://vrm.dev) | 3D avatar rendering |

---

## Networking Layer

### Ingress

| Component | Version | Purpose |
|-----------|---------|---------|
| [Envoy Gateway](https://gateway.envoyproxy.io) | v1.6.3 | Gateway API implementation |
| [cloudflared](https://developers.cloudflare.com/cloudflare-one/connections/connect-apps) | Latest | Cloudflare tunnel |

### DNS & Certificates

| Component | Version | Purpose |
|-----------|---------|---------|
| [external-dns](https://github.com/kubernetes-sigs/external-dns) | Latest | Automatic DNS management |
| [cert-manager](https://cert-manager.io) | Latest | TLS certificate automation |

### Image Distribution

| Component | Purpose |
|-----------|---------|
| [Spegel](https://github.com/spegel-org/spegel) | P2P container image distribution |

---

## Security Layer

### Identity & Access

| Component | Version | Purpose |
|-----------|---------|---------|
| [Authentik](https://goauthentik.io) | 2025.12.1 | Identity provider, SSO |
| [Vault](https://vaultproject.io) | 1.21.2 | Secret management |
| [External Secrets Operator](https://external-secrets.io) | v1.3.1 | Kubernetes secrets sync |

### Runtime Security

| Component | Version | Purpose |
|-----------|---------|---------|
| [Falco](https://falco.org) | 0.42.1 | Runtime threat detection |
| Cilium Network Policies | Built-in | Network segmentation |

### Backup

| Component | Version | Purpose |
|-----------|---------|---------|
| [Velero](https://velero.io) | v1.17.1 | Cluster backup/restore |

---

## Observability Layer

### Metrics

| Component | Purpose |
|-----------|---------|
| [Prometheus](https://prometheus.io) | Metrics collection |
| [Grafana](https://grafana.com) | Dashboards & visualization |

### Logging

| Component | Version | Purpose |
|-----------|---------|---------|
| [Grafana Alloy](https://grafana.com/oss/alloy) | v1.12.0 | Log collection |
| [Loki](https://grafana.com/oss/loki) | Latest | Log aggregation |

### Tracing

| Component | Purpose |
|-----------|---------|
| [OpenTelemetry Collector](https://opentelemetry.io) | Trace collection |
| Tempo/Jaeger | Trace storage & query |

---

## Development Tools

### Local Development

| Tool | Purpose |
|------|---------|
| [mise](https://mise.jdx.dev) | Tool version management |
| [Task](https://taskfile.dev) | Task runner (Taskfile.yaml) |
| [flux-local](https://github.com/allenporter/flux-local) | Local Flux testing |
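These tools chain together in the local loop: mise pins the tool versions, and Task wraps flux-local so manifests are validated before pushing. A hedged `Taskfile.yaml` sketch (the task name, path, and exact flux-local flags are illustrative):

```yaml
version: "3"

tasks:
  validate:
    desc: Render and test Flux kustomizations locally before pushing
    cmds:
      - flux-local test --path clusters/main   # path and flags are illustrative
```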

### CI/CD

| Tool | Purpose |
|------|---------|
| GitHub Actions | CI/CD pipelines |
| [Renovate](https://renovatebot.com) | Dependency updates |

### Image Building

| Tool | Purpose |
|------|---------|
| Docker | Container builds |
| GHCR | Container registry |

---

## Media & Entertainment

| Component | Version | Purpose |
|-----------|---------|---------|
| [Jellyfin](https://jellyfin.org) | 10.11.5 | Media server |
| [Nextcloud](https://nextcloud.com) | 32.0.5 | File sync & share |
| Prowlarr, Bazarr, etc. | Various | *arr stack |
| [Kasm](https://kasmweb.com) | 1.18.1 | Browser isolation |

---

## Python Dependencies (llm-workflows)

```toml
# Core
nats-py>=2.7.0           # NATS client
msgpack>=1.0.0           # Binary serialization
aiohttp>=3.9.0           # HTTP client

# ML/AI
pymilvus>=2.4.0          # Milvus client
sentence-transformers    # Embeddings
openai>=1.0.0            # vLLM OpenAI API

# Kubeflow
kfp>=2.12.1              # Pipeline SDK
```
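The msgpack dependency is what lets the handlers carry raw audio inside NATS payloads without a base64 step. A small runnable sketch of such a round trip — the field names are illustrative, not the project's actual message schema:

```python
import msgpack

# An illustrative voice event: audio rides along as raw bytes.
event = {
    "user_id": "user-123",
    "audio": b"\x00\x01\x02\x03",  # stand-in for real PCM audio
    "premium": True,
}

packed = msgpack.packb(event)
decoded = msgpack.unpackb(packed, raw=False)

assert decoded == event  # binary field survives the round trip intact
print(len(packed))       # packed size in bytes
```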

---

## Version Pinning Strategy

| Component Type | Strategy |
|----------------|----------|
| Base images | Pin major.minor |
| Helm charts | Pin exact version |
| Python packages | Pin minimum version |
| System extensions | Pin via Talos schematic |

## Related Documents

- [ARCHITECTURE.md](ARCHITECTURE.md) - How components connect
- [decisions/](decisions/) - Why we chose specific technologies
71
decisions/0000-template.md
Normal file
@@ -0,0 +1,71 @@
# [short title of solved problem and solution]

* Status: [proposed | rejected | accepted | deprecated | superseded by [ADR-NNNN](NNNN-example.md)]
* Date: YYYY-MM-DD
* Deciders: [list of people involved in decision]
* Technical Story: [description | ticket/issue URL]

## Context and Problem Statement

[Describe the context and problem statement, e.g., in free form using two to three sentences. You may want to articulate the problem in form of a question.]

## Decision Drivers

* [driver 1, e.g., a force, facing concern, …]
* [driver 2, e.g., a force, facing concern, …]
* … <!-- numbers of drivers can vary -->

## Considered Options

* [option 1]
* [option 2]
* [option 3]
* … <!-- numbers of options can vary -->

## Decision Outcome

Chosen option: "[option N]", because [justification. e.g., only option which meets k.o. criterion decision driver | which resolves force | … | comes out best (see below)].

### Positive Consequences

* [e.g., improvement of quality attribute satisfaction, follow-up decisions required, …]
* …

### Negative Consequences

* [e.g., compromising quality attribute, follow-up decisions required, …]
* …

## Pros and Cons of the Options

### [option 1]

[example | description | pointer to more information | …]

* Good, because [argument a]
* Good, because [argument b]
* Bad, because [argument c]
* … <!-- numbers of pros and cons can vary -->

### [option 2]

[example | description | pointer to more information | …]

* Good, because [argument a]
* Good, because [argument b]
* Bad, because [argument c]
* … <!-- numbers of pros and cons can vary -->

### [option 3]

[example | description | pointer to more information | …]

* Good, because [argument a]
* Good, because [argument b]
* Bad, because [argument c]
* … <!-- numbers of pros and cons can vary -->

## Links

* [Link type] [Link to ADR] <!-- example: Refined by [ADR-0005](0005-example.md) -->
* … <!-- numbers of links can vary -->
79
decisions/0001-record-architecture-decisions.md
Normal file
@@ -0,0 +1,79 @@
# Record Architecture Decisions

* Status: accepted
* Date: 2025-11-30
* Deciders: Billy Davies
* Technical Story: Initial setup of homelab documentation

## Context and Problem Statement

As the homelab infrastructure grows in complexity with AI/ML services, multi-GPU configurations, and event-driven architectures, we need a way to document and communicate significant architectural decisions. Without documentation, the rationale behind choices gets lost, making future changes risky and onboarding difficult.

## Decision Drivers

* Need to preserve context for why decisions were made
* Enable future maintainers (including AI agents) to understand the system
* Provide a structured way to evaluate alternatives
* Support the wiki/design process for iterative improvements

## Considered Options

* Informal documentation in README files
* Wiki pages without structure
* Architecture Decision Records (ADRs)
* No documentation (rely on code)

## Decision Outcome

Chosen option: "Architecture Decision Records (ADRs)", because they provide a structured format that captures context, alternatives, and consequences. They're lightweight, version-controlled, and well-suited for technical decisions.

### Positive Consequences

* Clear historical record of decisions
* Structured format makes decisions searchable
* Forces consideration of alternatives
* Git-versioned alongside code
* AI agents can parse and understand decisions

### Negative Consequences

* Requires discipline to create ADRs
* May accumulate outdated decisions over time
* Additional overhead for simple decisions

## Pros and Cons of the Options

### Informal README documentation

* Good, because low friction
* Good, because close to code
* Bad, because no structure for alternatives
* Bad, because decisions get buried in prose

### Wiki pages

* Good, because easy to edit
* Good, because supports rich formatting
* Bad, because separate from code repository
* Bad, because no enforced structure

### ADRs

* Good, because structured format
* Good, because version controlled
* Good, because captures alternatives considered
* Good, because industry-standard practice
* Bad, because requires creating new files
* Bad, because may seem bureaucratic for small decisions

### No documentation

* Good, because no overhead
* Bad, because context is lost
* Bad, because makes onboarding difficult
* Bad, because risky for future changes

## Links

* Based on [MADR template](https://adr.github.io/madr/)
* [ADR GitHub organization](https://adr.github.io/)
97
decisions/0002-use-talos-linux.md
Normal file
@@ -0,0 +1,97 @@
# Use Talos Linux for Kubernetes Nodes

* Status: accepted
* Date: 2025-11-30
* Deciders: Billy Davies
* Technical Story: Selecting OS for bare-metal Kubernetes cluster

## Context and Problem Statement

We need a reliable, secure operating system for running Kubernetes on bare-metal homelab nodes. The OS should minimize attack surface, be easy to manage at scale, and support our GPU requirements (AMD ROCm, NVIDIA CUDA, Intel).

## Decision Drivers

* Security-first design (immutable, minimal)
* API-driven management (no SSH)
* Support for various GPU drivers
* Kubernetes-native focus
* Community support and updates
* Ease of upgrades

## Considered Options

* Ubuntu Server with kubeadm
* Flatcar Container Linux
* Talos Linux
* k3OS (discontinued)
* Rocky Linux with RKE2

## Decision Outcome

Chosen option: "Talos Linux", because it provides an immutable, API-driven, Kubernetes-focused OS that minimizes attack surface and simplifies operations.

### Positive Consequences

* Immutable root filesystem prevents drift
* No SSH reduces attack vectors
* API-driven management integrates well with GitOps
* Schematic system allows custom kernel modules (GPU drivers)
* Consistent configuration across all nodes
* Automatic updates with minimal disruption

### Negative Consequences

* Learning curve for API-driven management
* Debugging requires different approaches (no SSH)
* Custom extensions require schematic IDs
* Less flexibility for non-Kubernetes workloads

## Pros and Cons of the Options

### Ubuntu Server with kubeadm

* Good, because familiar
* Good, because extensive package availability
* Good, because easy debugging via SSH
* Bad, because mutable system leads to drift
* Bad, because large attack surface
* Bad, because manual package management

### Flatcar Container Linux

* Good, because immutable
* Good, because auto-updates
* Good, because container-focused
* Bad, because less Kubernetes-specific
* Bad, because smaller community than Talos
* Bad, because GPU driver setup more complex

### Talos Linux

* Good, because purpose-built for Kubernetes
* Good, because immutable and minimal
* Good, because API-driven (no SSH)
* Good, because excellent Kubernetes integration
* Good, because active development and community
* Good, because schematic system for GPU drivers
* Bad, because learning curve
* Bad, because no traditional debugging
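The schematic system mentioned above is how GPU kernel modules get baked into the immutable image: you submit a schematic to the Talos Image Factory and it returns an ID identifying a custom build. A hedged sketch of such a schematic — the extension names are illustrative; check the factory catalog for the exact ones this cluster uses:

```yaml
# Talos Image Factory schematic (extension names illustrative)
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/amdgpu-firmware           # AMD GPU firmware
      - siderolabs/nonfree-kmod-nvidia       # NVIDIA kernel modules
```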

### k3OS

* Good, because simple
* Bad, because discontinued

### Rocky Linux with RKE2

* Good, because enterprise-like
* Good, because familiar Linux experience
* Bad, because mutable system
* Bad, because more operational overhead
* Bad, because larger attack surface

## Links

* [Talos Linux](https://talos.dev)
* [Talos Image Factory](https://factory.talos.dev)
* Related: [ADR-0005](0005-multi-gpu-strategy.md) - GPU driver integration via schematics
112
decisions/0003-use-nats-for-messaging.md
Normal file
@@ -0,0 +1,112 @@
# Use NATS for AI/ML Messaging

* Status: accepted
* Date: 2025-12-01
* Deciders: Billy Davies
* Technical Story: Selecting message bus for AI service orchestration

## Context and Problem Statement

The AI/ML platform requires a messaging system for:

- Real-time chat message routing
- Voice request/response streaming
- Pipeline triggers and status updates
- Event-driven workflow orchestration

We need a messaging system that handles both ephemeral real-time messages and persistent streams.

## Decision Drivers

* Low latency for real-time chat/voice
* Persistence for audit and replay
* Simple operations for homelab
* Support for request-reply pattern
* Wildcard subscriptions for routing
* Binary message support (audio data)

## Considered Options

* Apache Kafka
* RabbitMQ
* Redis Pub/Sub + Streams
* NATS with JetStream
* Apache Pulsar

## Decision Outcome

Chosen option: "NATS with JetStream", because it provides both fire-and-forget messaging and persistent streams with significantly simpler operations than alternatives.

### Positive Consequences

* Sub-millisecond latency for real-time messages
* JetStream provides persistence when needed
* Simple deployment (single binary)
* Excellent Kubernetes integration
* Request-reply pattern built-in
* Wildcard subscriptions for flexible routing
* Low resource footprint

### Negative Consequences

* Smaller ecosystem than Kafka
* JetStream less mature than Kafka Streams
* No built-in schema registry
* Smaller community than RabbitMQ

## Pros and Cons of the Options

### Apache Kafka

* Good, because industry standard for streaming
* Good, because rich ecosystem (Kafka Streams, Connect)
* Good, because schema registry
* Good, because excellent for high throughput
* Bad, because operationally complex (ZooKeeper/KRaft)
* Bad, because high resource requirements
* Bad, because overkill for homelab scale
* Bad, because higher latency for real-time messages

### RabbitMQ

* Good, because mature and stable
* Good, because flexible routing
* Good, because solid management UI
* Bad, because AMQP protocol overhead
* Bad, because not designed for streaming
* Bad, because more complex clustering

### Redis Pub/Sub + Streams

* Good, because simple
* Good, because Redis may already be deployed
* Good, because low latency
* Bad, because pub/sub is not persistent
* Bad, because the Streams API is less intuitive
* Bad, because messaging is not Redis's primary purpose

### NATS with JetStream

* Good, because extremely low latency
* Good, because simple operations
* Good, because both pub/sub and persistence
* Good, because request-reply built-in
* Good, because wildcard subscriptions
* Good, because low resource usage
* Good, because excellent Go/Python clients
* Bad, because smaller ecosystem
* Bad, because JetStream is newer than Kafka

### Apache Pulsar

* Good, because unified messaging + streaming
* Good, because multi-tenancy
* Good, because geo-replication
* Bad, because complex architecture
* Bad, because high resource requirements
* Bad, because smaller community

## Links

* [NATS.io](https://nats.io)
* [JetStream Documentation](https://docs.nats.io/nats-concepts/jetstream)
* Related: [ADR-0004](0004-use-messagepack-for-nats.md) - Message format
137
decisions/0004-use-messagepack-for-nats.md
Normal file
@@ -0,0 +1,137 @@
# Use MessagePack for NATS Messages

* Status: accepted
* Date: 2025-12-01
* Deciders: Billy Davies
* Technical Story: Selecting serialization format for NATS messages

## Context and Problem Statement

NATS messages in the AI platform carry various payloads:

- Text chat messages (small)
- Voice audio data (potentially large, base64 or binary)
- Streaming response chunks
- Pipeline parameters

We need a serialization format that handles both text and binary efficiently.

## Decision Drivers

* Efficient binary data handling (audio)
* Compact message size
* Fast serialization/deserialization
* Cross-language support (Python, Go)
* Ease of debugging
* Schema flexibility

## Considered Options

* JSON
* Protocol Buffers (protobuf)
* MessagePack (msgpack)
* CBOR
* Avro

## Decision Outcome

Chosen option: "MessagePack (msgpack)", because it provides binary efficiency with JSON-like simplicity and schema-less flexibility.

### Positive Consequences

* Native binary support (no base64 overhead for audio)
* 20-50% smaller than JSON for typical messages
* Faster serialization than JSON
* No schema compilation step
* Easy debugging (can pretty-print like JSON)
* Excellent Python and Go libraries

### Negative Consequences

* Less human-readable than JSON when raw
* No built-in schema validation
* Slightly less common than JSON

## Pros and Cons of the Options

### JSON

* Good, because human-readable
* Good, because universal support
* Good, because no setup required
* Bad, because binary data requires base64 (33% overhead)
* Bad, because larger message sizes
* Bad, because slower parsing
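The base64 penalty is easy to verify with the standard library alone; this stand-alone snippet demonstrates the ~33% growth JSON transport imposes on binary payloads:

```python
import base64

raw = bytes(range(256)) * 12       # 3072 bytes of binary "audio"
encoded = base64.b64encode(raw)    # what embedding bytes in JSON would require

# base64 maps every 3 input bytes to 4 output characters: +33%
overhead = (len(encoded) - len(raw)) / len(raw)
print(f"{overhead:.0%}")  # → 33%
```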
|
||||||
|
|
||||||
|
### Protocol Buffers
|
||||||
|
|
||||||
|
* Good, because very compact
|
||||||
|
* Good, because fast
|
||||||
|
* Good, because schema validation
|
||||||
|
* Good, because cross-language
|
||||||
|
* Bad, because requires schema definition
|
||||||
|
* Bad, because compilation step
|
||||||
|
* Bad, because less flexible for evolving schemas
|
||||||
|
* Bad, because overkill for simple messages
|
||||||
|
|
||||||
|
### MessagePack
|
||||||
|
|
||||||
|
* Good, because binary-efficient
|
||||||
|
* Good, because JSON-like simplicity
|
||||||
|
* Good, because no schema required
|
||||||
|
* Good, because excellent library support
|
||||||
|
* Good, because can include raw bytes
|
||||||
|
* Bad, because not human-readable raw
|
||||||
|
* Bad, because no schema validation
|
||||||
|
|
||||||
|
### CBOR
|
||||||
|
|
||||||
|
* Good, because binary-efficient
|
||||||
|
* Good, because IETF standard
|
||||||
|
* Good, because schema-less
|
||||||
|
* Bad, because less common libraries
|
||||||
|
* Bad, because smaller community
|
||||||
|
* Bad, because similar to msgpack with less adoption
|
||||||
|
|
||||||
|
### Avro
|
||||||
|
|
||||||
|
* Good, because schema evolution
|
||||||
|
* Good, because compact
|
||||||
|
* Good, because schema registry integration
|
||||||
|
* Bad, because requires schema
|
||||||
|
* Bad, because more complex setup
|
||||||
|
* Bad, because Java-centric ecosystem
|
||||||
|
|
||||||
|
## Implementation Notes
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Python usage
|
||||||
|
import msgpack
|
||||||
|
|
||||||
|
# Serialize
|
||||||
|
data = {
|
||||||
|
"user_id": "user-123",
|
||||||
|
"audio": audio_bytes, # Raw bytes, no base64
|
||||||
|
"premium": True
|
||||||
|
}
|
||||||
|
payload = msgpack.packb(data)
|
||||||
|
|
||||||
|
# Deserialize
|
||||||
|
data = msgpack.unpackb(payload, raw=False)
|
||||||
|
```
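
The ~33% figure quoted against JSON can be checked with the standard library alone; this sketch measures the base64 expansion a JSON transport forces on raw audio bytes (MessagePack would carry them unencoded; the payload contents are made up):

```python
import base64
import json

audio = bytes(range(256)) * 12  # 3072 bytes standing in for raw audio
b64 = base64.b64encode(audio).decode()
json_payload = json.dumps({"user_id": "user-123", "audio": b64}).encode()

# base64 maps every 3 input bytes to 4 output characters: 4/3 ≈ 1.33x
overhead = len(b64) / len(audio)
print(f"raw: {len(audio)} B, base64: {len(b64)} B, ratio: {overhead:.2f}")
```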

```go
// Go usage
package main

import "github.com/vmihailenco/msgpack/v5"

type Message struct {
	UserID string `msgpack:"user_id"`
	Audio  []byte `msgpack:"audio"`
}

// Round-trip: msgpack.Marshal(&msg) / msgpack.Unmarshal(payload, &msg)
```

## Links

* [MessagePack Specification](https://msgpack.org)
* [msgpack-python](https://github.com/msgpack/msgpack-python)
* Related: [ADR-0003](0003-use-nats-for-messaging.md) - Message bus choice
* See: [BINARY_MESSAGES_AND_JETSTREAM.md](../specs/BINARY_MESSAGES_AND_JETSTREAM.md)
145 decisions/0005-multi-gpu-strategy.md Normal file
@@ -0,0 +1,145 @@
# Multi-GPU Heterogeneous Strategy

* Status: accepted
* Date: 2025-12-01
* Deciders: Billy Davies
* Technical Story: GPU allocation strategy for AI workloads

## Context and Problem Statement

The homelab has diverse GPU hardware:

- AMD Strix Halo (64GB unified memory) - khelben
- NVIDIA RTX 2070 (8GB VRAM) - elminster
- AMD Radeon 680M (12GB VRAM) - drizzt
- Intel Arc (integrated) - danilo

Different AI workloads have different requirements. How do we allocate GPUs effectively?

## Decision Drivers

* Maximize utilization of all GPUs
* Match workloads to appropriate hardware
* Support concurrent inference services
* Enable fractional GPU sharing where appropriate
* Minimize cross-vendor complexity

## Considered Options

* Single GPU vendor only
* All workloads on largest GPU
* Workload-specific GPU allocation
* Dynamic GPU scheduling (MIG/fractional)

## Decision Outcome

Chosen option: "Workload-specific GPU allocation with dedicated nodes", where each AI service is pinned to the most appropriate GPU based on its requirements.

### Allocation Strategy

| Workload | GPU | Node | Rationale |
|----------|-----|------|-----------|
| vLLM (LLM inference) | AMD Strix Halo (64GB) | khelben (dedicated) | Large models need unified memory |
| Whisper (STT) | NVIDIA RTX 2070 (8GB) | elminster | CUDA optimized, medium memory |
| XTTS (TTS) | NVIDIA RTX 2070 (8GB) | elminster | Shares with Whisper |
| BGE Embeddings | AMD Radeon 680M (12GB) | drizzt | ROCm support, batch processing |
| BGE Reranker | Intel Arc | danilo | Light workload, Intel optimization |

### Positive Consequences

* Each workload gets optimal hardware
* No GPU memory contention for the LLM
* NVIDIA services can share via time-slicing
* Cost-effective use of varied hardware
* Clear ownership and debugging

### Negative Consequences

* More complex scheduling (node taints/tolerations)
* Less flexibility for workload migration
* Must maintain multiple GPU driver stacks
* Some GPUs underutilized at times

## Implementation

### Node Taints

```yaml
# khelben - dedicated vLLM node
nodeTaints:
  dedicated: "vllm:NoSchedule"
```

### Pod Tolerations and Node Affinity

```yaml
# vLLM deployment
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "vllm"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values: ["khelben"]
```

### Resource Limits

```yaml
# NVIDIA GPU (elminster)
resources:
  limits:
    nvidia.com/gpu: 1

# AMD GPU (drizzt, khelben)
resources:
  limits:
    amd.com/gpu: 1
```
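
The allocation table can be expressed as a small placement map when generating manifests; a minimal sketch, where the dict, the `placement` helper, and the Intel resource name `gpu.intel.com/i915` are illustrative assumptions rather than actual cluster config:

```python
# Workload -> (node, GPU resource key), mirroring the allocation table above.
GPU_ALLOCATION = {
    "vllm":       ("khelben",   "amd.com/gpu"),
    "whisper":    ("elminster", "nvidia.com/gpu"),
    "xtts":       ("elminster", "nvidia.com/gpu"),
    "embeddings": ("drizzt",    "amd.com/gpu"),
    "reranker":   ("danilo",    "gpu.intel.com/i915"),  # assumed Intel plugin key
}

def placement(workload: str) -> dict:
    """Return the nodeSelector and resource limit for a workload (hypothetical helper)."""
    node, resource = GPU_ALLOCATION[workload]
    return {
        "nodeSelector": {"kubernetes.io/hostname": node},
        "resources": {"limits": {resource: 1}},
    }

print(placement("whisper"))
```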

## Pros and Cons of the Options

### Single GPU vendor only

* Good, because simpler driver management
* Good, because consistent tooling
* Bad, because wastes existing hardware
* Bad, because higher cost for new hardware

### All workloads on largest GPU

* Good, because simple scheduling
* Good, because unified memory benefits
* Bad, because memory contention
* Bad, because single point of failure
* Bad, because wastes other GPUs

### Workload-specific allocation (chosen)

* Good, because optimal hardware matching
* Good, because uses all available GPUs
* Good, because clear resource boundaries
* Good, because parallel inference
* Bad, because more complex configuration
* Bad, because multiple driver stacks

### Dynamic GPU scheduling

* Good, because flexible
* Good, because maximizes utilization
* Bad, because complex to implement
* Bad, because MIG not available on consumer GPUs
* Bad, because cross-vendor scheduling immature

## Links

* [Volcano Scheduler](https://volcano.sh)
* [AMD GPU Device Plugin](https://github.com/ROCm/k8s-device-plugin)
* [NVIDIA Device Plugin](https://github.com/NVIDIA/k8s-device-plugin)
* Related: [ADR-0002](0002-use-talos-linux.md) - GPU drivers via Talos schematics
140 decisions/0006-gitops-with-flux.md Normal file
@@ -0,0 +1,140 @@
# GitOps with Flux CD

* Status: accepted
* Date: 2025-11-30
* Deciders: Billy Davies
* Technical Story: Implementing GitOps for cluster management

## Context and Problem Statement

Managing a Kubernetes cluster with numerous applications, configurations, and secrets requires a reliable, auditable, and reproducible approach. Manual `kubectl apply` is error-prone and doesn't track state over time.

## Decision Drivers

* Infrastructure as Code (IaC) principles
* Audit trail for all changes
* Self-healing cluster state
* Multi-repository support
* Secret encryption integration
* Active community and maintenance

## Considered Options

* Manual kubectl apply
* ArgoCD
* Flux CD
* Rancher Fleet
* Pulumi/Terraform for Kubernetes

## Decision Outcome

Chosen option: "Flux CD", because it provides a mature GitOps implementation with excellent multi-source support, SOPS integration, and aligns well with the Kubernetes ecosystem.

### Positive Consequences

* Git is the single source of truth
* Automatic drift detection and correction
* Native SOPS/Age secret encryption
* Multi-repository support (homelab-k8s2 + llm-workflows)
* Native Helm and Kustomize support
* Webhook-free sync (pull-based)

### Negative Consequences

* No built-in UI (use CLI or third-party)
* Learning curve for CRD-based configuration
* Debugging requires understanding Flux controllers

## Configuration

### Repository Structure

```
homelab-k8s2/
├── kubernetes/
│   ├── flux/                  # Flux system config
│   │   ├── config/
│   │   │   ├── cluster.yaml
│   │   │   └── secrets.yaml   # SOPS encrypted
│   │   └── repositories/
│   │       ├── helm/          # HelmRepositories
│   │       └── git/           # GitRepositories
│   └── apps/                  # Application Kustomizations
```

### Multi-Repository Sync

```yaml
# GitRepository for llm-workflows
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: llm-workflows
  namespace: flux-system
spec:
  url: ssh://git@github.com/Billy-Davies-2/llm-workflows
  ref:
    branch: main
  secretRef:
    name: github-deploy-key
```

### SOPS Integration

```yaml
# .sops.yaml
creation_rules:
  - path_regex: .*\.sops\.yaml$
    age: >-
      age1... # Public key
```
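
The `path_regex` above anchors only at the end, so any path ending in `.sops.yaml` is encrypted and everything else passes through untouched; a quick stdlib check (the file names are made up for illustration):

```python
import re

# Creation rule pattern from the .sops.yaml above
rule = re.compile(r".*\.sops\.yaml$")

paths = [
    "kubernetes/flux/config/secrets.sops.yaml",  # encrypted -> matches
    "kubernetes/flux/config/cluster.yaml",       # plain manifest -> no match
    "apps/valkey/values.sops.yaml",              # encrypted -> matches
]
for p in paths:
    print(p, "->", bool(rule.match(p)))
```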

## Pros and Cons of the Options

### Manual kubectl apply

* Good, because simple
* Good, because no setup
* Bad, because no audit trail
* Bad, because no drift detection
* Bad, because not reproducible

### ArgoCD

* Good, because great UI
* Good, because app-of-apps pattern
* Good, because large community
* Bad, because heavier resource usage
* Bad, because fast sync relies on webhooks
* Bad, because SOPS requires plugins

### Flux CD

* Good, because lightweight
* Good, because pull-based (no webhooks required)
* Good, because native SOPS support
* Good, because multi-source/multi-tenant
* Good, because Kubernetes-native CRDs
* Bad, because no built-in UI
* Bad, because CRD learning curve

### Rancher Fleet

* Good, because integrated with Rancher
* Good, because multi-cluster
* Bad, because Rancher ecosystem lock-in
* Bad, because smaller community

### Pulumi/Terraform

* Good, because familiar IaC tools
* Good, because drift detection
* Bad, because not Kubernetes-native
* Bad, because requires state management
* Bad, because no continuous reconciliation

## Links

* [Flux CD](https://fluxcd.io)
* [SOPS Integration](https://fluxcd.io/flux/guides/mozilla-sops/)
* [flux-local](https://github.com/allenporter/flux-local) - Local testing
115 decisions/0007-use-kserve-for-inference.md Normal file
@@ -0,0 +1,115 @@
# Use KServe for ML Model Serving

* Status: accepted
* Date: 2025-12-15
* Deciders: Billy Davies
* Technical Story: Selecting a model serving platform for inference services

## Context and Problem Statement

We need to deploy multiple ML models (Whisper, XTTS, BGE, vLLM) as inference endpoints. Each model has different requirements for scaling, protocols (HTTP/gRPC), and GPU allocation.

## Decision Drivers

* Standardized inference protocol (V2)
* Autoscaling based on load
* Traffic splitting for canary deployments
* Integration with the Kubeflow ecosystem
* GPU resource management
* Health checks and readiness

## Considered Options

* Raw Kubernetes Deployments + Services
* KServe InferenceService
* Seldon Core
* BentoML
* Ray Serve only

## Decision Outcome

Chosen option: "KServe InferenceService", because it provides a standardized, Kubernetes-native approach to model serving with built-in autoscaling and traffic management.

### Positive Consequences

* Standardized V2 inference protocol
* Automatic scale-to-zero capability
* Canary/blue-green deployments
* Integration with the Kubeflow UI
* Transformer/Explainer components
* GPU resource abstraction

### Negative Consequences

* Additional CRDs and operators
* Learning curve for the InferenceService spec
* Some overhead for simple deployments
* Knative Serving dependency (optional)

## Pros and Cons of the Options

### Raw Kubernetes Deployments

* Good, because simple
* Good, because full control
* Bad, because no autoscaling logic
* Bad, because manual service mesh
* Bad, because repetitive configuration

### KServe InferenceService

* Good, because standardized API
* Good, because autoscaling
* Good, because traffic management
* Good, because Kubeflow integration
* Bad, because operator complexity
* Bad, because optional Knative dependency

### Seldon Core

* Good, because mature
* Good, because A/B testing
* Good, because explainability
* Bad, because more complex than KServe
* Bad, because heavier resource usage

### BentoML

* Good, because developer-friendly
* Good, because packaging-focused
* Bad, because less Kubernetes-native
* Bad, because smaller community

### Ray Serve

* Good, because unified compute
* Good, because Python-native
* Good, because fractional GPU support
* Bad, because less standardized API
* Bad, because Ray cluster overhead

## Current Configuration

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: whisper
  namespace: ai-ml
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 3
    containers:
      - name: whisper
        image: ghcr.io/org/whisper:latest
        resources:
          limits:
            nvidia.com/gpu: 1
```
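
For reference, the V2 protocol exposes predictors at `POST /v2/models/{name}/infer` with a JSON tensor envelope; a stdlib-only sketch that builds (but does not send) such a payload — the tensor name, shape, and sample rate are illustrative, not the actual Whisper contract:

```python
import json

# V2 inference protocol request body: a list of named, typed tensors.
request = {
    "inputs": [
        {
            "name": "audio",        # illustrative tensor name
            "shape": [1, 16000],
            "datatype": "FP32",
            "data": [0.0] * 16000,  # one second of silence at 16 kHz
        }
    ]
}
body = json.dumps(request).encode()
print(len(body), "bytes to POST to /v2/models/whisper/infer")
```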

## Links

* [KServe](https://kserve.github.io)
* [V2 Inference Protocol](https://kserve.github.io/website/latest/modelserving/data_plane/v2_protocol/)
* Related: [ADR-0005](0005-multi-gpu-strategy.md) - GPU allocation
107 decisions/0008-use-milvus-for-vectors.md Normal file
@@ -0,0 +1,107 @@
# Use Milvus for Vector Storage

* Status: accepted
* Date: 2025-12-15
* Deciders: Billy Davies
* Technical Story: Selecting a vector database for the RAG system

## Context and Problem Statement

The RAG (Retrieval-Augmented Generation) system requires a vector database to store document embeddings and perform similarity search. We need to store millions of embeddings and query them with low latency.
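
What the database does at query time is a top-k nearest-neighbour search over embeddings; a toy stdlib illustration with 3-dimensional vectors and cosine similarity (real embeddings have hundreds of dimensions, and Milvus uses ANN indexes rather than this brute-force scan):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Tiny made-up corpus of chunk embeddings
chunks = {
    "chunk-1": [0.9, 0.1, 0.0],
    "chunk-2": [0.0, 1.0, 0.0],
    "chunk-3": [0.7, 0.7, 0.1],
}
query = [1.0, 0.0, 0.0]

# Top-2 most similar chunks to the query vector
top_k = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)[:2]
print(top_k)
```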

## Decision Drivers

* Query performance (< 100ms for top-k search)
* Scalability to millions of vectors
* Kubernetes-native deployment
* Active development and community
* Support for metadata filtering
* Backup and restore capabilities

## Considered Options

* Milvus
* Pinecone (managed)
* Qdrant
* Weaviate
* pgvector (PostgreSQL extension)
* Chroma

## Decision Outcome

Chosen option: "Milvus", because it provides production-grade vector search with excellent Kubernetes support, scalability, and active development.

### Positive Consequences

* High-performance similarity search
* Horizontal scalability
* Rich filtering and hybrid search
* Helm chart for Kubernetes
* Actively developed LF AI & Data project
* GPU acceleration available

### Negative Consequences

* Complex architecture (multiple components)
* Higher resource usage than simpler alternatives
* Requires object storage (MinIO)
* Learning curve for optimization

## Pros and Cons of the Options

### Milvus

* Good, because production-proven at scale
* Good, because rich query API
* Good, because Kubernetes-native
* Good, because hybrid search (vector + scalar)
* Good, because LF AI & Data graduated project
* Bad, because complex architecture
* Bad, because higher resource usage

### Pinecone

* Good, because fully managed
* Good, because simple API
* Good, because reliable
* Bad, because external dependency
* Bad, because cost at scale
* Bad, because data sovereignty concerns

### Qdrant

* Good, because simpler than Milvus
* Good, because Rust performance
* Good, because good filtering
* Bad, because smaller community
* Bad, because fewer enterprise features

### Weaviate

* Good, because built-in vectorization
* Good, because GraphQL API
* Good, because modules system
* Bad, because more opinionated
* Bad, because schema requirements

### pgvector

* Good, because familiar PostgreSQL
* Good, because simple deployment
* Good, because ACID transactions
* Bad, because limited scale
* Bad, because slower for large datasets
* Bad, because no specialized optimizations

### Chroma

* Good, because simple
* Good, because embedded option
* Bad, because not production-ready at scale
* Bad, because limited features

## Links

* [Milvus](https://milvus.io)
* [Milvus Helm Chart](https://github.com/milvus-io/milvus-helm)
* Related: [DOMAIN-MODEL.md](../DOMAIN-MODEL.md) - Chunk/Embedding entities
124 decisions/0009-dual-workflow-engines.md Normal file
@@ -0,0 +1,124 @@
# Dual Workflow Engine Strategy (Argo + Kubeflow)

* Status: accepted
* Date: 2026-01-15
* Deciders: Billy Davies
* Technical Story: Selecting workflow orchestration for ML pipelines

## Context and Problem Statement

The AI platform needs workflow orchestration for:

- ML training pipelines with caching
- Document ingestion (batch)
- Complex DAG workflows (training → evaluation → deployment)
- Hybrid scenarios combining both

Should we use one engine, or leverage the strengths of multiple?

## Decision Drivers

* ML-specific features (caching, lineage)
* Complex DAG support
* Kubernetes-native execution
* Visibility and debugging
* Community and ecosystem
* Integration with existing tools

## Considered Options

* Kubeflow Pipelines only
* Argo Workflows only
* Both engines with clear use cases
* Airflow on Kubernetes
* Prefect/Dagster

## Decision Outcome

Chosen option: "Both engines with clear use cases", using Kubeflow Pipelines for ML-centric workflows and Argo Workflows for complex DAG orchestration.

### Decision Matrix

| Use Case | Engine | Reason |
|----------|--------|--------|
| ML training with caching | Kubeflow | Component caching, experiment tracking |
| Model evaluation | Kubeflow | Metric collection, comparison |
| Document ingestion | Argo | Simple DAG, no ML features needed |
| Batch inference | Argo | Parallelization, retries |
| Complex DAG with branching | Argo | Superior control flow |
| Hybrid ML training | Both | Argo orchestrates, KFP for ML steps |
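
The matrix amounts to a small routing rule; a sketch of it as code, where the dict and the `pick_engine` helper are illustrative and not part of either engine's API:

```python
# Engine routing derived from the decision matrix above (illustrative helper).
ENGINE_FOR = {
    "ml-training":        "kubeflow",  # component caching, experiment tracking
    "model-evaluation":   "kubeflow",  # metric collection, comparison
    "document-ingestion": "argo",      # simple DAG, no ML features needed
    "batch-inference":    "argo",      # parallelization, retries
    "complex-dag":        "argo",      # branching control flow
}

def pick_engine(use_case: str) -> str:
    # Default to Argo: plain DAG orchestration is the general case.
    return ENGINE_FOR.get(use_case, "argo")

print(pick_engine("ml-training"))
```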

### Positive Consequences

* Best tool for each job
* ML pipelines get proper caching
* Complex workflows get better DAG support
* Can integrate via Argo Events
* Gradual migration possible

### Negative Consequences

* Two systems to maintain
* Team needs to learn both
* More complex debugging
* Integration overhead

## Integration Architecture

```
NATS Event ──► Argo Events ──► Sensor ──┬──► Argo Workflow
                                        │
                                        └──► Kubeflow Pipeline (via API)

OR

Argo Workflow ──► Step: kfp-trigger ──► Kubeflow Pipeline
                  (WorkflowTemplate)
```

## Pros and Cons of the Options

### Kubeflow Pipelines only

* Good, because ML-focused
* Good, because caching
* Good, because experiment tracking
* Bad, because limited DAG features
* Bad, because less flexible control flow

### Argo Workflows only

* Good, because powerful DAG support
* Good, because flexible
* Good, because great debugging
* Bad, because no ML caching
* Bad, because no experiment tracking

### Both engines (chosen)

* Good, because best of both
* Good, because the appropriate tool for each job
* Good, because the two can integrate
* Bad, because operational complexity
* Bad, because learning two systems

### Airflow

* Good, because mature
* Good, because large community
* Bad, because Python-centric
* Bad, because not Kubernetes-native
* Bad, because no ML features

### Prefect/Dagster

* Good, because modern design
* Good, because Python-native
* Bad, because less Kubernetes-native
* Bad, because newer/less proven

## Links

* [Kubeflow Pipelines](https://kubeflow.org/docs/components/pipelines/)
* [Argo Workflows](https://argoproj.github.io/workflows/)
* [Argo Events](https://argoproj.github.io/events/)
* Related: [kfp-integration.yaml](../../llm-workflows/argo/kfp-integration.yaml)
120 decisions/0010-use-envoy-gateway.md Normal file
@@ -0,0 +1,120 @@
# Use Envoy Gateway for Ingress

* Status: accepted
* Date: 2025-12-01
* Deciders: Billy Davies
* Technical Story: Selecting an ingress controller for the cluster

## Context and Problem Statement

We need an ingress solution that supports:

- Gateway API (the modern Kubernetes standard)
- gRPC for ML inference
- WebSocket for real-time chat/voice
- Header-based routing for A/B testing
- TLS termination

## Decision Drivers

* Gateway API support (HTTPRoute, GRPCRoute)
* WebSocket support
* gRPC support
* Performance at the edge
* Active development
* Envoy ecosystem familiarity

## Considered Options

* NGINX Ingress Controller
* Traefik
* Envoy Gateway
* Istio Gateway
* Contour

## Decision Outcome

Chosen option: "Envoy Gateway", because it is the reference implementation of the Gateway API with the full Envoy feature set.

### Positive Consequences

* Native Gateway API support
* Full Envoy feature set
* Native WebSocket and gRPC support
* No Istio complexity
* Built on a CNCF graduated project (Envoy)
* Easy integration with observability

### Negative Consequences

* Newer than alternatives
* Less documentation than NGINX
* Envoy configuration learning curve

## Pros and Cons of the Options

### NGINX Ingress

* Good, because mature
* Good, because well-documented
* Good, because familiar
* Bad, because limited Gateway API support
* Bad, because some features are gated behind the commercial edition

### Traefik

* Good, because auto-discovery
* Good, because good UI
* Good, because Let's Encrypt integration
* Bad, because Gateway API support is experimental
* Bad, because less gRPC focus

### Envoy Gateway

* Good, because Gateway API native
* Good, because full Envoy features
* Good, because extensible
* Good, because gRPC/WebSocket native
* Bad, because newer project
* Bad, because less community content

### Istio Gateway

* Good, because full mesh features
* Good, because Gateway API support
* Bad, because overkill without a mesh
* Bad, because resource heavy

### Contour

* Good, because Envoy-based
* Good, because lightweight
* Bad, because Gateway API support is still evolving
* Bad, because smaller community

## Configuration Example

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: companions-chat
spec:
  parentRefs:
    - name: eg-gateway
      namespace: network
  hostnames:
    - companions-chat.lab.daviestechlabs.io
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: companions-chat
          port: 8080
```
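
GRPCRoute, mentioned in the decision drivers, follows the same pattern; a sketch for routing a gRPC inference backend through the same Gateway — the route name, hostname, backend, and the matched service are illustrative assumptions, not deployed config:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: inference-grpc   # illustrative
spec:
  parentRefs:
    - name: eg-gateway
      namespace: network
  hostnames:
    - inference.lab.daviestechlabs.io   # assumed hostname
  rules:
    - matches:
        - method:
            service: inference.GRPCInferenceService   # illustrative gRPC service
      backendRefs:
        - name: vllm
          port: 8081
```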

## Links

* [Envoy Gateway](https://gateway.envoyproxy.io)
* [Gateway API](https://gateway-api.sigs.k8s.io)
35 diagrams/README.md Normal file
@@ -0,0 +1,35 @@
# Diagrams

This directory contains additional architecture diagrams beyond the main C4 diagrams.

## Available Diagrams

| File | Description |
|------|-------------|
| [gpu-allocation.mmd](gpu-allocation.mmd) | GPU workload distribution |
| [data-flow-chat.mmd](data-flow-chat.mmd) | Chat request data flow |
| [data-flow-voice.mmd](data-flow-voice.mmd) | Voice request data flow |

## Rendering Diagrams

### VS Code

Install the "Markdown Preview Mermaid Support" extension.

### CLI

```bash
# Using mmdc (Mermaid CLI)
npx @mermaid-js/mermaid-cli -i diagram.mmd -o diagram.png
```

### Online

Use the [Mermaid Live Editor](https://mermaid.live).

## Diagram Conventions

1. Use the `.mmd` extension for Mermaid diagrams
2. Include the title as a comment at the top of the file
3. Use consistent styling classes
4. Keep diagrams focused (one concept per diagram)
51 diagrams/data-flow-chat.mmd Normal file
@@ -0,0 +1,51 @@
%% Chat Request Data Flow
%% Sequence diagram showing chat message processing

sequenceDiagram
    autonumber
    participant U as User
    participant W as WebApp<br/>(companions)
    participant N as NATS
    participant C as Chat Handler
    participant V as Valkey<br/>(Cache)
    participant E as BGE Embeddings
    participant M as Milvus
    participant R as Reranker
    participant L as vLLM

    U->>W: Send message
    W->>N: Publish ai.chat.user.{id}.message
    N->>C: Deliver message

    C->>V: Get session history
    V-->>C: Previous messages

    alt RAG Enabled
        C->>E: Generate query embedding
        E-->>C: Query vector
        C->>M: Search similar chunks
        M-->>C: Top-K chunks

        opt Reranker Enabled
            C->>R: Rerank chunks
            R-->>C: Reordered chunks
        end
    end

    C->>L: LLM inference (context + query)

    alt Streaming Enabled
        loop For each token
            L-->>C: Token
            C->>N: Publish ai.chat.response.stream.{id}
            N-->>W: Deliver chunk
            W-->>U: Display token
        end
    else Non-streaming
        L-->>C: Full response
        C->>N: Publish ai.chat.response.{id}
        N-->>W: Deliver response
        W-->>U: Display response
    end

    C->>V: Save to session history
46 diagrams/data-flow-voice.mmd Normal file
@@ -0,0 +1,46 @@
%% Voice Request Data Flow
%% Sequence diagram showing voice assistant processing

sequenceDiagram
    autonumber
    participant U as User
    participant W as Voice WebApp
    participant N as NATS
    participant VA as Voice Assistant
    participant STT as Whisper<br/>(STT)
    participant E as BGE Embeddings
    participant M as Milvus
    participant R as Reranker
    participant L as vLLM
    participant TTS as XTTS<br/>(TTS)

    U->>W: Record audio
    W->>N: Publish ai.voice.user.{id}.request<br/>(msgpack with audio bytes)
    N->>VA: Deliver voice request

    VA->>STT: Transcribe audio
    STT-->>VA: Transcription text

    alt RAG Enabled
        VA->>E: Generate query embedding
        E-->>VA: Query vector
        VA->>M: Search similar chunks
        M-->>VA: Top-K chunks

        opt Reranker Enabled
            VA->>R: Rerank chunks
            R-->>VA: Reordered chunks
        end
    end

    VA->>L: LLM inference
    L-->>VA: Response text

    VA->>TTS: Synthesize speech
    TTS-->>VA: Audio bytes

    VA->>N: Publish ai.voice.response.{id}<br/>(text + audio)
    N-->>W: Deliver response
    W-->>U: Play audio + show text

    Note over VA,TTS: Total latency target: < 3s
47
diagrams/gpu-allocation.mmd
Normal file
@@ -0,0 +1,47 @@
%% GPU Allocation Diagram
%% Shows how AI workloads are distributed across GPU nodes

flowchart TB
    subgraph khelben["🖥️ khelben (AMD Strix Halo 64GB)"]
        direction TB
        vllm["🧠 vLLM<br/>LLM Inference<br/>100% GPU"]
    end

    subgraph elminster["🖥️ elminster (NVIDIA RTX 2070 8GB)"]
        direction TB
        whisper["🎤 Whisper<br/>STT<br/>~50% GPU"]
        xtts["🔊 XTTS<br/>TTS<br/>~50% GPU"]
    end

    subgraph drizzt["🖥️ drizzt (AMD Radeon 680M 12GB)"]
        direction TB
        embeddings["📊 BGE Embeddings<br/>Vector Encoding<br/>~80% GPU"]
    end

    subgraph danilo["🖥️ danilo (Intel Arc)"]
        direction TB
        reranker["📋 BGE Reranker<br/>Document Ranking<br/>~80% GPU"]
    end

    subgraph workloads["Workload Routing"]
        chat["💬 Chat Request"]
        voice["🎤 Voice Request"]
    end

    chat --> embeddings
    chat --> reranker
    chat --> vllm

    voice --> whisper
    voice --> embeddings
    voice --> reranker
    voice --> vllm
    voice --> xtts

    classDef nvidia fill:#76B900,color:white
    classDef amd fill:#ED1C24,color:white
    classDef intel fill:#0071C5,color:white

    class whisper,xtts nvidia
    class vllm,embeddings amd
    class reranker intel
287
specs/BINARY_MESSAGES_AND_JETSTREAM.md
Normal file
@@ -0,0 +1,287 @@
# Binary Messages and JetStream Configuration

> Technical specification for NATS message handling in the AI platform

## Overview

The AI platform uses NATS with JetStream for message persistence. All messages use the MessagePack (msgpack) binary format for efficiency, especially when handling audio data.

## Message Format

### Why MessagePack?

1. **Binary efficiency**: Audio data is embedded directly, without base64 overhead
2. **Compact**: 20-50% smaller than equivalent JSON
3. **Fast**: Lower serialization/deserialization overhead
4. **Compatible**: JSON-like structure, easy debugging
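The base64 overhead in point 1 is easy to quantify with the standard library alone. An illustrative sketch (the simulated clip size is arbitrary; msgpack would carry the raw bytes as-is):

```python
import base64
import json

# Simulate ~1 second of 16 kHz, 16-bit mono audio (32 KB of raw bytes).
audio = bytes(range(256)) * 128  # 32768 bytes of sample data

# JSON cannot carry raw bytes, so the audio must be base64-encoded first.
json_payload = json.dumps({"audio": base64.b64encode(audio).decode("ascii")})

# base64 inflates binary data by roughly a third.
overhead = len(json_payload) / len(audio)
print(f"raw: {len(audio)} B, json+base64: {len(json_payload)} B, "
      f"overhead: {overhead:.2f}x")
```

On top of this size penalty, JSON also pays an encode/decode pass that msgpack avoids entirely for binary fields.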

### Schema

All messages follow this general structure:

```python
{
    "request_id": str,     # UUID for correlation
    "user_id": str,        # User identifier
    "timestamp": float,    # Unix timestamp
    "payload": Any,        # Type-specific data
    "metadata": dict       # Optional metadata
}
```
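Before publishing, the envelope's required keys can be sanity-checked with plain Python. A minimal sketch; `validate_envelope` is a hypothetical helper, not part of the existing handlers:

```python
import time
import uuid

# Required envelope fields and their expected types (assumption: matches the
# general schema above; payload/metadata are intentionally left optional).
REQUIRED_KEYS = {"request_id": str, "user_id": str, "timestamp": float}

def validate_envelope(msg: dict) -> list[str]:
    """Return a list of problems; an empty list means the envelope looks valid."""
    problems = []
    for key, typ in REQUIRED_KEYS.items():
        if key not in msg:
            problems.append(f"missing key: {key}")
        elif not isinstance(msg[key], typ):
            problems.append(f"{key} should be {typ.__name__}")
    return problems

envelope = {
    "request_id": str(uuid.uuid4()),
    "user_id": "user-123",
    "timestamp": time.time(),
    "payload": {"message": "hello"},
}
print(validate_envelope(envelope))        # no problems
print(validate_envelope({"user_id": 42}))  # missing/mistyped fields reported
```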

### Chat Message

```python
{
    "request_id": "uuid-here",
    "user_id": "user-123",
    "username": "john_doe",
    "message": "Hello, how are you?",
    "premium": False,
    "enable_streaming": True,
    "enable_rag": True,
    "enable_reranker": True,
    "top_k": 5,
    "session_id": "session-abc"
}
```

### Voice Message

```python
{
    "request_id": "uuid-here",
    "user_id": "user-123",
    "audio": b"...",          # Raw bytes, not base64!
    "format": "wav",
    "sample_rate": 16000,
    "premium": False,
    "enable_rag": True,
    "language": "en"
}
```

### Streaming Response Chunk

```python
{
    "request_id": "uuid-here",
    "type": "chunk",          # "chunk", "done", "error"
    "content": "token",
    "done": False,
    "timestamp": 1706000000.0
}
```
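On the consuming side, a client reassembles a response by appending `content` from each chunk until the `done` marker arrives. A minimal sketch over plain dicts (msgpack decoding is assumed to have happened already):

```python
def assemble_stream(chunks):
    """Concatenate streamed chunk payloads into the full response text."""
    parts = []
    for chunk in chunks:
        if chunk["type"] == "error":
            raise RuntimeError(f"stream {chunk['request_id']} failed")
        if chunk["type"] == "done":
            break
        parts.append(chunk["content"])
    return "".join(parts)

chunks = [
    {"request_id": "r1", "type": "chunk", "content": "Hel", "done": False},
    {"request_id": "r1", "type": "chunk", "content": "lo!", "done": False},
    {"request_id": "r1", "type": "done", "content": "", "done": True},
]
print(assemble_stream(chunks))  # Hello!
```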

## JetStream Configuration

### Streams

| Stream | Subjects | Retention | Max Age | Storage | Replicas |
|--------|----------|-----------|---------|---------|----------|
| `COMPANIONS_LOGINS` | `ai.chat.user.*.login` | Limits | 7 days | File | 1 |
| `COMPANIONS_CHAT` | `ai.chat.user.*.message`, `ai.chat.user.*.greeting.*` | Limits | 30 days | File | 1 |
| `AI_CHAT_STREAM` | `ai.chat.response.stream.>` | Limits | 5 min | Memory | 1 |
| `AI_VOICE_STREAM` | `ai.voice.>` | Limits | 1 hour | File | 1 |
| `AI_VOICE_RESPONSE_STREAM` | `ai.voice.response.stream.>` | Limits | 5 min | Memory | 1 |
| `AI_PIPELINE` | `ai.pipeline.>` | Limits | 24 hours | File | 1 |
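The wildcards in these subject filters follow NATS matching rules: `*` matches exactly one dot-separated token, `>` matches one or more trailing tokens. A small sketch of that rule, for intuition only (not the actual NATS implementation):

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style subject matching: '*' = one token, '>' = rest of subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i  # '>' needs at least one more token
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

print(subject_matches("ai.chat.user.*.message", "ai.chat.user.42.message"))        # True
print(subject_matches("ai.chat.response.stream.>", "ai.chat.response.stream.r1"))  # True
print(subject_matches("ai.voice.>", "ai.voice"))                                   # False: '>' needs a token
```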

### Consumer Configuration

```yaml
# Durable consumer for chat handler
consumer:
  name: chat-handler
  durable_name: chat-handler
  filter_subjects:
    - "ai.chat.user.*.message"
  ack_policy: explicit
  ack_wait: 30s
  max_deliver: 3
  deliver_policy: new
```

### Stream Creation (CLI)

```bash
# Create the durable chat stream
nats stream add COMPANIONS_CHAT \
  --subjects "ai.chat.user.*.message,ai.chat.user.*.greeting.*" \
  --retention limits \
  --max-age 30d \
  --storage file \
  --replicas 1

# Create the ephemeral streaming-response stream
nats stream add AI_CHAT_STREAM \
  --subjects "ai.chat.response.stream.>" \
  --retention limits \
  --max-age 5m \
  --storage memory \
  --replicas 1
```

## Python Implementation

### Publisher

```python
import uuid
from datetime import datetime

import msgpack
import nats


async def publish_chat_message(nc: nats.NATS, user_id: str, message: str):
    data = {
        "request_id": str(uuid.uuid4()),
        "user_id": user_id,
        "message": message,
        "timestamp": datetime.utcnow().timestamp(),
        "enable_streaming": True,
        "enable_rag": True,
    }

    subject = f"ai.chat.user.{user_id}.message"
    await nc.publish(subject, msgpack.packb(data))
```

### Subscriber (JetStream)

```python
# Assumes `nc` (a connected NATS client), `logger`, and `process_chat`
# are defined at module level.
async def message_handler(msg):
    try:
        data = msgpack.unpackb(msg.data, raw=False)

        # Process message
        result = await process_chat(data)

        # Publish response
        response_subject = f"ai.chat.response.{data['request_id']}"
        await nc.publish(response_subject, msgpack.packb(result))

        # Acknowledge
        await msg.ack()

    except Exception as e:
        logger.error(f"Handler error: {e}")
        await msg.nak(delay=5)  # Retry after 5s


# Subscribe with JetStream
js = nc.jetstream()
sub = await js.subscribe(
    "ai.chat.user.*.message",
    cb=message_handler,
    durable="chat-handler",
    manual_ack=True,
)
```

### Streaming Response

```python
async def stream_response(nc, request_id: str, response_generator):
    subject = f"ai.chat.response.stream.{request_id}"

    async for token in response_generator:
        chunk = {
            "request_id": request_id,
            "type": "chunk",
            "content": token,
            "done": False
        }
        await nc.publish(subject, msgpack.packb(chunk))

    # Send done marker
    done = {
        "request_id": request_id,
        "type": "done",
        "content": "",
        "done": True
    }
    await nc.publish(subject, msgpack.packb(done))
```

## Go Implementation

### Publisher

```go
import (
	"fmt"

	"github.com/google/uuid"
	"github.com/nats-io/nats.go"
	"github.com/vmihailenco/msgpack/v5"
)

type ChatMessage struct {
	RequestID string `msgpack:"request_id"`
	UserID    string `msgpack:"user_id"`
	Message   string `msgpack:"message"`
}

func PublishChat(nc *nats.Conn, userID, message string) error {
	msg := ChatMessage{
		RequestID: uuid.New().String(),
		UserID:    userID,
		Message:   message,
	}

	data, err := msgpack.Marshal(msg)
	if err != nil {
		return err
	}

	subject := fmt.Sprintf("ai.chat.user.%s.message", userID)
	return nc.Publish(subject, data)
}
```

## Error Handling

### NAK with Delay

```python
# Temporary failure - retry later
await msg.nak(delay=5)  # 5 second delay

# Permanent failure - move to dead letter queue
if attempt >= max_retries:
    await nc.publish("ai.dlq.chat", msg.data)
    await msg.term()  # Terminate delivery
```
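Rather than a fixed 5 s, the NAK delay can grow with the delivery attempt (nats-py exposes the attempt count as `msg.metadata.num_delivered`). A hedged sketch of capped exponential backoff, with jitter omitted for determinism:

```python
def nak_delay(attempt: int, base: float = 5.0, cap: float = 60.0) -> float:
    """Delay before redelivery: base * 2^(attempt-1), capped at `cap` seconds."""
    return min(base * 2 ** (attempt - 1), cap)

# attempt 1 -> 5s, 2 -> 10s, 3 -> 20s, 4 -> 40s, 5+ -> 60s
print([nak_delay(a) for a in range(1, 6)])  # [5.0, 10.0, 20.0, 40.0, 60.0]
```

In the handler this would look like `await msg.nak(delay=nak_delay(msg.metadata.num_delivered))`; the `base`/`cap` values here are illustrative, not tuned.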

### Dead Letter Queue

```yaml
stream:
  name: AI_DLQ
  subjects:
    - "ai.dlq.>"
  retention: limits
  max_age: 7d
  storage: file
```

## Monitoring

### Key Metrics

```bash
# Stream info
nats stream info COMPANIONS_CHAT

# Consumer info
nats consumer info COMPANIONS_CHAT chat-handler

# Message rate
nats stream report
```

### Prometheus Metrics

- `nats_stream_messages_total`
- `nats_consumer_pending_messages`
- `nats_consumer_ack_pending`

## Related

- [ADR-0003: Use NATS for Messaging](../decisions/0003-use-nats-for-messaging.md)
- [ADR-0004: Use MessagePack](../decisions/0004-use-messagepack-for-nats.md)
- [DOMAIN-MODEL.md](../DOMAIN-MODEL.md)
36
specs/README.md
Normal file
@@ -0,0 +1,36 @@
# Specifications

This directory contains feature-level specifications and technical designs.

## Contents

- [BINARY_MESSAGES_AND_JETSTREAM.md](BINARY_MESSAGES_AND_JETSTREAM.md) - MessagePack format and JetStream configuration
- Future specs will be added here

## Spec Template

```markdown
# Feature Name

## Overview
Brief description of the feature

## Requirements
- Requirement 1
- Requirement 2

## Design
Technical design details

## API
Interface definitions

## Implementation Notes
Key implementation considerations

## Testing
Test strategy

## Open Questions
Unresolved items
```