# 🤖 Agent Onboarding
> **This is the most important file for AI agents working on this codebase.**
## TL;DR
You are working on a **homelab Kubernetes cluster** running:
- **Talos Linux v1.12.1** on bare-metal nodes
- **Kubernetes v1.35.0** with Flux CD GitOps
- **AI/ML platform** with KServe, Kubeflow, Milvus, NATS
- **Multi-GPU** (AMD ROCm, NVIDIA CUDA, Intel Arc)
## 🗺️ Repository Map
| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| `homelab-design` (this) | Architecture docs, ADRs | Design decisions |
| `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |
### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)
| Repo | Purpose |
|------|---------|
| `handler-base` | Shared Go module for NATS handlers (protobuf, health, OTel, clients) |
| `chat-handler` | Text chat with RAG pipeline (Go) |
| `voice-assistant` | Voice pipeline: STT → RAG → LLM → TTS (Go) |
| `kuberay-images` | GPU-specific Ray worker Docker images |
| `pipeline-bridge` | Bridge between pipelines and services (Go) |
| `stt-module` | Speech-to-text service (Go) |
| `tts-module` | Text-to-speech service (Go) |
| `ray-serve` | Ray Serve inference services |
| `argo` | Argo Workflows (training, batch inference) |
| `kubeflow` | Kubeflow Pipeline definitions |
| `mlflow` | MLflow integration utilities |
| `gradio-ui` | Gradio demo apps (embeddings, STT, TTS) |
| `ntfy-discord` | ntfy → Discord notification bridge |
## 🏗️ System Architecture (30-Second Version)
```
┌─────────────────────────────────────────────────────────────────┐
│                        USER INTERFACES                          │
│     Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI        │
└───────────────────────────┬─────────────────────────────────────┘
                            │ WebSocket/HTTP
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                        NATS MESSAGE BUS                         │
│     Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*              │
│     Format: Protocol Buffers (binary, see ADR-0061)             │
└───────────────────────────┬─────────────────────────────────────┘
            ┌───────────────┼───────────────────┐
            ▼               ▼                   ▼
    ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
    │ Chat Handler  │ │Voice Assistant│ │Pipeline Bridge│
    │   (RAG+LLM)   │ │ (STT→LLM→TTS) │ │  (KFP/Argo)   │
    └───────┬───────┘ └───────┬───────┘ └───────┬───────┘
            │                 │                 │
            └─────────────────┼─────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                          AI SERVICES                            │
│   Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker         │
│    STT    │ TTS  │ LLM  │  RAG   │   Embed   │   Rank           │
└─────────────────────────────────────────────────────────────────┘
```
## 📁 Key File Locations
### Infrastructure (`homelab-k8s2`)
```
kubernetes/apps/
├── ai-ml/                # 🧠 AI/ML services
│   ├── kserve/           # InferenceServices
│   ├── kubeflow/         # Pipelines, Training Operator
│   ├── milvus/           # Vector database
│   ├── nats/             # Message bus
│   └── vllm/             # LLM inference
├── analytics/            # 📊 Spark, Flink, ClickHouse
├── observability/        # 📈 Grafana, Alloy, OpenTelemetry
└── security/             # 🔒 Vault, Authentik, Falco
talos/
├── talconfig.yaml        # Node definitions
├── patches/              # GPU-specific patches
│   ├── amd/amdgpu.yaml
│   └── nvidia/nvidia-runtime.yaml
```
### AI/ML Services (Gitea daviestechlabs org)
```
handler-base/             # Shared Go module (NATS, health, OTel, protobuf)
├── clients/              # HTTP clients (LLM, STT, TTS, embeddings, reranker)
├── config/               # Env-based configuration (struct tags)
├── gen/messagespb/       # Generated protobuf stubs
├── handler/              # Typed NATS message handler
├── health/               # HTTP health + readiness server
└── natsutil/             # NATS publish/request with protobuf
chat-handler/             # RAG chat service (Go)
├── main.go
├── main_test.go
└── Dockerfile
voice-assistant/          # Voice pipeline service (Go)
├── main.go
├── main_test.go
└── Dockerfile
argo/                     # Argo WorkflowTemplates
├── batch-inference.yaml
├── qlora-training.yaml
└── document-ingestion.yaml
kubeflow/                 # Kubeflow Pipeline definitions
├── voice_pipeline.py
├── document_ingestion_pipeline.py
└── evaluation_pipeline.py
kuberay-images/           # GPU worker images
├── dockerfiles/
│   ├── Dockerfile.ray-worker-nvidia
│   ├── Dockerfile.ray-worker-strixhalo
│   └── Dockerfile.ray-worker-rdna2
└── ray-serve/            # Serve modules
```
## 🔌 Service Endpoints (Internal)
```go
// Copy-paste ready for Go handler services
const (
	NATSUrl       = "nats://nats.ai-ml.svc.cluster.local:4222"
	VLLMUrl       = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
	WhisperUrl    = "http://whisper-predictor.ai-ml.svc.cluster.local"
	TTSUrl        = "http://tts-predictor.ai-ml.svc.cluster.local"
	EmbeddingsUrl = "http://embeddings-predictor.ai-ml.svc.cluster.local"
	RerankerUrl   = "http://reranker-predictor.ai-ml.svc.cluster.local"
	MilvusHost    = "milvus.ai-ml.svc.cluster.local"
	MilvusPort    = 19530
	ValkeyUrl     = "redis://valkey.ai-ml.svc.cluster.local:6379"
)
```
```python
# For Python services (Ray Serve, Kubeflow pipelines, Gradio UIs)
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```
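Since vLLM exposes an OpenAI-compatible API at `/v1`, any OpenAI-style client works against `VLLM_URL`. Here is a minimal stdlib sketch of building a chat completion request; the `model` name is a placeholder, not the actual model id served on the cluster:

```python
import json
from urllib import request

VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"

def build_chat_request(prompt: str, model: str = "llm-draft") -> request.Request:
    """Build an OpenAI-style chat completion request for the vLLM endpoint.

    `model` is a placeholder -- query GET {VLLM_URL}/models for the real id.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    return request.Request(
        f"{VLLM_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Inside the cluster, send it with:
#   with request.urlopen(build_chat_request("hello")) as resp:
#       print(json.load(resp))
```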
## 📨 NATS Subject Patterns
```python
# Chat
f"ai.chat.user.{user_id}.message" # User sends message
f"ai.chat.response.{request_id}" # Response back
f"ai.chat.response.stream.{request_id}" # Streaming tokens
# Voice
f"ai.voice.user.{user_id}.request" # Voice input
f"ai.voice.response.{request_id}" # Voice output
# Pipelines
"ai.pipeline.trigger" # Trigger any pipeline
f"ai.pipeline.status.{request_id}" # Status updates
```
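The request/response correlation above hinges on the last subject token being the id. A small sketch of hypothetical helpers for building and parsing these subjects (the real services do this via the `handler-base` Go module):

```python
# Hypothetical helpers mirroring the subject patterns above.

def chat_message_subject(user_id: str) -> str:
    """Subject a user publishes chat messages to."""
    return f"ai.chat.user.{user_id}.message"

def chat_response_subject(request_id: str, stream: bool = False) -> str:
    """Subject a client subscribes to for the reply (or streamed tokens)."""
    if stream:
        return f"ai.chat.response.stream.{request_id}"
    return f"ai.chat.response.{request_id}"

def parse_request_id(subject: str) -> str:
    """Extract the request_id: it is always the final dot-separated token."""
    return subject.rsplit(".", 1)[-1]
```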
## 🎮 GPU Allocation
| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |
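Workloads land on the right GPU via node selection plus a device-plugin resource request. A hypothetical pod spec fragment (resource names depend on the device plugins actually installed per node):

```yaml
# Hypothetical fragment -- pins a workload to elminster's NVIDIA card.
spec:
  nodeSelector:
    kubernetes.io/hostname: elminster
  containers:
    - name: whisper
      resources:
        limits:
          nvidia.com/gpu: "1"
```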
## ⚡ Common Tasks
### Deploy a New AI Service
1. Create InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
2. Push to main → Flux deploys automatically
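A minimal InferenceService sketch for step 1 — the name, model format, and storage URI are placeholders; match the conventions of the existing manifests in `kubernetes/apps/ai-ml/kserve/`:

```yaml
# Hypothetical example -- name, format, and storageUri are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model
  namespace: ai-ml
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: pvc://models/my-model   # placeholder storage location
      resources:
        limits:
          nvidia.com/gpu: "1"
```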
### Add a New NATS Handler
1. Create Go handler repo using `handler-base` module (see [ADR-0061](decisions/0061-go-handler-refactor.md))
2. Add K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
3. Push to main → Flux deploys automatically
### Add a New Argo Workflow
1. Add WorkflowTemplate to `argo/` repo
2. Push to main → Gitea syncs to cluster
### Add a New Kubeflow Pipeline
1. Add pipeline .py to `kubeflow/` repo
2. Compile with `python pipeline.py`
3. Upload YAML to Kubeflow UI
### Create Architecture Decision
1. Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
2. Fill in context, decision, consequences
3. Submit PR
## ❌ Antipatterns to Avoid
1. **Don't hardcode secrets** - Use External Secrets Operator
2. **Don't use `latest` tags** - Pin versions for reproducibility
3. **Don't skip ADRs** - Document significant decisions
4. **Don't bypass Flux** - All changes via Git, never `kubectl apply` directly
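Antipattern 1 in practice: instead of committing a key, declare an ExternalSecret and let the operator materialize it. A hypothetical sketch — the store name and secret paths are placeholders; real manifests live in `homelab-k8s2` next to the app they configure:

```yaml
# Hypothetical sketch -- store name and secret paths are placeholders.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: chat-handler-secrets
  namespace: ai-ml
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend        # assumes a Vault-backed ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: chat-handler-env     # the plain Secret the operator creates
  data:
    - secretKey: LLM_API_KEY
      remoteRef:
        key: ai-ml/chat-handler
        property: llm_api_key
```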
## 📚 Where to Learn More
- [ARCHITECTURE.md](ARCHITECTURE.md) - Full system design
- [TECH-STACK.md](TECH-STACK.md) - All technologies used
- [decisions/](decisions/) - Why we made certain choices
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities
## 🆘 Quick Debugging
```bash
# Check Flux sync status
flux get all -A
# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls
# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"
# View KServe inference services
kubectl get inferenceservices -n ai-ml
# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```
---
*This document is the canonical starting point for AI agents. When in doubt, check the ADRs.*