# 🤖 Agent Onboarding
> **This is the most important file for AI agents working on this codebase.**
## TL;DR
You are working on a **homelab Kubernetes cluster** running:
- **Talos Linux v1.12.1** on bare-metal nodes
- **Kubernetes v1.35.0** with Flux CD GitOps
- **AI/ML platform** with KServe, Kubeflow, Milvus, NATS
- **Multi-GPU** (AMD ROCm, NVIDIA CUDA, Intel Arc)
## 🗺️ Repository Map
| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| `homelab-design` (this) | Architecture docs, ADRs | Design decisions |
| `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |
### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)
| Repo | Purpose |
|------|---------|
| `handler-base` | Shared Go module for NATS handlers (protobuf, health, OTel, clients) |
| `chat-handler` | Text chat with RAG pipeline (Go) |
| `voice-assistant` | Voice pipeline: STT → RAG → LLM → TTS (Go) |
| `kuberay-images` | GPU-specific Ray worker Docker images |
| `pipeline-bridge` | Bridge between pipelines and services (Go) |
| `stt-module` | Speech-to-text service (Go) |
| `tts-module` | Text-to-speech service (Go) |
| `ray-serve` | Ray Serve inference services |
| `argo` | Argo Workflows (training, batch inference) |
| `kubeflow` | Kubeflow Pipeline definitions |
| `mlflow` | MLflow integration utilities |
| `gradio-ui` | Gradio demo apps (embeddings, STT, TTS) |
| `ntfy-discord` | ntfy → Discord notification bridge |
## 🏗️ System Architecture (30-Second Version)
```
┌─────────────────────────────────────────────────────────────────┐
│                        USER INTERFACES                          │
│     Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI        │
└───────────────────────────┬─────────────────────────────────────┘
                            │ WebSocket/HTTP
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                        NATS MESSAGE BUS                         │
│     Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*              │
│     Format: Protocol Buffers (binary, see ADR-0061)             │
└───────────────────────────┬─────────────────────────────────────┘
            ┌───────────────┼───────────────────┐
            ▼               ▼                   ▼
    ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
    │ Chat Handler  │ │Voice Assistant│ │Pipeline Bridge│
    │   (RAG+LLM)   │ │ (STT→LLM→TTS) │ │  (KFP/Argo)   │
    └───────┬───────┘ └───────┬───────┘ └───────┬───────┘
            │                 │                 │
            └─────────────────┼─────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                          AI SERVICES                            │
│   Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker         │
│    STT    │ TTS  │ LLM  │  RAG   │   Embed   │   Rank           │
└─────────────────────────────────────────────────────────────────┘
```
## 📁 Key File Locations
### Infrastructure (`homelab-k8s2`)
```
kubernetes/apps/
├── ai-ml/                # 🧠 AI/ML services
│   ├── kserve/           # InferenceServices
│   ├── kubeflow/         # Pipelines, Training Operator
│   ├── milvus/           # Vector database
│   ├── nats/             # Message bus
│   └── vllm/             # LLM inference
├── analytics/            # 📊 Spark, Flink, ClickHouse
├── observability/        # 📈 Grafana, Alloy, OpenTelemetry
└── security/             # 🔒 Vault, Authentik, Falco
talos/
├── talconfig.yaml        # Node definitions
├── patches/              # GPU-specific patches
│   ├── amd/amdgpu.yaml
│   └── nvidia/nvidia-runtime.yaml
```
### AI/ML Services (Gitea daviestechlabs org)
```
handler-base/             # Shared Go module (NATS, health, OTel, protobuf)
├── clients/              # HTTP clients (LLM, STT, TTS, embeddings, reranker)
├── config/               # Env-based configuration (struct tags)
├── gen/messagespb/       # Generated protobuf stubs
├── handler/              # Typed NATS message handler
├── health/               # HTTP health + readiness server
└── natsutil/             # NATS publish/request with protobuf
chat-handler/             # RAG chat service (Go)
├── main.go
├── main_test.go
└── Dockerfile
voice-assistant/          # Voice pipeline service (Go)
├── main.go
├── main_test.go
└── Dockerfile
argo/                     # Argo WorkflowTemplates
├── batch-inference.yaml
├── qlora-training.yaml
└── document-ingestion.yaml
kubeflow/                 # Kubeflow Pipeline definitions
├── voice_pipeline.py
├── document_ingestion_pipeline.py
└── evaluation_pipeline.py
kuberay-images/           # GPU worker images
├── dockerfiles/
│   ├── Dockerfile.ray-worker-nvidia
│   ├── Dockerfile.ray-worker-strixhalo
│   └── Dockerfile.ray-worker-rdna2
└── ray-serve/            # Serve modules
```
## 🔌 Service Endpoints (Internal)
```go
// Copy-paste ready for Go handler services
const (
	NATSUrl       = "nats://nats.ai-ml.svc.cluster.local:4222"
	VLLMUrl       = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
	WhisperUrl    = "http://whisper-predictor.ai-ml.svc.cluster.local"
	TTSUrl        = "http://tts-predictor.ai-ml.svc.cluster.local"
	EmbeddingsUrl = "http://embeddings-predictor.ai-ml.svc.cluster.local"
	RerankerUrl   = "http://reranker-predictor.ai-ml.svc.cluster.local"
	MilvusHost    = "milvus.ai-ml.svc.cluster.local"
	MilvusPort    = 19530
	ValkeyUrl     = "redis://valkey.ai-ml.svc.cluster.local:6379"
)
```
```python
# For Python services (Ray Serve, Kubeflow pipelines, Gradio UIs)
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```
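Since vLLM exposes an OpenAI-compatible API at `/v1`, any OpenAI-style client works against `VLLM_URL`. Here is a minimal stdlib sketch of building a chat completion request; the `model` name is a placeholder, not the actual model id served on the cluster:

```python
import json
from urllib import request

VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"

def build_chat_request(prompt: str, model: str = "llm-draft") -> request.Request:
    """Build an OpenAI-style chat completion request for the vLLM endpoint.

    `model` is a placeholder -- query GET {VLLM_URL}/models for the real id.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    return request.Request(
        f"{VLLM_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Inside the cluster, send it with:
#   with request.urlopen(build_chat_request("hello")) as resp:
#       print(json.load(resp))
```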
## 📨 NATS Subject Patterns
```python
# Chat
f"ai.chat.user.{user_id}.message" # User sends message
f"ai.chat.response.{request_id}" # Response back
f"ai.chat.response.stream.{request_id}" # Streaming tokens
# Voice
f"ai.voice.user.{user_id}.request" # Voice input
f"ai.voice.response.{request_id}" # Voice output
# Pipelines
"ai.pipeline.trigger" # Trigger any pipeline
f"ai.pipeline.status.{request_id}" # Status updates
```
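The request/response correlation above hinges on the last subject token being the id. A small sketch of hypothetical helpers for building and parsing these subjects (the real services do this via the `handler-base` Go module):

```python
# Hypothetical helpers mirroring the subject patterns above.

def chat_message_subject(user_id: str) -> str:
    """Subject a user publishes chat messages to."""
    return f"ai.chat.user.{user_id}.message"

def chat_response_subject(request_id: str, stream: bool = False) -> str:
    """Subject a client subscribes to for the reply (or streamed tokens)."""
    if stream:
        return f"ai.chat.response.stream.{request_id}"
    return f"ai.chat.response.{request_id}"

def parse_request_id(subject: str) -> str:
    """Extract the request_id: it is always the final dot-separated token."""
    return subject.rsplit(".", 1)[-1]
```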
## 🎮 GPU Allocation
| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |
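Workloads land on the right GPU via node selection plus a device-plugin resource request. A hypothetical pod spec fragment (resource names depend on the device plugins actually installed per node):

```yaml
# Hypothetical fragment -- pins a workload to elminster's NVIDIA card.
spec:
  nodeSelector:
    kubernetes.io/hostname: elminster
  containers:
    - name: whisper
      resources:
        limits:
          nvidia.com/gpu: "1"
```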
## ⚡ Common Tasks
### Deploy a New AI Service
1. Create InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
2. Push to main → Flux deploys automatically
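A minimal InferenceService sketch for step 1 — the name, model format, and storage URI are placeholders; match the conventions of the existing manifests in `kubernetes/apps/ai-ml/kserve/`:

```yaml
# Hypothetical example -- name, format, and storageUri are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model
  namespace: ai-ml
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: pvc://models/my-model   # placeholder storage location
      resources:
        limits:
          nvidia.com/gpu: "1"
```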
### Add a New NATS Handler
1. Create Go handler repo using `handler-base` module (see [ADR-0061](decisions/0061-go-handler-refactor.md))
2. Add K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
3. Push to main → Flux deploys automatically
### Add a New Argo Workflow
1. Add WorkflowTemplate to `argo/` repo
2. Push to main → Gitea syncs to cluster
### Add a New Kubeflow Pipeline
1. Add pipeline .py to `kubeflow/` repo
2. Compile with `python pipeline.py`
3. Upload YAML to Kubeflow UI
### Create Architecture Decision
1. Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
2. Fill in context, decision, consequences
3. Submit PR
## ❌ Antipatterns to Avoid
1. **Don't hardcode secrets** - Use External Secrets Operator
2. **Don't use `latest` tags** - Pin versions for reproducibility
3. **Don't skip ADRs** - Document significant decisions
4. **Don't bypass Flux** - All changes via Git, never `kubectl apply` directly
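Antipattern 1 in practice: instead of committing a key, declare an ExternalSecret and let the operator materialize it. A hypothetical sketch — the store name and secret paths are placeholders; real manifests live in `homelab-k8s2` next to the app they configure:

```yaml
# Hypothetical sketch -- store name and secret paths are placeholders.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: chat-handler-secrets
  namespace: ai-ml
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend        # assumes a Vault-backed ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: chat-handler-env     # the plain Secret the operator creates
  data:
    - secretKey: LLM_API_KEY
      remoteRef:
        key: ai-ml/chat-handler
        property: llm_api_key
```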
## 📚 Where to Learn More
- [ARCHITECTURE.md](ARCHITECTURE.md) - Full system design
- [TECH-STACK.md](TECH-STACK.md) - All technologies used
- [decisions/](decisions/) - Why we made certain choices
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities
## 🆘 Quick Debugging
```bash
# Check Flux sync status
flux get all -A
# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls
# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"
# View KServe inference services
kubectl get inferenceservices -n ai-ml
# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```
---
*This document is the canonical starting point for AI agents. When in doubt, check the ADRs.*