# 🤖 Agent Onboarding

> **This is the most important file for AI agents working on this codebase.**

## TL;DR

You are working on a **homelab Kubernetes cluster** running:

- **Talos Linux v1.12.1** on bare-metal nodes
- **Kubernetes v1.35.0** with Flux CD GitOps
- **AI/ML platform** with KServe, Kubeflow, Milvus, NATS
- **Multi-GPU** (AMD ROCm, NVIDIA CUDA, Intel Arc)

## 🗺️ Repository Map

| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| `homelab-design` (this) | Architecture docs, ADRs | Design decisions |
| `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |

### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)

| Repo | Purpose |
|------|---------|
| `handler-base` | Shared Python library for NATS handlers |
| `chat-handler` | Text chat with RAG pipeline |
| `voice-assistant` | Voice pipeline (STT → RAG → LLM → TTS) |
| `kuberay-images` | GPU-specific Ray worker Docker images |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `ray-serve` | Ray Serve inference services |
| `argo` | Argo Workflows (training, batch inference) |
| `kubeflow` | Kubeflow Pipeline definitions |
| `mlflow` | MLflow integration utilities |
| `gradio-ui` | Gradio demo apps (embeddings, STT, TTS) |
| `ntfy-discord` | ntfy → Discord notification bridge |

## 🏗️ System Architecture (30-Second Version)

```
┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACES                         │
│  Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI           │
└───────────────────────────┬─────────────────────────────────────┘
                            │ WebSocket/HTTP
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                        NATS MESSAGE BUS                         │
│  Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*                 │
│  Format: MessagePack (binary)                                   │
└───────────────────────────┬─────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│ Chat Handler  │   │Voice Assistant│   │Pipeline Bridge│
│  (RAG+LLM)    │   │ (STT→LLM→TTS) │   │  (KFP/Argo)   │
└───────┬───────┘   └───────┬───────┘   └───────┬───────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                           AI SERVICES                           │
│  Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker          │
│   STT    │ TTS  │ LLM  │  RAG   │   Embed   │   Rank            │
└─────────────────────────────────────────────────────────────────┘
```

## 📁 Key File Locations

### Infrastructure (`homelab-k8s2`)

```
kubernetes/apps/
├── ai-ml/              # 🧠 AI/ML services
│   ├── kserve/         # InferenceServices
│   ├── kubeflow/       # Pipelines, Training Operator
│   ├── milvus/         # Vector database
│   ├── nats/           # Message bus
│   └── vllm/           # LLM inference
├── analytics/          # 📊 Spark, Flink, ClickHouse
├── observability/      # 📈 Grafana, Alloy, OpenTelemetry
└── security/           # 🔒 Vault, Authentik, Falco

talos/
├── talconfig.yaml      # Node definitions
├── patches/            # GPU-specific patches
│   ├── amd/amdgpu.yaml
│   └── nvidia/nvidia-runtime.yaml
```


### AI/ML Services (Gitea daviestechlabs org)

```
handler-base/                  # Shared handler library
├── handler_base/              # Core classes
│   ├── handler.py             # Base Handler class
│   ├── nats_client.py         # NATS wrapper
│   └── clients/               # Service clients (STT, TTS, LLM, etc.)

chat-handler/                  # RAG chat service
├── chat_handler_v2.py         # Handler-base version
└── Dockerfile.v2

voice-assistant/               # Voice pipeline service
├── voice_assistant_v2.py      # Handler-base version
└── pipelines/voice_pipeline.py

argo/                          # Argo WorkflowTemplates
├── batch-inference.yaml
├── qlora-training.yaml
└── document-ingestion.yaml

kubeflow/                      # Kubeflow Pipeline definitions
├── voice_pipeline.py
├── document_ingestion_pipeline.py
└── evaluation_pipeline.py

kuberay-images/                # GPU worker images
├── dockerfiles/
│   ├── Dockerfile.ray-worker-nvidia
│   ├── Dockerfile.ray-worker-strixhalo
│   └── Dockerfile.ray-worker-rdna2
└── ray-serve/                 # Serve modules
```


## 🔌 Service Endpoints (Internal)

```python
# Copy-paste ready for Python code
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```
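
These defaults only resolve in-cluster. When running a handler locally (e.g. against port-forwarded services), it can help to make them overridable; a minimal sketch, where the env-var names are an illustrative convention rather than an existing one:

```python
import os

def endpoint(env_var: str, default: str) -> str:
    """Resolve a service URL, letting an environment variable override the in-cluster default."""
    return os.environ.get(env_var, default)

NATS_URL = endpoint("NATS_URL", "nats://nats.ai-ml.svc.cluster.local:4222")
VLLM_URL = endpoint("VLLM_URL", "http://llm-draft.ai-ml.svc.cluster.local:8000/v1")
MILVUS_HOST = endpoint("MILVUS_HOST", "milvus.ai-ml.svc.cluster.local")
```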

## 📨 NATS Subject Patterns

```python
# Chat
f"ai.chat.user.{user_id}.message"        # User sends message
f"ai.chat.response.{request_id}"         # Response back
f"ai.chat.response.stream.{request_id}"  # Streaming tokens

# Voice
f"ai.voice.user.{user_id}.request"       # Voice input
f"ai.voice.response.{request_id}"        # Voice output

# Pipelines
"ai.pipeline.trigger"                    # Trigger any pipeline
f"ai.pipeline.status.{request_id}"       # Status updates
```
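
A small helper can keep handlers and clients agreeing on these names; a minimal sketch (the helper functions are hypothetical, only the subject patterns come from this document):

```python
def chat_subjects(user_id: str, request_id: str) -> dict:
    """Subject names for one chat request/response exchange."""
    return {
        "request": f"ai.chat.user.{user_id}.message",       # user sends message
        "response": f"ai.chat.response.{request_id}",       # response back
        "stream": f"ai.chat.response.stream.{request_id}",  # streaming tokens
    }

def voice_subjects(user_id: str, request_id: str) -> dict:
    """Subject names for one voice exchange."""
    return {
        "request": f"ai.voice.user.{user_id}.request",
        "response": f"ai.voice.response.{request_id}",
    }

def pipeline_subjects(request_id: str) -> dict:
    """Subject names for pipeline triggering and status tracking."""
    return {
        "trigger": "ai.pipeline.trigger",              # static trigger subject
        "status": f"ai.pipeline.status.{request_id}",  # per-request status updates
    }
```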

## 🎮 GPU Allocation

| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |

## ⚡ Common Tasks

### Deploy a New AI Service

1. Create InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
2. Push to main → Flux deploys automatically

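A minimal InferenceService for step 1 might look like the following; the name and image are placeholders, and the custom-container predictor form is just one of KServe's options:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model            # placeholder name
  namespace: ai-ml
spec:
  predictor:
    containers:
      - name: kserve-container
        image: example.registry/my-model:1.0.0   # pin the tag (see Antipatterns)
```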
### Add a New NATS Handler

1. Create handler repo or add to existing (use `handler-base` library)
2. Add K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
3. Push to main → Flux deploys automatically

### Add a New Argo Workflow

1. Add WorkflowTemplate to `argo/` repo
2. Push to main → Gitea syncs to cluster

### Add a New Kubeflow Pipeline

1. Add pipeline `.py` to `kubeflow/` repo
2. Compile with `python pipeline.py`
3. Upload YAML to Kubeflow UI

### Create Architecture Decision

1. Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
2. Fill in context, decision, consequences
3. Submit PR

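Picking the next `NNNN` can be scripted; a small hypothetical helper (not part of the repo) that scans `decisions/` for the highest existing number:

```python
from pathlib import Path

def next_adr_number(decisions_dir: str = "decisions") -> str:
    """Return the next zero-padded ADR number, e.g. '0040'."""
    numbers = [
        int(p.name[:4])  # filenames follow the NNNN-title.md convention
        for p in Path(decisions_dir).glob("[0-9][0-9][0-9][0-9]-*.md")
    ]
    return f"{max(numbers, default=0) + 1:04d}"
```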

## ❌ Antipatterns to Avoid

1. **Don't hardcode secrets** - Use External Secrets Operator
2. **Don't use `latest` tags** - Pin versions for reproducibility
3. **Don't skip ADRs** - Document significant decisions
4. **Don't bypass Flux** - All changes via Git, never `kubectl apply` directly

## 📚 Where to Learn More

- [ARCHITECTURE.md](ARCHITECTURE.md) - Full system design
- [TECH-STACK.md](TECH-STACK.md) - All technologies used
- [decisions/](decisions/) - Why we made certain choices
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities

## 🆘 Quick Debugging

```bash
# Check Flux sync status
flux get all -A

# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls

# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"

# View KServe inference services
kubectl get inferenceservices -n ai-ml

# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```

---

*This document is the canonical starting point for AI agents. When in doubt, check the ADRs.*