# 🤖 Agent Onboarding

This is the most important file for AI agents working on this codebase.
## TL;DR

You are working on a homelab Kubernetes cluster running:

- Talos Linux v1.12.1 on bare-metal nodes
- Kubernetes v1.35.0 with Flux CD GitOps
- AI/ML platform with KServe, Kubeflow, Milvus, NATS
- Multi-GPU (AMD ROCm, NVIDIA CUDA, Intel Arc)
## 🗺️ Repository Map

| Repo | What It Contains | When to Edit |
|---|---|---|
| `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| `homelab-design` (this) | Architecture docs, ADRs | Design decisions |
| `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |
### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)

| Repo | Purpose |
|---|---|
| `handler-base` | Shared Python library for NATS handlers |
| `chat-handler` | Text chat with RAG pipeline |
| `voice-assistant` | Voice pipeline (STT → RAG → LLM → TTS) |
| `kuberay-images` | GPU-specific Ray worker Docker images |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `ray-serve` | Ray Serve inference services |
| `argo` | Argo Workflows (training, batch inference) |
| `kubeflow` | Kubeflow Pipeline definitions |
| `mlflow` | MLflow integration utilities |
| `gradio-ui` | Gradio demo apps (embeddings, STT, TTS) |
| `ntfy-discord` | ntfy → Discord notification bridge |
## 🏗️ System Architecture (30-Second Version)

```
┌─────────────────────────────────────────────────────────────────┐
│                        USER INTERFACES                          │
│    Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI         │
└───────────────────────────┬─────────────────────────────────────┘
                            │ WebSocket/HTTP
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                       NATS MESSAGE BUS                          │
│    Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*               │
│    Format: MessagePack (binary)                                 │
└───────────────────────────┬─────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│ Chat Handler  │   │Voice Assistant│   │Pipeline Bridge│
│   (RAG+LLM)   │   │ (STT→LLM→TTS) │   │  (KFP/Argo)   │
└───────┬───────┘   └───────┬───────┘   └───────┬───────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                         AI SERVICES                             │
│   Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker         │
│     STT   │ TTS  │ LLM  │  RAG   │   Embed   │   Rank           │
└─────────────────────────────────────────────────────────────────┘
```
## 📁 Key File Locations

### Infrastructure (homelab-k8s2)

```
kubernetes/apps/
├── ai-ml/              # 🧠 AI/ML services
│   ├── kserve/         # InferenceServices
│   ├── kubeflow/       # Pipelines, Training Operator
│   ├── milvus/         # Vector database
│   ├── nats/           # Message bus
│   └── vllm/           # LLM inference
├── analytics/          # 📊 Spark, Flink, ClickHouse
├── observability/      # 📈 Grafana, Alloy, OpenTelemetry
└── security/           # 🔒 Vault, Authentik, Falco

talos/
├── talconfig.yaml      # Node definitions
├── patches/            # GPU-specific patches
│   ├── amd/amdgpu.yaml
│   └── nvidia/nvidia-runtime.yaml
```
### AI/ML Services (Gitea daviestechlabs org)

```
handler-base/                      # Shared handler library
├── handler_base/                  # Core classes
│   ├── handler.py                 # Base Handler class
│   ├── nats_client.py             # NATS wrapper
│   └── clients/                   # Service clients (STT, TTS, LLM, etc.)

chat-handler/                      # RAG chat service
├── chat_handler_v2.py             # Handler-base version
└── Dockerfile.v2

voice-assistant/                   # Voice pipeline service
├── voice_assistant_v2.py          # Handler-base version
└── pipelines/voice_pipeline.py

argo/                              # Argo WorkflowTemplates
├── batch-inference.yaml
├── qlora-training.yaml
└── document-ingestion.yaml

kubeflow/                          # Kubeflow Pipeline definitions
├── voice_pipeline.py
├── document_ingestion_pipeline.py
└── evaluation_pipeline.py

kuberay-images/                    # GPU worker images
├── dockerfiles/
│   ├── Dockerfile.ray-worker-nvidia
│   ├── Dockerfile.ray-worker-strixhalo
│   └── Dockerfile.ray-worker-rdna2
└── ray-serve/                     # Serve modules
```
## 🔌 Service Endpoints (Internal)

```python
# Copy-paste ready for Python code
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```
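The `/v1` suffix on `VLLM_URL` suggests vLLM's OpenAI-compatible API. A minimal sketch of building a chat-completion request against it, using only the stdlib (the helper name and the `"default"` model name are my own; actually sending the request only works from inside the cluster, and real model names can be listed via `GET /v1/models`):

```python
import json
from urllib import request

VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"

def build_chat_request(prompt: str, model: str = "default") -> request.Request:
    """Build an OpenAI-style chat completion request for the vLLM endpoint."""
    body = json.dumps({
        "model": model,  # hypothetical model name; check /v1/models for real ones
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    return request.Request(
        f"{VLLM_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Say hello")
print(req.full_url)  # http://llm-draft.ai-ml.svc.cluster.local:8000/v1/chat/completions
```

From inside the cluster, `request.urlopen(req)` would return the completion; from outside, the cluster-local DNS name does not resolve.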
## 📨 NATS Subject Patterns

```python
# Chat
f"ai.chat.user.{user_id}.message"        # User sends message
f"ai.chat.response.{request_id}"         # Response back
f"ai.chat.response.stream.{request_id}"  # Streaming tokens

# Voice
f"ai.voice.user.{user_id}.request"       # Voice input
f"ai.voice.response.{request_id}"        # Voice output

# Pipelines
"ai.pipeline.trigger"                    # Trigger any pipeline
f"ai.pipeline.status.{request_id}"       # Status updates
```
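Tooling around these subjects often needs NATS-style wildcard matching: `*` matches exactly one dot-separated token, `>` matches one or more trailing tokens. A stdlib-only sketch of a subject builder plus matcher (function names are my own, not from `handler-base`):

```python
def chat_message_subject(user_id: str) -> str:
    """Build the subject a user's chat message is published on."""
    return f"ai.chat.user.{user_id}.message"

def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' matches one token, '>' matches the rest."""
    p_toks, s_toks = pattern.split("."), subject.split(".")
    for i, p in enumerate(p_toks):
        if p == ">":
            return len(s_toks) >= i + 1  # '>' needs at least one more token
        if i >= len(s_toks):
            return False
        if p != "*" and p != s_toks[i]:
            return False
    return len(p_toks) == len(s_toks)

print(subject_matches("ai.chat.>", chat_message_subject("42")))  # True
print(subject_matches("ai.chat.*", chat_message_subject("42")))  # False: '*' is one token
```

Note the second result: the `ai.chat.*` shorthand in the architecture diagram is informal; a real subscription covering all chat traffic would use `ai.chat.>`.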
## 🎮 GPU Allocation
| Node | GPU | Workload | Memory |
|---|---|---|---|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |
## ⚡ Common Tasks

### Deploy a New AI Service

- Create an InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
- Push to main → Flux deploys automatically
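A minimal sketch of the first step, assuming a KServe custom-container predictor (the name, image, and resources below are hypothetical; the existing manifests in `kubernetes/apps/ai-ml/kserve/` are the authoritative reference):

```yaml
# Hypothetical example - adjust name, image, and resources for your service
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                 # hypothetical name
  namespace: ai-ml
spec:
  predictor:
    containers:
      - name: kserve-container
        image: git.daviestechlabs.io/daviestechlabs/my-model:v0.1.0  # pin versions, never :latest
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
```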
### Add a New NATS Handler

- Create a handler repo, or add to an existing one (use the `handler-base` library)
- Add a K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
- Push to main → Flux deploys automatically
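Conceptually, a handler subscribes to a subject pattern, transforms a decoded payload, and publishes to a reply subject. The stand-in below only illustrates that shape; the real base class, method names, and MessagePack codecs live in `handler-base` and will differ:

```python
class EchoHandler:
    """Toy handler sketch. The actual `handler-base` API is different;
    this only illustrates the subscribe/transform/reply pattern."""

    subject = "ai.chat.user.*.message"  # pattern a chat handler would subscribe to

    def handle(self, payload: dict) -> tuple[str, dict]:
        # A real handler would run the RAG pipeline here; we just echo uppercased text.
        reply_subject = f"ai.chat.response.{payload['request_id']}"
        return reply_subject, {"request_id": payload["request_id"],
                               "text": payload["text"].upper()}

subject, reply = EchoHandler().handle({"request_id": "r1", "text": "hello"})
print(subject, reply["text"])  # ai.chat.response.r1 HELLO
```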
### Add a New Argo Workflow

- Add a WorkflowTemplate to the `argo/` repo
- Push to main → Gitea syncs to cluster
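A minimal WorkflowTemplate sketch for orientation (the name, namespace, and container are hypothetical; `batch-inference.yaml` and friends in the `argo/` repo show the real conventions):

```yaml
# Hypothetical minimal template - see the argo/ repo for real examples
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: hello-batch              # hypothetical name
  namespace: ai-ml
spec:
  entrypoint: main
  templates:
    - name: main
      container:
        image: alpine:3.20       # pin versions for reproducibility
        command: [echo, "batch step"]
```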
### Add a New Kubeflow Pipeline

- Add the pipeline `.py` to the `kubeflow/` repo
- Compile with `python pipeline.py`
- Upload the generated YAML in the Kubeflow UI
### Create an Architecture Decision Record

- Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
- Fill in context, decision, consequences
- Submit a PR
## ❌ Antipatterns to Avoid

- Don't hardcode secrets - use the External Secrets Operator
- Don't use `latest` tags - pin versions for reproducibility
- Don't skip ADRs - document significant decisions
- Don't bypass Flux - all changes go via Git, never `kubectl apply` directly
## 📚 Where to Learn More
- ARCHITECTURE.md - Full system design
- TECH-STACK.md - All technologies used
- decisions/ - Why we made certain choices
- DOMAIN-MODEL.md - Core entities
## 🆘 Quick Debugging

```shell
# Check Flux sync status
flux get all -A

# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls

# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"

# View KServe inference services
kubectl get inferenceservices -n ai-ml

# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```
This document is the canonical starting point for AI agents. When in doubt, check the ADRs.