# 🤖 Agent Onboarding

> **This is the most important file for AI agents working on this codebase.**

## TL;DR

You are working on a **homelab Kubernetes cluster** running:

- **Talos Linux v1.12.1** on bare-metal nodes
- **Kubernetes v1.35.0** with Flux CD GitOps
- **AI/ML platform** with KServe, Kubeflow, Milvus, NATS
- **Multi-GPU** (AMD ROCm, NVIDIA CUDA, Intel Arc)

## 🗺️ Repository Map

| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| `homelab-design` (this) | Architecture docs, ADRs | Design decisions |
| `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |

### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)

| Repo | Purpose |
|------|---------|
| `handler-base` | Shared Go module for NATS handlers (protobuf, health, OTel, clients) |
| `chat-handler` | Text chat with RAG pipeline (Go) |
| `voice-assistant` | Voice pipeline: STT → RAG → LLM → TTS (Go) |
| `kuberay-images` | GPU-specific Ray worker Docker images |
| `pipeline-bridge` | Bridge between pipelines and services (Go) |
| `stt-module` | Speech-to-text service (Go) |
| `tts-module` | Text-to-speech service (Go) |
| `ray-serve` | Ray Serve inference services |
| `argo` | Argo Workflows (training, batch inference) |
| `kubeflow` | Kubeflow Pipeline definitions |
| `mlflow` | MLflow integration utilities |
| `gradio-ui` | Gradio demo apps (embeddings, STT, TTS) |
| `ntfy-discord` | ntfy → Discord notification bridge |

## 🏗️ System Architecture (30-Second Version)

```
┌─────────────────────────────────────────────────────────────┐
│                       USER INTERFACES                       │
│   Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI      │
└──────────────────────────────┬──────────────────────────────┘
                               │ WebSocket/HTTP
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      NATS MESSAGE BUS                       │
│   Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*            │
│   Format: Protocol Buffers (binary, see ADR-0061)           │
└──────────────────────────────┬──────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│  Chat Handler   │   │ Voice Assistant │   │ Pipeline Bridge │
│   (RAG+LLM)     │   │  (STT→LLM→TTS)  │   │   (KFP/Argo)    │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         AI SERVICES                         │
│  Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker      │
│    STT   │ TTS  │ LLM  │  RAG   │   Embed   │   Rank        │
└─────────────────────────────────────────────────────────────┘
```

## 📁 Key File Locations

### Infrastructure (`homelab-k8s2`)

```
kubernetes/apps/
├── ai-ml/                 # 🧠 AI/ML services
│   ├── kserve/            # InferenceServices
│   ├── kubeflow/          # Pipelines, Training Operator
│   ├── milvus/            # Vector database
│   ├── nats/              # Message bus
│   └── vllm/              # LLM inference
├── analytics/             # 📊 Spark, Flink, ClickHouse
├── observability/         # 📈 Grafana, Alloy, OpenTelemetry
└── security/              # 🔒 Vault, Authentik, Falco

talos/
├── talconfig.yaml         # Node definitions
├── patches/               # GPU-specific patches
│   ├── amd/amdgpu.yaml
│   └── nvidia/nvidia-runtime.yaml
```

### AI/ML Services (Gitea daviestechlabs org)

```
handler-base/              # Shared Go module (NATS, health, OTel, protobuf)
├── clients/               # HTTP clients (LLM, STT, TTS, embeddings, reranker)
├── config/                # Env-based configuration (struct tags)
├── gen/messagespb/        # Generated protobuf stubs
├── handler/               # Typed NATS message handler
├── health/                # HTTP health + readiness server
└── natsutil/              # NATS publish/request with protobuf

chat-handler/              # RAG chat service (Go)
├── main.go
├── main_test.go
└── Dockerfile

voice-assistant/           # Voice pipeline service (Go)
├── main.go
├── main_test.go
└── Dockerfile

argo/                      # Argo WorkflowTemplates
├── batch-inference.yaml
├── qlora-training.yaml
└── document-ingestion.yaml

kubeflow/                  # Kubeflow Pipeline definitions
├── voice_pipeline.py
├── document_ingestion_pipeline.py
└── evaluation_pipeline.py

kuberay-images/            # GPU worker images
├── dockerfiles/
│   ├── Dockerfile.ray-worker-nvidia
│   ├── Dockerfile.ray-worker-strixhalo
│   └── Dockerfile.ray-worker-rdna2
└── ray-serve/             # Serve modules
```
## 🔌 Service Endpoints (Internal)

```go
// Copy-paste ready for Go handler services
const (
	NATSUrl       = "nats://nats.ai-ml.svc.cluster.local:4222"
	VLLMUrl       = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
	WhisperUrl    = "http://whisper-predictor.ai-ml.svc.cluster.local"
	TTSUrl        = "http://tts-predictor.ai-ml.svc.cluster.local"
	EmbeddingsUrl = "http://embeddings-predictor.ai-ml.svc.cluster.local"
	RerankerUrl   = "http://reranker-predictor.ai-ml.svc.cluster.local"
	MilvusHost    = "milvus.ai-ml.svc.cluster.local"
	MilvusPort    = 19530
	ValkeyUrl     = "redis://valkey.ai-ml.svc.cluster.local:6379"
)
```

```python
# For Python services (Ray Serve, Kubeflow pipelines, Gradio UIs)
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```

## 📨 NATS Subject Patterns

```python
# Chat
f"ai.chat.user.{user_id}.message"        # User sends message
f"ai.chat.response.{request_id}"         # Response back
f"ai.chat.response.stream.{request_id}"  # Streaming tokens

# Voice
f"ai.voice.user.{user_id}.request"       # Voice input
f"ai.voice.response.{request_id}"        # Voice output

# Pipelines
"ai.pipeline.trigger"                    # Trigger any pipeline
f"ai.pipeline.status.{request_id}"       # Status updates
```

## 🎮 GPU Allocation

| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |

## ⚡ Common Tasks

### Deploy a New AI Service

1. Create an InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
2. Push to main → Flux deploys automatically

### Add a New NATS Handler

1. Create a Go handler repo using the `handler-base` module (see [ADR-0061](decisions/0061-go-handler-refactor.md))
2. Add a K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
3. Push to main → Flux deploys automatically

### Add a New Argo Workflow

1. Add a WorkflowTemplate to the `argo/` repo
2. Push to main → Gitea syncs it to the cluster

### Add a New Kubeflow Pipeline

1. Add a pipeline `.py` to the `kubeflow/` repo
2. Compile it with `python pipeline.py`
3. Upload the resulting YAML in the Kubeflow UI

### Create an Architecture Decision Record

1. Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
2. Fill in context, decision, consequences
3. Submit a PR

## ❌ Antipatterns to Avoid

1. **Don't hardcode secrets** - Use External Secrets Operator
2. **Don't use `latest` tags** - Pin versions for reproducibility
3. **Don't skip ADRs** - Document significant decisions
4. **Don't bypass Flux** - All changes go through Git; never `kubectl apply` directly

## 📚 Where to Learn More

- [ARCHITECTURE.md](ARCHITECTURE.md) - Full system design
- [TECH-STACK.md](TECH-STACK.md) - All technologies used
- [decisions/](decisions/) - Why we made certain choices
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities

## 🆘 Quick Debugging

```bash
# Check Flux sync status
flux get all -A

# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls

# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"

# View KServe inference services
kubectl get inferenceservices -n ai-ml

# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```

---

*This document is the canonical starting point for AI agents. When in doubt, check the ADRs.*
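One closing example: the NATS subject patterns listed earlier are easy to get subtly wrong when interpolated by hand in each handler. A small set of builder functions keeps them consistent; the function names below are illustrative only, not an existing `handler-base` API:

```go
package main

import "fmt"

// Subject builders for the ai.* NATS hierarchy documented in this file.
// Hypothetical helper names; only the subject strings come from the doc.

func ChatUserSubject(userID string) string {
	return fmt.Sprintf("ai.chat.user.%s.message", userID)
}

func ChatResponseSubject(requestID string) string {
	return fmt.Sprintf("ai.chat.response.%s", requestID)
}

func ChatStreamSubject(requestID string) string {
	return fmt.Sprintf("ai.chat.response.stream.%s", requestID)
}

func VoiceUserSubject(userID string) string {
	return fmt.Sprintf("ai.voice.user.%s.request", userID)
}

func VoiceResponseSubject(requestID string) string {
	return fmt.Sprintf("ai.voice.response.%s", requestID)
}

// PipelineTriggerSubject is static; status subjects carry the request ID.
const PipelineTriggerSubject = "ai.pipeline.trigger"

func PipelineStatusSubject(requestID string) string {
	return fmt.Sprintf("ai.pipeline.status.%s", requestID)
}

func main() {
	fmt.Println(ChatUserSubject("alice"))    // ai.chat.user.alice.message
	fmt.Println(PipelineStatusSubject("42")) // ai.pipeline.status.42
}
```

Centralizing these in one shared package (rather than formatting strings inline in each service) would also give one place to update if the subject hierarchy ever changes.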