# 🤖 Agent Onboarding

> **This is the most important file for AI agents working on this codebase.**

## TL;DR

You are working on a **homelab Kubernetes cluster** running:

- **Talos Linux v1.12.1** on bare-metal nodes
- **Kubernetes v1.35.0** with Flux CD GitOps
- **AI/ML platform** with KServe, Kubeflow, Milvus, NATS
- **Multi-GPU** (AMD ROCm, NVIDIA CUDA, Intel Arc)

## 🗺️ Repository Map

| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| `homelab-k8s2` | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| `homelab-design` (this) | Architecture docs, ADRs | Design decisions |
| `companions-frontend` | Go server, HTMX UI, VRM avatars | Frontend changes |

### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)

| Repo | Purpose |
|------|---------|
| `handler-base` | Shared Python library for NATS handlers |
| `chat-handler` | Text chat with RAG pipeline |
| `voice-assistant` | Voice pipeline (STT → RAG → LLM → TTS) |
| `kuberay-images` | GPU-specific Ray worker Docker images |
| `argo` | Argo Workflows (training, batch inference) |
| `kubeflow` | Kubeflow Pipeline definitions |
| `mlflow` | MLflow integration utilities |
| `gradio-ui` | Gradio demo apps (embeddings, STT, TTS) |

## 🏗️ System Architecture (30-Second Version)

```
┌─────────────────────────────────────────────────────────────┐
│                       USER INTERFACES                       │
│  Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI       │
└──────────────────────────────┬──────────────────────────────┘
                               │ WebSocket/HTTP
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      NATS MESSAGE BUS                       │
│  Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*             │
│  Format:   MessagePack (binary)                             │
└──────────────────────────────┬──────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
 ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
 │ Chat Handler  │     │Voice Assistant│     │Pipeline Bridge│
 │   (RAG+LLM)   │     │ (STT→LLM→TTS) │     │  (KFP/Argo)   │
 └───────┬───────┘     └───────┬───────┘     └───────┬───────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         AI SERVICES                         │
│  Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker      │
│    STT   │ TTS  │ LLM  │  RAG   │   Embed   │   Rank        │
└─────────────────────────────────────────────────────────────┘
```
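The voice path in the diagram (STT → RAG → LLM → TTS) is just a composition of stages. As a toy sketch, with pure stub functions standing in for the real network calls to Whisper, Milvus/BGE, vLLM, and XTTS:

```python
# Stub stages standing in for the real services. Each real stage is a
# network call over NATS/HTTP; here they are pure functions so only the
# flow itself is visible.
def stt(audio: bytes) -> str:
    return audio.decode()              # pretend transcription

def rag(query: str) -> str:
    return f"{query} [+retrieved context]"

def llm(prompt: str) -> str:
    return f"answer to: {prompt}"

def tts(text: str) -> bytes:
    return text.encode()               # pretend synthesis

def voice_pipeline(audio: bytes) -> bytes:
    """STT → RAG → LLM → TTS, as in the architecture diagram."""
    return tts(llm(rag(stt(audio))))
```

The real orchestration lives in `voice-assistant/pipelines/voice_pipeline.py`; this sketch only shows the order of the stages.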
## 📁 Key File Locations

### Infrastructure (`homelab-k8s2`)

```
kubernetes/apps/
├── ai-ml/                 # 🧠 AI/ML services
│   ├── kserve/            # InferenceServices
│   ├── kubeflow/          # Pipelines, Training Operator
│   ├── milvus/            # Vector database
│   ├── nats/              # Message bus
│   ├── vllm/              # LLM inference
│   └── llm-workflows/     # GitRepo sync to llm-workflows
├── analytics/             # 📊 Spark, Flink, ClickHouse
├── observability/         # 📈 Grafana, Alloy, OpenTelemetry
└── security/              # 🔒 Vault, Authentik, Falco

talos/
├── talconfig.yaml         # Node definitions
├── patches/               # GPU-specific patches
│   ├── amd/amdgpu.yaml
│   └── nvidia/nvidia-runtime.yaml
```
### AI/ML Services (Gitea daviestechlabs org)

```
handler-base/                     # Shared handler library
├── handler_base/                 # Core classes
│   ├── handler.py                # Base Handler class
│   ├── nats_client.py            # NATS wrapper
│   └── clients/                  # Service clients (STT, TTS, LLM, etc.)

chat-handler/                     # RAG chat service
├── chat_handler_v2.py            # Handler-base version
└── Dockerfile.v2

voice-assistant/                  # Voice pipeline service
├── voice_assistant_v2.py         # Handler-base version
└── pipelines/voice_pipeline.py

argo/                             # Argo WorkflowTemplates
├── batch-inference.yaml
├── qlora-training.yaml
└── document-ingestion.yaml

kubeflow/                         # Kubeflow Pipeline definitions
├── voice_pipeline.py
├── document_ingestion_pipeline.py
└── evaluation_pipeline.py

kuberay-images/                   # GPU worker images
├── dockerfiles/
│   ├── Dockerfile.ray-worker-nvidia
│   ├── Dockerfile.ray-worker-strixhalo
│   └── Dockerfile.ray-worker-rdna2
└── ray-serve/                    # Serve modules
```

## 🔌 Service Endpoints (Internal)

```python
# Copy-paste ready for Python code
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```
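Since vLLM exposes an OpenAI-compatible API, the LLM endpoint can be called with nothing but the standard library. A minimal sketch, assuming the server is reachable at `VLLM_URL` from inside the cluster; the `"draft-model"` name is a placeholder (check `GET {VLLM_URL}/models` for the real served model id):

```python
import json
import urllib.request

VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"

def build_chat_request(prompt: str, model: str = "draft-model") -> dict:
    """Build an OpenAI-style chat-completions payload for vLLM.

    The model name is a hypothetical placeholder, not the actual
    deployed model id.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def chat(prompt: str) -> str:
    """POST to the in-cluster vLLM endpoint (only resolvable in-cluster)."""
    req = urllib.request.Request(
        f"{VLLM_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```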
## 📨 NATS Subject Patterns

```python
# Chat
f"ai.chat.user.{user_id}.message"        # User sends message
f"ai.chat.response.{request_id}"         # Response back
f"ai.chat.response.stream.{request_id}"  # Streaming tokens

# Voice
f"ai.voice.user.{user_id}.request"       # Voice input
f"ai.voice.response.{request_id}"        # Voice output

# Pipelines
"ai.pipeline.trigger"                    # Trigger any pipeline
f"ai.pipeline.status.{request_id}"       # Status updates
```
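Putting the subject patterns together with the MessagePack wire format, a chat message could be published roughly as follows. This is a hedged sketch, not the real client: it assumes the third-party `nats-py` and `msgpack` packages, and the payload field names are illustrative — check `handler-base` for the actual schema:

```python
import uuid

NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"

def chat_subject(user_id: str) -> str:
    """Subject a user publishes chat messages to."""
    return f"ai.chat.user.{user_id}.message"

def response_subject(request_id: str) -> str:
    """Subject the chat handler replies on."""
    return f"ai.chat.response.{request_id}"

async def send_chat(user_id: str, text: str) -> str:
    """Publish one chat message and return the request id.

    Payload keys ("request_id", "text") are assumptions for this
    sketch; the real schema lives in handler-base.
    """
    import msgpack  # messages on the bus are MessagePack-encoded
    import nats

    request_id = uuid.uuid4().hex
    nc = await nats.connect(NATS_URL)
    payload = {"request_id": request_id, "text": text}
    await nc.publish(chat_subject(user_id), msgpack.packb(payload))
    await nc.drain()
    return request_id
```

A subscriber would listen on `response_subject(request_id)` (or the `.stream.` variant for token streaming) to get the reply.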
## 🎮 GPU Allocation

| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |

## ⚡ Common Tasks

### Deploy a New AI Service

1. Create an InferenceService in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
2. Push to main → Flux deploys automatically

### Add a New NATS Handler

1. Create a handler repo or add to an existing one (use the `handler-base` library)
2. Add a K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
3. Push to main → Flux deploys automatically

### Add a New Argo Workflow

1. Add a WorkflowTemplate to the `argo/` repo
2. Push to main → Gitea syncs to cluster

### Add a New Kubeflow Pipeline

1. Add the pipeline `.py` to the `kubeflow/` repo
2. Compile with `python pipeline.py`
3. Upload the YAML to the Kubeflow UI

### Create an Architecture Decision

1. Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
2. Fill in context, decision, consequences
3. Submit a PR

## ❌ Antipatterns to Avoid

1. **Don't hardcode secrets** - Use External Secrets Operator
2. **Don't use `latest` tags** - Pin versions for reproducibility
3. **Don't skip ADRs** - Document significant decisions
4. **Don't bypass Flux** - All changes via Git, never `kubectl apply` directly

## 📚 Where to Learn More

- [ARCHITECTURE.md](ARCHITECTURE.md) - Full system design
- [TECH-STACK.md](TECH-STACK.md) - All technologies used
- [decisions/](decisions/) - Why we made certain choices
- [DOMAIN-MODEL.md](DOMAIN-MODEL.md) - Core entities

## 🆘 Quick Debugging

```bash
# Check Flux sync status
flux get all -A

# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls

# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"

# View KServe inference services
kubectl get inferenceservices -n ai-ml

# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```

---

*This document is the canonical starting point for AI agents. When in doubt, check the ADRs.*