🤖 Agent Onboarding

This is the most important file for AI agents working on this codebase.

TL;DR

You are working on a homelab Kubernetes cluster running:

  • Talos Linux v1.12.1 on bare-metal nodes
  • Kubernetes v1.35.0 with Flux CD GitOps
  • AI/ML platform with KServe, Kubeflow, Milvus, NATS
  • Multi-GPU (AMD ROCm, NVIDIA CUDA, Intel Arc)

🗺️ Repository Map

| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| homelab-k8s2 | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| llm-workflows | NATS handlers, Argo/KFP workflows | Workflow/handler changes |
| companions-frontend | Go server, HTMX UI, VRM avatars | Frontend changes |
| homelab-design (this repo) | Architecture docs, ADRs | Design decisions |

🏗️ System Architecture (30-Second Version)

┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACES                          │
│  Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI           │
└───────────────────────────┬─────────────────────────────────────┘
                            │ WebSocket/HTTP
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      NATS MESSAGE BUS                            │
│  Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*                 │
│  Format: MessagePack (binary)                                   │
└───────────────────────────┬─────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│ Chat Handler  │   │Voice Assistant│   │Pipeline Bridge│
│ (RAG+LLM)     │   │ (STT→LLM→TTS) │   │ (KFP/Argo)    │
└───────┬───────┘   └───────┬───────┘   └───────┬───────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                       AI SERVICES                                │
│  Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker         │
│    STT   │ TTS  │ LLM  │  RAG   │   Embed   │  Rank            │
└─────────────────────────────────────────────────────────────────┘
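The Chat Handler's RAG+LLM path can be sketched as a simple function composition. This is an illustrative sketch only — the stub callables below stand in for the real BGE embedding service, Milvus search, reranker, and vLLM; all names are hypothetical:

```python
def rag_chat(query, embed, search, rerank, generate, top_k=3):
    """Sketch of the Chat Handler's RAG flow: embed -> search -> rerank -> generate."""
    vec = embed(query)                         # query -> embedding vector
    candidates = search(vec, limit=top_k * 4)  # over-fetch, then rerank
    context = rerank(query, candidates)[:top_k]
    return generate(query, context)

# Wiring with trivial stubs just to show the data flow:
answer = rag_chat(
    "what runs on khelben?",
    embed=lambda q: [0.1, 0.2],
    search=lambda v, limit: ["doc-a", "doc-b", "doc-c"],
    rerank=lambda q, docs: sorted(docs),
    generate=lambda q, ctx: f"answer using {len(ctx)} docs",
)
# -> "answer using 3 docs"
```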

📁 Key File Locations

Infrastructure (homelab-k8s2)

kubernetes/apps/
├── ai-ml/                    # 🧠 AI/ML services
│   ├── kserve/               #   InferenceServices
│   ├── kubeflow/             #   Pipelines, Training Operator
│   ├── milvus/               #   Vector database
│   ├── nats/                 #   Message bus
│   ├── vllm/                 #   LLM inference
│   └── llm-workflows/        #   GitRepo sync to llm-workflows
├── analytics/                # 📊 Spark, Flink, ClickHouse
├── observability/            # 📈 Grafana, Alloy, OpenTelemetry
└── security/                 # 🔒 Vault, Authentik, Falco

talos/
├── talconfig.yaml            # Node definitions
└── patches/                  # GPU-specific patches
    ├── amd/amdgpu.yaml
    └── nvidia/nvidia-runtime.yaml

Workflows (llm-workflows)

workflows/                    # NATS handler deployments
├── chat-handler.yaml
├── voice-assistant.yaml
└── pipeline-bridge.yaml

argo/                         # Argo WorkflowTemplates
├── document-ingestion.yaml
├── batch-inference.yaml
└── qlora-training.yaml

pipelines/                    # Kubeflow Pipeline Python
├── voice_pipeline.py
└── document_ingestion_pipeline.py

🔌 Service Endpoints (Internal)

# Copy-paste ready for Python code
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
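As a sketch of how these constants get used, here is a stdlib-only helper that builds an OpenAI-style chat request against the vLLM endpoint (the `/v1` suffix suggests an OpenAI-compatible API; the default model name below is a placeholder, not the served model):

```python
import json
import urllib.request

VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"

def build_chat_request(prompt: str, model: str = "default") -> urllib.request.Request:
    """Build a POST to vLLM's OpenAI-compatible /chat/completions endpoint."""
    body = json.dumps({
        "model": model,  # placeholder; substitute the served model's name
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{VLLM_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Summarize the GPU allocation table.")
# Inside the cluster, send it with:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```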

📨 NATS Subject Patterns

# Chat
f"ai.chat.user.{user_id}.message"      # User sends message
f"ai.chat.response.{request_id}"       # Response back
f"ai.chat.response.stream.{request_id}" # Streaming tokens

# Voice
f"ai.voice.user.{user_id}.request"     # Voice input
f"ai.voice.response.{request_id}"      # Voice output

# Pipelines
"ai.pipeline.trigger"                   # Trigger any pipeline
f"ai.pipeline.status.{request_id}"     # Status updates

🎮 GPU Allocation

| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |

Common Tasks

Deploy a New AI Service

  1. Create InferenceService in homelab-k8s2/kubernetes/apps/ai-ml/kserve/
  2. Add endpoint to llm-workflows/config/ai-services-config.yaml
  3. Push to main → Flux deploys automatically
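The endpoint you add in step 2 follows KServe's convention of exposing each InferenceService through a `<name>-predictor` Service, as the endpoint list above shows. A tiny helper sketch (the function name is illustrative):

```python
def predictor_url(name: str, namespace: str = "ai-ml") -> str:
    """Derive the in-cluster URL KServe exposes for an InferenceService."""
    return f"http://{name}-predictor.{namespace}.svc.cluster.local"

url = predictor_url("whisper")  # matches WHISPER_URL in the endpoint list
```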

Add a New Workflow

  1. Create handler in llm-workflows/chat-handler/ or llm-workflows/voice-assistant/
  2. Add Kubernetes Deployment in llm-workflows/workflows/
  3. Push to main → Flux deploys automatically
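Structurally, a handler maps NATS subject prefixes to functions. A stdlib-only skeleton of that shape (illustrative only — the real handlers use nats-py subscriptions and MessagePack payloads, not this dispatcher):

```python
def make_dispatcher():
    """Minimal subject-prefix dispatcher (illustrative skeleton)."""
    handlers = {}

    def on(prefix):
        def register(fn):
            handlers[prefix] = fn
            return fn
        return register

    def dispatch(subject, payload):
        for prefix, fn in handlers.items():
            if subject.startswith(prefix):
                return fn(subject, payload)
        raise LookupError(f"no handler for {subject}")

    return on, dispatch

on, dispatch = make_dispatcher()

@on("ai.chat.user.")
def handle_chat(subject, payload):
    user_id = subject.split(".")[3]  # ai.chat.user.{user_id}.message
    return {"user": user_id, "reply": f"echo: {payload['text']}"}

result = dispatch("ai.chat.user.alice.message", {"text": "hi"})
# -> {"user": "alice", "reply": "echo: hi"}
```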

Create Architecture Decision

  1. Copy decisions/0000-template.md to decisions/NNNN-title.md
  2. Fill in context, decision, consequences
  3. Submit PR

Antipatterns to Avoid

  1. Don't hardcode secrets - Use External Secrets Operator
  2. Don't use latest tags - Pin versions for reproducibility
  3. Don't skip ADRs - Document significant decisions
  4. Don't bypass Flux - All changes via Git, never kubectl apply directly

📚 Where to Learn More

  • ARCHITECTURE.md — full system overview
  • TECH-STACK.md — complete technology inventory
  • DOMAIN-MODEL.md — entities and bounded contexts
  • CODING-CONVENTIONS.md — patterns and practices
  • GLOSSARY.md — terminology reference
  • decisions/ — ADRs for key decisions (Talos Linux, NATS, MessagePack, multi-GPU strategy, GitOps with Flux, KServe, Milvus, dual workflow engines, Envoy Gateway)

🆘 Quick Debugging

# Check Flux sync status
flux get all -A

# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls

# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"

# View KServe inference services
kubectl get inferenceservices -n ai-ml

# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f

This document is the canonical starting point for AI agents. When in doubt, check the ADRs.