# 🤖 Agent Onboarding

This is the most important file for AI agents working on this codebase.

## TL;DR

You are working on a homelab Kubernetes cluster running:

  • Talos Linux v1.12.1 on bare-metal nodes
  • Kubernetes v1.35.0 with Flux CD GitOps
  • AI/ML platform with KServe, Kubeflow, Milvus, NATS
  • Multi-GPU (AMD ROCm, NVIDIA CUDA, Intel Arc)

## 🗺️ Repository Map

| Repo | What It Contains | When to Edit |
|------|------------------|--------------|
| homelab-k8s2 | Kubernetes manifests, Talos config, Flux | Infrastructure changes |
| homelab-design (this repo) | Architecture docs, ADRs | Design decisions |
| companions-frontend | Go server, HTMX UI, VRM avatars | Frontend changes |

### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)

| Repo | Purpose |
|------|---------|
| handler-base | Shared Python library for NATS handlers |
| chat-handler | Text chat with RAG pipeline |
| voice-assistant | Voice pipeline (STT → RAG → LLM → TTS) |
| kuberay-images | GPU-specific Ray worker Docker images |
| pipeline-bridge | Bridge between pipelines and services |
| stt-module | Speech-to-text service |
| tts-module | Text-to-speech service |
| ray-serve | Ray Serve inference services |
| argo | Argo Workflows (training, batch inference) |
| kubeflow | Kubeflow Pipeline definitions |
| mlflow | MLflow integration utilities |
| gradio-ui | Gradio demo apps (embeddings, STT, TTS) |
| ntfy-discord | ntfy → Discord notification bridge |

## 🏗️ System Architecture (30-Second Version)

```text
┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACES                          │
│  Companions WebApp │ Voice WebApp │ Kubeflow UI │ CLI           │
└───────────────────────────┬─────────────────────────────────────┘
                            │ WebSocket/HTTP
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      NATS MESSAGE BUS                            │
│  Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*                 │
│  Format: MessagePack (binary)                                   │
└───────────────────────────┬─────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│ Chat Handler  │   │Voice Assistant│   │Pipeline Bridge│
│ (RAG+LLM)     │   │ (STT→LLM→TTS) │   │ (KFP/Argo)    │
└───────┬───────┘   └───────┬───────┘   └───────┬───────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                       AI SERVICES                                │
│  Whisper │ XTTS │ vLLM │ Milvus │ BGE Embed │ Reranker         │
│    STT   │ TTS  │ LLM  │  RAG   │   Embed   │  Rank            │
└─────────────────────────────────────────────────────────────────┘
```

## 📁 Key File Locations

### Infrastructure (homelab-k8s2)

```text
kubernetes/apps/
├── ai-ml/                    # 🧠 AI/ML services
│   ├── kserve/               #   InferenceServices
│   ├── kubeflow/             #   Pipelines, Training Operator
│   ├── milvus/               #   Vector database
│   ├── nats/                 #   Message bus
│   └── vllm/                 #   LLM inference
├── analytics/                # 📊 Spark, Flink, ClickHouse
├── observability/            # 📈 Grafana, Alloy, OpenTelemetry
└── security/                 # 🔒 Vault, Authentik, Falco

talos/
├── talconfig.yaml            # Node definitions
└── patches/                  # GPU-specific patches
    ├── amd/amdgpu.yaml
    └── nvidia/nvidia-runtime.yaml
```

### AI/ML Services (Gitea daviestechlabs org)

```text
handler-base/                 # Shared handler library
├── handler_base/             #   Core classes
│   ├── handler.py            #   Base Handler class
│   ├── nats_client.py        #   NATS wrapper
│   └── clients/              #   Service clients (STT, TTS, LLM, etc.)

chat-handler/                 # RAG chat service
├── chat_handler_v2.py        #   Handler-base version
└── Dockerfile.v2

voice-assistant/              # Voice pipeline service
├── voice_assistant_v2.py     #   Handler-base version
└── pipelines/voice_pipeline.py

argo/                         # Argo WorkflowTemplates
├── batch-inference.yaml
├── qlora-training.yaml
└── document-ingestion.yaml

kubeflow/                     # Kubeflow Pipeline definitions
├── voice_pipeline.py
├── document_ingestion_pipeline.py
└── evaluation_pipeline.py

kuberay-images/               # GPU worker images
├── dockerfiles/
│   ├── Dockerfile.ray-worker-nvidia
│   ├── Dockerfile.ray-worker-strixhalo
│   └── Dockerfile.ray-worker-rdna2
└── ray-serve/                #   Serve modules
```

## 🔌 Service Endpoints (Internal)

```python
# Copy-paste ready for Python code
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTS_URL = "http://tts-predictor.ai-ml.svc.cluster.local"
EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
MILVUS_PORT = 19530
VALKEY_URL = "redis://valkey.ai-ml.svc.cluster.local:6379"
```
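As a worked illustration of using these endpoints, here is a minimal, standard-library-only sketch of a chat completion call against the vLLM OpenAI-compatible API. The model name `draft` and the token limit are assumptions, not values confirmed from the cluster; list the actual models via `GET {VLLM_URL}/models` first.

```python
import json
import urllib.request

VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"


def build_chat_request(prompt: str, model: str = "draft") -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,  # placeholder -- query /models for the real name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }


def chat(prompt: str) -> str:
    """POST the request in-cluster and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{VLLM_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

This only works from inside the cluster (the hostname resolves via cluster DNS); from a workstation you would port-forward the service first.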

## 📨 NATS Subject Patterns

```python
# Chat
f"ai.chat.user.{user_id}.message"       # User sends message
f"ai.chat.response.{request_id}"        # Response back
f"ai.chat.response.stream.{request_id}" # Streaming tokens

# Voice
f"ai.voice.user.{user_id}.request"      # Voice input
f"ai.voice.response.{request_id}"       # Voice output

# Pipelines
"ai.pipeline.trigger"                   # Trigger any pipeline
f"ai.pipeline.status.{request_id}"      # Status updates
```

## 🎮 GPU Allocation

| Node | GPU | Workload | Memory |
|------|-----|----------|--------|
| khelben | AMD Strix Halo | vLLM (dedicated) | 64GB unified |
| elminster | NVIDIA RTX 2070 | Whisper + XTTS | 8GB VRAM |
| drizzt | AMD Radeon 680M | BGE Embeddings | 12GB VRAM |
| danilo | Intel Arc | Reranker | 16GB shared |

## Common Tasks

### Deploy a New AI Service

  1. Create an InferenceService manifest in `homelab-k8s2/kubernetes/apps/ai-ml/kserve/`
  2. Push to main → Flux deploys it automatically
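Step 1 can be sketched as a minimal KServe InferenceService with a custom predictor container. The name, image, and resource values below are placeholders, not taken from the cluster:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                # placeholder name
  namespace: ai-ml
spec:
  predictor:
    containers:
      - name: kserve-container
        image: registry.example.com/my-model:v1.0.0  # pin a version, never :latest
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
```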

### Add a New NATS Handler

  1. Create a new handler repo, or add to an existing one (use the handler-base library)
  2. Add a Kubernetes Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
  3. Push to main → Flux deploys it automatically
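For orientation, here is the shape of the subscribe side as a bare `nats-py` sketch. This is *not* the handler-base API (see that repo for the real base class); the wildcard subscription and subject parsing are the only parts grounded in the patterns documented above.

```python
import asyncio


def parse_user_id(subject: str) -> str:
    """Extract <user_id> from ai.chat.user.<user_id>.message."""
    parts = subject.split(".")
    assert parts[:3] == ["ai", "chat", "user"], subject
    return parts[3]


async def run_handler() -> None:
    import nats  # nats-py; imported here so parse_user_id works standalone

    nc = await nats.connect("nats://nats.ai-ml.svc.cluster.local:4222")

    async def on_message(msg):
        user_id = parse_user_id(msg.subject)
        # ... decode the MessagePack payload, run the RAG/LLM step,
        # then publish to ai.chat.response.<request_id> ...

    # One wildcard subscription covers every user's chat subject.
    await nc.subscribe("ai.chat.user.*.message", cb=on_message)
    await asyncio.Event().wait()  # keep the handler process alive
```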

### Add a New Argo Workflow

  1. Add a WorkflowTemplate to the argo/ repo
  2. Push to main → Gitea syncs it to the cluster

### Add a New Kubeflow Pipeline

  1. Add the pipeline `.py` file to the kubeflow/ repo
  2. Compile it with `python pipeline.py`
  3. Upload the generated YAML to the Kubeflow UI

### Create an Architecture Decision Record (ADR)

  1. Copy `decisions/0000-template.md` to `decisions/NNNN-title.md`
  2. Fill in the context, decision, and consequences
  3. Submit a PR

## Antipatterns to Avoid

  1. **Don't hardcode secrets** - use the External Secrets Operator
  2. **Don't use `latest` tags** - pin image versions for reproducibility
  3. **Don't skip ADRs** - document significant design decisions
  4. **Don't bypass Flux** - make all changes via Git, never `kubectl apply` directly
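For antipattern 1, a hedged sketch of what the External Secrets approach looks like; the store name `vault-backend`, the secret name, and the Vault key paths are hypothetical, so adapt them to the actual SecretStore configured in homelab-k8s2:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: chat-handler-secrets    # hypothetical name
  namespace: ai-ml
spec:
  secretStoreRef:
    name: vault-backend         # hypothetical ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: chat-handler-secrets  # the Kubernetes Secret to materialize
  data:
    - secretKey: api-key
      remoteRef:
        key: ai-ml/chat-handler # hypothetical Vault path
        property: api-key
```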

## 📚 Where to Learn More

  • decisions/ in this repo: Architecture Decision Records (ADRs)
  • README in this repo: index of ADRs and their status
  • homelab-k8s2: the deployed Kubernetes manifests themselves

## 🆘 Quick Debugging

```bash
# Check Flux sync status
flux get all -A

# View NATS JetStream streams
kubectl exec -n ai-ml deploy/nats-box -- nats stream ls

# Check GPU allocation
kubectl describe node khelben | grep -A10 "Allocated"

# View KServe inference services
kubectl get inferenceservices -n ai-ml

# Tail AI service logs
kubectl logs -n ai-ml -l app=chat-handler -f
```

This document is the canonical starting point for AI agents. When in doubt, check the ADRs.