homelab-design/TECH-STACK.md
Billy D. 100ba21eba
updates to adrs and fixing to reflect go refactor.
2026-02-23 06:14:30 -05:00

🛠️ Technology Stack

Complete inventory of technologies used in the DaviesTechLabs homelab

Platform Layer

Operating System

| Component | Version | Purpose |
|-----------|---------|---------|
| Talos Linux | v1.12.1 | Immutable, API-driven Kubernetes OS |
| Kernel | 6.18.2-talos | Linux kernel with GPU drivers |

Container Orchestration

| Component | Version | Purpose |
|-----------|---------|---------|
| Kubernetes | v1.35.0 | Container orchestration |
| containerd | 2.1.6 | Container runtime |
| Cilium | Latest | CNI, network policies, eBPF |

GitOps

| Component | Version | Purpose |
|-----------|---------|---------|
| Flux CD | v2 | GitOps continuous delivery |
| SOPS | Latest | Secret encryption |
| Age | Latest | Encryption key management |

AI/ML Layer

GPU Inference (KubeRay RayService)

All AI inference runs on a unified Ray Serve endpoint with fractional GPU allocation:

| Service | Model | GPU Node | GPU Type | Allocation |
|---------|-------|----------|----------|------------|
| /llm | vLLM (Llama 3.1 70B) | khelben | AMD Strix Halo 64GB | 0.95 GPU |
| /whisper | faster-whisper v3 | elminster | NVIDIA RTX 2070 8GB | 0.5 GPU |
| /tts | XTTS | elminster | NVIDIA RTX 2070 8GB | 0.5 GPU |
| /embeddings | BGE-Large | drizzt | AMD Radeon 680M 12GB | 0.8 GPU |
| /reranker | BGE-Reranker | danilo | Intel Arc 16GB | 0.8 GPU |

Endpoint: ai-inference-serve-svc.ai-ml.svc.cluster.local:8000/{service}
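The routing and allocation scheme above can be sketched in a few lines of Python. This is an illustration only, not the cluster's actual configuration: the `http` scheme, the assumption of one GPU per node, and the helper names `endpoint_url` and `gpu_load_per_node` are hypothetical. The routes, node names, and fractions are copied from the table.

```python
from collections import defaultdict

# Base address copied from the Endpoint line; the "http" scheme is an assumption.
BASE_URL = "http://ai-inference-serve-svc.ai-ml.svc.cluster.local:8000"

# (route, node, GPU fraction) rows copied from the table above.
ROUTES = [
    ("llm", "khelben", 0.95),
    ("whisper", "elminster", 0.5),
    ("tts", "elminster", 0.5),
    ("embeddings", "drizzt", 0.8),
    ("reranker", "danilo", 0.8),
]


def endpoint_url(service: str) -> str:
    """Return the URL for one route, rejecting names not in the table."""
    known = {route for route, _, _ in ROUTES}
    if service not in known:
        raise ValueError(f"unknown service: {service!r}")
    return f"{BASE_URL}/{service}"


def gpu_load_per_node() -> dict:
    """Sum the fractional GPU allocations placed on each node."""
    load = defaultdict(float)
    for _, node, fraction in ROUTES:
        load[node] += fraction
    return dict(load)


if __name__ == "__main__":
    print(endpoint_url("whisper"))
    for node, total in sorted(gpu_load_per_node().items()):
        # Assumes one GPU per node, so a total above 1.0 would be oversubscribed.
        print(f"{node}: {total:.2f} GPU")
```

Under the one-GPU-per-node assumption, elminster is the only fully committed node: /whisper and /tts each take half of the RTX 2070.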

ML Serving Stack

| Component | Version | Purpose |
|-----------|---------|---------|
| KubeRay | 1.4+ | Ray cluster operator |
| Ray Serve | 2.53.0 | Unified inference endpoints |
| KServe | v0.12+ | Abstraction layer (ExternalName aliases) |

ML Workflows

| Component | Version | Purpose |
|-----------|---------|---------|
| Kubeflow Pipelines | 2.15.0 | ML pipeline orchestration |
| Argo Workflows | v3.7.8 | DAG-based workflows |
| Argo Events | Latest | Event-driven triggers |
| MLflow | 3.7.0 | Experiment tracking, model registry |

GPU Scheduling

| Component | Version | Purpose |
|-----------|---------|---------|
| Volcano | Latest | GPU-aware scheduling |
| AMD GPU Device Plugin | v1.4.1 | ROCm GPU allocation |
| NVIDIA Device Plugin | Latest | CUDA GPU allocation |
| Node Feature Discovery | v0.18.2 | Hardware detection |

Data Layer

Databases

| Component | Version | Purpose |
|-----------|---------|---------|
| CloudNative-PG | 16.11 | PostgreSQL for metadata |
| Milvus | Latest | Vector database for RAG |
| ClickHouse | Latest | Analytics, access logs |
| Valkey | Latest | Redis-compatible cache |

Object Storage

| Component | Version | Purpose |
|-----------|---------|---------|
| MinIO | Latest | S3-compatible storage |
| Longhorn | v1.10.1 | Distributed block storage |
| NFS CSI Driver | Latest | Shared filesystem |

Messaging

| Component | Version | Purpose |
|-----------|---------|---------|
| NATS | Latest | Message bus |
| NATS JetStream | Built-in | Persistent streaming |

Data Processing

| Component | Version | Purpose |
|-----------|---------|---------|
| Apache Spark | Latest | Batch analytics |
| Apache Flink | Latest | Stream processing |
| Apache Iceberg | Latest | Table format |
| Nessie | Latest | Data catalog |
| Trino | 479 | SQL query engine |

Application Layer

Web Frameworks

| Application | Language | Framework | Purpose |
|-------------|----------|-----------|---------|
| Companions | Go | net/http + HTMX | AI chat interface (SSR) |
| Chat Handler | Go | handler-base | RAG + LLM text pipeline |
| Voice Assistant | Go | handler-base | STT → RAG → LLM → TTS pipeline |
| Pipeline Bridge | Go | handler-base | Kubeflow/Argo workflow triggers |
| STT Module | Go | handler-base | Speech-to-text bridge |
| TTS Module | Go | handler-base | Text-to-speech bridge |
| Voice WebApp | Python | Gradio | Voice assistant UI (dev/testing) |
| Ray Serve | Python | Ray Serve | GPU inference endpoints |

Frontend

| Technology | Purpose |
|------------|---------|
| HTMX | Dynamic HTML updates |
| Alpine.js | Lightweight reactivity |
| VRM | 3D avatar rendering |

Networking Layer

Ingress

| Component | Version | Purpose |
|-----------|---------|---------|
| Envoy Gateway | v1.6.3 | Gateway API implementation |
| cloudflared | Latest | Cloudflare tunnel |

DNS & Certificates

| Component | Version | Purpose |
|-----------|---------|---------|
| external-dns | Latest | Automatic DNS management |
| cert-manager | Latest | TLS certificate automation |

Service Mesh

| Component | Purpose |
|-----------|---------|
| Spegel | P2P container image distribution |

Security Layer

Identity & Access

| Component | Version | Purpose |
|-----------|---------|---------|
| Authentik | 2025.12.1 | Identity provider, SSO |
| Vault | 1.21.2 | Secret management |
| External Secrets Operator | v1.3.1 | Kubernetes secrets sync |

Runtime Security

| Component | Version | Purpose |
|-----------|---------|---------|
| Falco | 0.42.1 | Runtime threat detection |
| Cilium Network Policies | Built-in | Network segmentation |

Backup

| Component | Version | Purpose |
|-----------|---------|---------|
| Velero | v1.17.1 | Cluster backup/restore |

Observability Layer

Metrics

| Component | Purpose |
|-----------|---------|
| Prometheus | Metrics collection |
| Grafana | Dashboards & visualization |

Logging

| Component | Version | Purpose |
|-----------|---------|---------|
| Grafana Alloy | v1.12.0 | Log collection |
| Loki | Latest | Log aggregation |

Tracing

| Component | Purpose |
|-----------|---------|
| OpenTelemetry Collector | Trace collection |
| Tempo/Jaeger | Trace storage & query |

Development Tools

Local Development

| Tool | Purpose |
|------|---------|
| mise | Tool version management |
| Task | Task runner (Taskfile.yaml) |
| flux-local | Local Flux testing |

CI/CD

| Tool | Purpose |
|------|---------|
| GitHub Actions | CI/CD pipelines |
| Renovate | Dependency updates |

Image Building

| Tool | Purpose |
|------|---------|
| Docker | Container builds |
| GHCR | Container registry |

Media & Entertainment

| Component | Version | Purpose |
|-----------|---------|---------|
| Jellyfin | 10.11.5 | Media server |
| Nextcloud | 32.0.5 | File sync & share |
| Prowlarr, Bazarr, etc. | Various | *arr stack |
| Kasm | 1.18.1 | Browser isolation |

Go Dependencies (handler-base)

Shared Go module for all NATS handler services: handler-base

```go
// go.mod (handler-base v1.0.0)
require (
    github.com/nats-io/nats.go          // NATS client
    google.golang.org/protobuf           // Protocol Buffers encoding
    github.com/zitadel/oidc/v3           // OIDC client
    go.opentelemetry.io/otel             // OpenTelemetry traces + metrics
    github.com/milvus-io/milvus-sdk-go   // Milvus vector search
)
```

See ADR-0061 for the full refactoring rationale.

Python Dependencies (ML/AI only)

Python is retained for ML inference, pipeline orchestration, and dev tools:

```
# ray-serve (GPU inference)
ray[serve]>=2.53.0
vllm>=0.8.0
faster-whisper>=1.0.0
TTS>=0.22.0
sentence-transformers>=3.0.0

# kubeflow (pipeline definitions)
kfp>=2.12.1

# mlflow (experiment tracking)
mlflow>=3.7.0
pymilvus>=2.4.0
```
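Every requirement above uses a `>=` lower bound. As a rough sketch of how that convention could be checked, the snippet below parses the same lines into a package-to-minimum-version map and rejects any line that is not a plain lower bound. The helper name `minimum_versions` and the regular expression are illustrative assumptions, not part of the repository, and the pattern covers only the simple `name[extras]>=version` form shown here.

```python
import re

# Requirement lines copied from the listing above.
REQUIREMENTS = """
# ray-serve (GPU inference)
ray[serve]>=2.53.0
vllm>=0.8.0
faster-whisper>=1.0.0
TTS>=0.22.0
sentence-transformers>=3.0.0

# kubeflow (pipeline definitions)
kfp>=2.12.1

# mlflow (experiment tracking)
mlflow>=3.7.0
pymilvus>=2.4.0
"""

# name, optional [extras], then a ">=" lower bound; nothing else is accepted.
_LINE = re.compile(
    r"^(?P<name>[A-Za-z0-9_.-]+)(?:\[[^\]]+\])?>=(?P<minimum>[0-9][0-9A-Za-z.]*)$"
)


def minimum_versions(text: str) -> dict:
    """Map each package name to its minimum version, skipping comments."""
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"not a minimum-version pin: {line!r}")
        result[match.group("name")] = match.group("minimum")
    return result


if __name__ == "__main__":
    for name, minimum in minimum_versions(REQUIREMENTS).items():
        print(f"{name} >= {minimum}")
```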

Version Pinning Strategy

| Component Type | Strategy |
|----------------|----------|
| Base images | Pin major.minor |
| Helm charts | Pin exact version |
| Python packages | Pin minimum version |
| System extensions | Pin via Talos schematic |
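To make "pin major.minor" concrete, here is a minimal sketch that reduces a full version string to its major.minor form. The function name `pin_major_minor` is hypothetical, and the input strings are versions taken from the tables in this document, used purely as sample data. Floating tags such as "Latest" carry no version to pin, so the sketch rejects them.

```python
import re

# Optional "v" prefix, then major.minor; any patch or suffix is ignored.
_VERSION = re.compile(r"^(v?)(\d+)\.(\d+)")


def pin_major_minor(version: str) -> str:
    """Reduce a version such as 'v1.12.1' to its major.minor pin 'v1.12'."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"no major.minor in {version!r}")
    prefix, major, minor = match.groups()
    return f"{prefix}{major}.{minor}"


if __name__ == "__main__":
    for version in ("v1.12.1", "2.1.6", "6.18.2-talos"):
        print(version, "->", pin_major_minor(version))
```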