# 🛠️ Technology Stack

> **Complete inventory of technologies used in the DaviesTechLabs homelab**

## Platform Layer

### Operating System

| Component | Version | Purpose |
|-----------|---------|---------|
| [Talos Linux](https://talos.dev) | v1.12.1 | Immutable, API-driven Kubernetes OS |
| Kernel | 6.18.2-talos | Linux kernel with GPU drivers |

### Container Orchestration

| Component | Version | Purpose |
|-----------|---------|---------|
| [Kubernetes](https://kubernetes.io) | v1.35.0 | Container orchestration |
| [containerd](https://containerd.io) | 2.1.6 | Container runtime |
| [Cilium](https://cilium.io) | Latest | CNI, network policies, eBPF |

### GitOps

| Component | Version | Purpose |
|-----------|---------|---------|
| [Flux CD](https://fluxcd.io) | v2 | GitOps continuous delivery |
| [SOPS](https://github.com/getsops/sops) | Latest | Secret encryption |
| [Age](https://github.com/FiloSottile/age) | Latest | Encryption key management |

---

## AI/ML Layer

### Inference Engines

| Service | Framework | GPU | Model Type |
|---------|-----------|-----|------------|
| [vLLM](https://vllm.ai) | ROCm | AMD Strix Halo | Large Language Models |
| [faster-whisper](https://github.com/guillaumekln/faster-whisper) | CUDA | NVIDIA RTX 2070 | Speech-to-Text |
| [XTTS](https://github.com/coqui-ai/TTS) | CUDA | NVIDIA RTX 2070 | Text-to-Speech |
| [BGE Embeddings](https://huggingface.co/BAAI/bge-large-en-v1.5) | ROCm | AMD Radeon 680M | Text Embeddings |
| [BGE Reranker](https://huggingface.co/BAAI/bge-reranker-large) | Intel | Intel Arc | Document Reranking |

### ML Serving

| Component | Version | Purpose |
|-----------|---------|---------|
| [KServe](https://kserve.github.io) | v0.12+ | Model serving framework |
| [Ray Serve](https://ray.io/serve) | 2.53.0 | Unified inference endpoints |

### ML Workflows

| Component | Version | Purpose |
|-----------|---------|---------|
| [Kubeflow Pipelines](https://kubeflow.org) | 2.15.0 | ML pipeline orchestration |
| [Argo Workflows](https://argoproj.github.io/workflows) | v3.7.8 | DAG-based workflows |
| [Argo Events](https://argoproj.github.io/events) | Latest | Event-driven triggers |
| [MLflow](https://mlflow.org) | 3.7.0 | Experiment tracking, model registry |

### GPU Scheduling

| Component | Version | Purpose |
|-----------|---------|---------|
| [Volcano](https://volcano.sh) | Latest | GPU-aware scheduling |
| AMD GPU Device Plugin | v1.4.1 | ROCm GPU allocation |
| NVIDIA Device Plugin | Latest | CUDA GPU allocation |
| [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) | v0.18.2 | Hardware detection |

---

## Data Layer

### Databases

| Component | Version | Purpose |
|-----------|---------|---------|
| [CloudNative-PG](https://cloudnative-pg.io) | 16.11 | PostgreSQL for metadata |
| [Milvus](https://milvus.io) | Latest | Vector database for RAG |
| [ClickHouse](https://clickhouse.com) | Latest | Analytics, access logs |
| [Valkey](https://valkey.io) | Latest | Redis-compatible cache |

### Object Storage

| Component | Version | Purpose |
|-----------|---------|---------|
| [MinIO](https://min.io) | Latest | S3-compatible storage |
| [Longhorn](https://longhorn.io) | v1.10.1 | Distributed block storage |
| NFS CSI Driver | Latest | Shared filesystem |

### Messaging

| Component | Version | Purpose |
|-----------|---------|---------|
| [NATS](https://nats.io) | Latest | Message bus |
| NATS JetStream | Built-in | Persistent streaming |

### Data Processing

| Component | Version | Purpose |
|-----------|---------|---------|
| [Apache Spark](https://spark.apache.org) | Latest | Batch analytics |
| [Apache Flink](https://flink.apache.org) | Latest | Stream processing |
| [Apache Iceberg](https://iceberg.apache.org) | Latest | Table format |
| [Nessie](https://projectnessie.org) | Latest | Data catalog |
| [Trino](https://trino.io) | 479 | SQL query engine |

---

## Application Layer

### Web Frameworks

| Application | Language | Framework | Purpose |
|-------------|----------|-----------|---------|
| Companions | Go | net/http + HTMX | AI chat interface |
| Voice WebApp | Python | Gradio | Voice assistant UI |
| Various handlers | Python | asyncio + nats.py | NATS event handlers |

### Frontend

| Technology | Purpose |
|------------|---------|
| [HTMX](https://htmx.org) | Dynamic HTML updates |
| [Alpine.js](https://alpinejs.dev) | Lightweight reactivity |
| [VRM](https://vrm.dev) | 3D avatar rendering |

---

## Networking Layer

### Ingress

| Component | Version | Purpose |
|-----------|---------|---------|
| [Envoy Gateway](https://gateway.envoyproxy.io) | v1.6.3 | Gateway API implementation |
| [cloudflared](https://developers.cloudflare.com/cloudflare-one/connections/connect-apps) | Latest | Cloudflare tunnel |

### DNS & Certificates

| Component | Version | Purpose |
|-----------|---------|---------|
| [external-dns](https://github.com/kubernetes-sigs/external-dns) | Latest | Automatic DNS management |
| [cert-manager](https://cert-manager.io) | Latest | TLS certificate automation |

### Service Mesh

| Component | Purpose |
|-----------|---------|
| [Spegel](https://github.com/spegel-org/spegel) | P2P container image distribution |

---

## Security Layer

### Identity & Access

| Component | Version | Purpose |
|-----------|---------|---------|
| [Authentik](https://goauthentik.io) | 2025.12.1 | Identity provider, SSO |
| [Vault](https://vaultproject.io) | 1.21.2 | Secret management |
| [External Secrets Operator](https://external-secrets.io) | v1.3.1 | Kubernetes secrets sync |

### Runtime Security

| Component | Version | Purpose |
|-----------|---------|---------|
| [Falco](https://falco.org) | 0.42.1 | Runtime threat detection |
| Cilium Network Policies | Built-in | Network segmentation |

### Backup

| Component | Version | Purpose |
|-----------|---------|---------|
| [Velero](https://velero.io) | v1.17.1 | Cluster backup/restore |

---

## Observability Layer

### Metrics

| Component | Purpose |
|-----------|---------|
| [Prometheus](https://prometheus.io) | Metrics collection |
| [Grafana](https://grafana.com) | Dashboards & visualization |

### Logging

| Component | Version | Purpose |
|-----------|---------|---------|
| [Grafana Alloy](https://grafana.com/oss/alloy) | v1.12.0 | Log collection |
| [Loki](https://grafana.com/oss/loki) | Latest | Log aggregation |

### Tracing

| Component | Purpose |
|-----------|---------|
| [OpenTelemetry Collector](https://opentelemetry.io) | Trace collection |
| Tempo/Jaeger | Trace storage & query |

---

## Development Tools

### Local Development

| Tool | Purpose |
|------|---------|
| [mise](https://mise.jdx.dev) | Tool version management |
| [Task](https://taskfile.dev) | Task runner (Taskfile.yaml) |
| [flux-local](https://github.com/allenporter/flux-local) | Local Flux testing |

### CI/CD

| Tool | Purpose |
|------|---------|
| GitHub Actions | CI/CD pipelines |
| [Renovate](https://renovatebot.com) | Dependency updates |

### Image Building

| Tool | Purpose |
|------|---------|
| Docker | Container builds |
| GHCR | Container registry |

---

## Media & Entertainment

| Component | Version | Purpose |
|-----------|---------|---------|
| [Jellyfin](https://jellyfin.org) | 10.11.5 | Media server |
| [Nextcloud](https://nextcloud.com) | 32.0.5 | File sync & share |
| Prowlarr, Bazarr, etc. | Various | *arr stack |
| [Kasm](https://kasmweb.com) | 1.18.1 | Browser isolation |

---

## Python Dependencies (handler-base)

Core library for all NATS handlers: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)

```toml
# Core
nats-py>=2.7.0           # NATS client
msgpack>=1.0.0           # Binary serialization
httpx>=0.27.0            # HTTP client

# ML/AI
pymilvus>=2.4.0          # Milvus client
openai>=1.0.0            # vLLM OpenAI API

# Observability
opentelemetry-api>=1.20.0
opentelemetry-sdk>=1.20.0
mlflow>=2.10.0           # Experiment tracking

# Kubeflow (kubeflow repo)
kfp>=2.12.1              # Pipeline SDK
```
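
Because the packages above are pinned only to a minimum version, it can be useful to confirm what is actually installed in a handler image. The following is a minimal, standard-library-only sketch under stated assumptions: `meets_minimum` and `check_installed` are hypothetical helper names that are not part of handler-base, and the naive numeric comparison ignores pre-release and local version tags (the `packaging` library would be the more robust choice). A call such as `check_installed({"nats-py": "2.7.0", "httpx": "0.27.0"})` returns one entry per package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple


def _numeric(v: str) -> Tuple[int, ...]:
    """Return the leading numeric components of a version string."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component (e.g. "rc1")
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, comparing numeric parts only."""
    a, b = _numeric(installed), _numeric(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.0" and "1.0.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums: Dict[str, str]) -> Dict[str, Optional[bool]]:
    """Map each package to True/False, or None if it is not installed."""
    report: Dict[str, Optional[bool]] = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = None
    return report
```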
---

## Version Pinning Strategy

| Component Type | Strategy |
|----------------|----------|
| Base images | Pin major.minor |
| Helm charts | Pin exact version |
| Python packages | Pin minimum version |
| System extensions | Pin via Talos schematic |

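
As an illustration of the base-image row, the sketch below classifies how tightly a container image reference is pinned. The function name `tag_pin_level` and the image references in the comments are hypothetical; this is not taken from the homelab repositories and only shows the distinction between floating, major.minor, exact, and digest pins.

```python
def tag_pin_level(image_ref: str) -> str:
    """Classify an image reference as digest, exact, major.minor, major, or floating."""
    if "@sha256:" in image_ref:
        return "digest"
    # Only look after the final '/', so a registry port is not mistaken for a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "floating"  # no tag at all: the registry default is used
    tag = last_segment.split(":", 1)[1]
    # Drop a leading 'v' and any variant suffix such as '-slim'.
    version_part = tag.lstrip("v").split("-", 1)[0]
    parts = version_part.split(".")
    if not all(p.isdigit() for p in parts):
        return "floating"  # e.g. 'latest' or 'stable'
    if len(parts) == 1:
        return "major"
    if len(parts) == 2:
        return "major.minor"
    return "exact"


# Hypothetical examples:
#   tag_pin_level("python:3.12-slim")  matches the "Pin major.minor" strategy
#   tag_pin_level("python:latest")     is a floating tag and would not
```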
## Related Documents

- [ARCHITECTURE.md](ARCHITECTURE.md) - How components connect
- [decisions/](decisions/) - Why we chose specific technologies