# 🏠 DaviesTechLabs Homelab Architecture

> **Production-grade AI/ML platform running on bare-metal Kubernetes**

[Talos Linux](https://talos.dev)
[Kubernetes](https://kubernetes.io)
[Flux](https://fluxcd.io)
[License](LICENSE)

## 📖 Quick Navigation

| Document | Purpose |
|----------|---------|
| [AGENT-ONBOARDING.md](AGENT-ONBOARDING.md) | **Start here if you're an AI agent** |
| [ARCHITECTURE.md](ARCHITECTURE.md) | High-level system overview |
| [TECH-STACK.md](TECH-STACK.md) | Complete technology stack |
| [DOMAIN-MODEL.md](DOMAIN-MODEL.md) | Core entities and bounded contexts |
| [GLOSSARY.md](GLOSSARY.md) | Terminology reference |
| [decisions/](decisions/) | Architecture Decision Records (ADRs) |

## 🎯 What This Is

A comprehensive architecture documentation repository for the DaviesTechLabs homelab Kubernetes cluster, featuring:

- **AI/ML Platform**: KServe inference services, RAG pipelines, voice assistants
- **Multi-GPU Support**: AMD ROCm (RDNA3/Strix Halo), NVIDIA CUDA, Intel Arc
- **GitOps**: Flux CD with SOPS encryption
- **Event-Driven**: NATS JetStream for real-time messaging
- **ML Workflows**: Kubeflow Pipelines + Argo Workflows

## 🖥️ Cluster Overview

| Node | Role | Hardware | GPU |
|------|------|----------|-----|
| storm | Control Plane | Intel 13th Gen | Integrated |
| bruenor | Control Plane | Intel 13th Gen | Integrated |
| catti | Control Plane | Intel 13th Gen | Integrated |
| elminster | Worker | NVIDIA RTX 2070 | 8GB CUDA |
| khelben | Worker (vLLM) | AMD Strix Halo | 64GB Unified |
| drizzt | Worker | AMD Radeon 680M | 12GB RDNA2 |
| danilo | Worker | Intel Core Ultra 9 | Intel Arc |

## 🚀 Quick Start

### View Current Cluster State

```bash
# Get node status
kubectl get nodes -o wide

# View AI/ML workloads
kubectl get pods -n ai-ml

# Check KServe inference services
kubectl get inferenceservices -n ai-ml
```

### Key Endpoints

| Service | URL | Purpose |
|---------|-----|---------|
| Kubeflow | `kubeflow.lab.daviestechlabs.io` | ML Pipeline UI |
| Companions | `companions-chat.lab.daviestechlabs.io` | AI Chat Interface |
| Voice | `voice.lab.daviestechlabs.io` | Voice Assistant |
| Gitea | `git.daviestechlabs.io` | Self-hosted Git |
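
To check quickly whether the endpoints in the table respond, a small probe script along these lines may help. This is a sketch, not part of the repository: the function names (`endpoint_url`, `probe_endpoint`, `probe_all`) are hypothetical, and it assumes the services are served over HTTPS and reachable from your network, which the table does not state.

```bash
# Hostnames copied from the Key Endpoints table.
ENDPOINTS="kubeflow.lab.daviestechlabs.io companions-chat.lab.daviestechlabs.io voice.lab.daviestechlabs.io git.daviestechlabs.io"

# Build the URL for a hostname (the https:// scheme is an assumption).
endpoint_url() {
  printf 'https://%s/\n' "$1"
}

# Print "<http-status> <url>" for one hostname.
# curl reports 000 when no HTTP response was received; the same value
# is used as a fallback if curl is not installed.
probe_endpoint() {
  url="$(endpoint_url "$1")"
  status="$(curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$url" 2>/dev/null)"
  printf '%s %s\n' "${status:-000}" "$url"
}

# Probe every documented endpoint in turn.
probe_all() {
  for host in $ENDPOINTS; do
    probe_endpoint "$host"
  done
}
```

Calling `probe_all` prints one line per endpoint. A status of `000` means no HTTP response arrived within five seconds, which usually points at DNS, routing, or VPN access rather than at the service itself.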

## 📂 Repository Structure

```
homelab-design/
├── README.md                 # This file
├── AGENT-ONBOARDING.md       # AI agent quick-start
├── ARCHITECTURE.md           # High-level system overview
├── CONTEXT-DIAGRAM.mmd       # C4 Level 1 (Mermaid)
├── CONTAINER-DIAGRAM.mmd     # C4 Level 2
├── TECH-STACK.md             # Complete tech stack
├── DOMAIN-MODEL.md           # Core entities
├── CODING-CONVENTIONS.md     # Patterns & practices
├── GLOSSARY.md               # Terminology
├── decisions/                # ADRs
│   ├── 0000-template.md
│   ├── 0001-record-architecture-decisions.md
│   ├── 0002-use-talos-linux.md
│   └── ...
├── specs/                    # Feature specifications
└── diagrams/                 # Additional diagrams
```

## 🔗 Related Repositories

| Repository | Purpose |
|------------|---------|
| [homelab-k8s2](https://github.com/Billy-Davies-2/homelab-k8s2) | Kubernetes manifests, Flux GitOps |
| [llm-workflows](https://github.com/Billy-Davies-2/llm-workflows) | NATS handlers, Argo/KFP workflows |
| [companions-frontend](https://github.com/Billy-Davies-2/companions-frontend) | Go web server, HTMX frontend |

## 📝 Contributing

1. For architecture changes, create an ADR in `decisions/`
2. Update relevant documentation
3. Submit a PR with context
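
Step 1 can be scripted. The sketch below copies `decisions/0000-template.md` to the next free number, assuming ADR files follow the `NNNN-kebab-case-title.md` pattern seen in the repository tree. The helper names (`next_adr_number`, `adr_slug`, `new_adr`) are hypothetical and do not exist in the repository.

```bash
# Print the next zero-padded ADR number for a decisions directory.
next_adr_number() {
  dir="$1"
  # Keep only the numeric prefix of files named NNNN-*.md, then take the highest.
  last="$(ls "$dir" 2>/dev/null \
    | sed -n 's/^\([0-9]\{4\}\)-.*\.md$/\1/p' | sort -n | tail -n 1)"
  # Strip leading zeros so shell arithmetic does not read the value as octal.
  last="$(printf '%s' "${last:-0}" | sed 's/^0*//')"
  printf '%04d\n' "$(( ${last:-0} + 1 ))"
}

# Turn a title into a lowercase, hyphen-separated slug.
adr_slug() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Copy the template to the next numbered file and print the new path.
new_adr() {
  dir="$1"
  title="$2"
  target="$dir/$(next_adr_number "$dir")-$(adr_slug "$title").md"
  cp "$dir/0000-template.md" "$target" && printf '%s\n' "$target"
}

# Example (run from the repository root):
#   new_adr decisions "Use Envoy Gateway"
```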

---

*Last updated: 2026-02-01*