feat: add comprehensive architecture documentation

- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of the homelab-k8s2 and llm-workflows repositories
and `kubectl cluster-info dump` output.
2026-02-01 14:30:05 -05:00
parent 4d4f6f464c
commit 832cda34bd
26 changed files with 3805 additions and 2 deletions

106
README.md

@@ -1,3 +1,105 @@
# homelab-design
# 🏠 DaviesTechLabs Homelab Architecture
homelab design process goes here.
> **Production-grade AI/ML platform running on bare-metal Kubernetes**
[![Talos](https://img.shields.io/badge/Talos-v1.12.1-blue?logo=linux)](https://talos.dev)
[![Kubernetes](https://img.shields.io/badge/Kubernetes-v1.35.0-326CE5?logo=kubernetes)](https://kubernetes.io)
[![Flux](https://img.shields.io/badge/GitOps-Flux-blue?logo=flux)](https://fluxcd.io)
[![License](https://img.shields.io/badge/License-MIT-green)](LICENSE)
## 📖 Quick Navigation
| Document | Purpose |
|----------|---------|
| [AGENT-ONBOARDING.md](AGENT-ONBOARDING.md) | **Start here if you're an AI agent** |
| [ARCHITECTURE.md](ARCHITECTURE.md) | High-level system overview |
| [TECH-STACK.md](TECH-STACK.md) | Complete technology stack |
| [DOMAIN-MODEL.md](DOMAIN-MODEL.md) | Core entities and bounded contexts |
| [CODING-CONVENTIONS.md](CODING-CONVENTIONS.md) | Patterns and practices |
| [GLOSSARY.md](GLOSSARY.md) | Terminology reference |
| [decisions/](decisions/) | Architecture Decision Records (ADRs) |
## 🎯 What This Is
Architecture documentation for the DaviesTechLabs homelab Kubernetes cluster. The cluster features:
- **AI/ML Platform**: KServe inference services, RAG pipelines, voice assistants
- **Multi-GPU Support**: AMD ROCm (RDNA3/Strix Halo), NVIDIA CUDA, Intel Arc
- **GitOps**: Flux CD with SOPS encryption
- **Event-Driven**: NATS JetStream for real-time messaging
- **ML Workflows**: Kubeflow Pipelines + Argo Workflows
## 🖥️ Cluster Overview
| Node | Role | Hardware | GPU |
|------|------|----------|-----|
| storm | Control Plane | Intel 13th Gen | Integrated |
| bruenor | Control Plane | Intel 13th Gen | Integrated |
| catti | Control Plane | Intel 13th Gen | Integrated |
| elminster | Worker | NVIDIA RTX 2070 | 8GB CUDA |
| khelben | Worker (vLLM) | AMD Strix Halo | 64GB Unified |
| drizzt | Worker | AMD Radeon 680M | 12GB RDNA2 |
| danilo | Worker | Intel Core Ultra 9 | Intel Arc |
## 🚀 Quick Start
### View Current Cluster State
```bash
# Get node status
kubectl get nodes -o wide
# View AI/ML workloads
kubectl get pods -n ai-ml
# Check KServe inference services
kubectl get inferenceservices -n ai-ml
```
### Key Endpoints
| Service | URL | Purpose |
|---------|-----|---------|
| Kubeflow | `kubeflow.lab.daviestechlabs.io` | ML Pipeline UI |
| Companions | `companions-chat.lab.daviestechlabs.io` | AI Chat Interface |
| Voice | `voice.lab.daviestechlabs.io` | Voice Assistant |
| Gitea | `git.daviestechlabs.io` | Self-hosted Git |
## 📂 Repository Structure
```
homelab-design/
├── README.md # This file
├── AGENT-ONBOARDING.md # AI agent quick-start
├── ARCHITECTURE.md # High-level system overview
├── CONTEXT-DIAGRAM.mmd # C4 Level 1 (Mermaid)
├── CONTAINER-DIAGRAM.mmd # C4 Level 2
├── TECH-STACK.md # Complete tech stack
├── DOMAIN-MODEL.md # Core entities
├── CODING-CONVENTIONS.md # Patterns & practices
├── GLOSSARY.md # Terminology
├── decisions/ # ADRs
│ ├── 0000-template.md
│ ├── 0001-record-architecture-decisions.md
│ ├── 0002-use-talos-linux.md
│ └── ...
├── specs/ # Feature specifications
└── diagrams/ # Additional diagrams
```
## 🔗 Related Repositories
| Repository | Purpose |
|------------|---------|
| [homelab-k8s2](https://github.com/Billy-Davies-2/homelab-k8s2) | Kubernetes manifests, Flux GitOps |
| [llm-workflows](https://github.com/Billy-Davies-2/llm-workflows) | NATS handlers, Argo/KFP workflows |
| [companions-frontend](https://github.com/Billy-Davies-2/companions-frontend) | Go web server, HTMX frontend |
## 📝 Contributing
1. For architecture changes, create an ADR in `decisions/`
2. Update relevant documentation
3. Submit a PR with context
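Step 1 could be scripted along the following lines. This is a minimal sketch, not tooling that ships with this repository: the function name `new_adr` is hypothetical, and it assumes ADR files follow the `NNNN-title.md` naming shown in the repository structure, with `0000-template.md` as the starting point.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the ADR template to the next free number.
# Assumes files in the ADR directory are named NNNN-some-title.md.
new_adr() {
  local dir="$1" title="$2"
  local last next slug
  # Highest existing four-digit prefix (empty if the directory has no ADRs).
  last=$(ls "$dir" | grep -E '^[0-9]{4}-.*\.md$' | sort | tail -n 1 | cut -c1-4)
  # Force base 10 so a prefix such as 0009 is not parsed as octal.
  next=$(printf '%04d' $((10#${last:-0} + 1)))
  # Lowercase the title and collapse runs of non-alphanumerics into hyphens.
  slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  cp "$dir/0000-template.md" "$dir/$next-$slug.md"
  # Print the new path so the caller can open it in an editor.
  printf '%s\n' "$dir/$next-$slug.md"
}
```

Calling `new_adr decisions "Use Envoy Gateway"` copies the template to the next numbered file in `decisions/` and prints its path; the content of the ADR still has to be written by hand.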
---
*Last updated: 2026-02-01*