feat: add comprehensive architecture documentation

- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories and kubectl cluster-info dump data.
106
README.md
@@ -1,3 +1,105 @@
# homelab-design
# 🏠 DaviesTechLabs Homelab Architecture

homelab design process goes here.
> **Production-grade AI/ML platform running on bare-metal Kubernetes**

[](https://talos.dev)
[](https://kubernetes.io)
[](https://fluxcd.io)
[](LICENSE)

## 📖 Quick Navigation

| Document | Purpose |
|----------|---------|
| [AGENT-ONBOARDING.md](AGENT-ONBOARDING.md) | **Start here if you're an AI agent** |
| [ARCHITECTURE.md](ARCHITECTURE.md) | High-level system overview |
| [TECH-STACK.md](TECH-STACK.md) | Complete technology stack |
| [DOMAIN-MODEL.md](DOMAIN-MODEL.md) | Core entities and bounded contexts |
| [GLOSSARY.md](GLOSSARY.md) | Terminology reference |
| [decisions/](decisions/) | Architecture Decision Records (ADRs) |

## 🎯 What This Is

A comprehensive architecture documentation repository for the DaviesTechLabs homelab Kubernetes cluster, featuring:

- **AI/ML Platform**: KServe inference services, RAG pipelines, voice assistants
- **Multi-GPU Support**: AMD ROCm (RDNA3/Strix Halo), NVIDIA CUDA, Intel Arc
- **GitOps**: Flux CD with SOPS encryption
- **Event-Driven**: NATS JetStream for real-time messaging
- **ML Workflows**: Kubeflow Pipelines + Argo Workflows

## 🖥️ Cluster Overview

| Node | Role | Hardware | GPU |
|------|------|----------|-----|
| storm | Control Plane | Intel 13th Gen | Integrated |
| bruenor | Control Plane | Intel 13th Gen | Integrated |
| catti | Control Plane | Intel 13th Gen | Integrated |
| elminster | Worker | NVIDIA RTX 2070 | 8GB CUDA |
| khelben | Worker (vLLM) | AMD Strix Halo | 64GB Unified |
| drizzt | Worker | AMD Radeon 680M | 12GB RDNA2 |
| danilo | Worker | Intel Core Ultra 9 | Intel Arc |

## 🚀 Quick Start

### View Current Cluster State

```bash
# Get node status
kubectl get nodes -o wide

# View AI/ML workloads
kubectl get pods -n ai-ml

# Check KServe inference services
kubectl get inferenceservices -n ai-ml
```

### Key Endpoints

| Service | URL | Purpose |
|---------|-----|---------|
| Kubeflow | `kubeflow.lab.daviestechlabs.io` | ML Pipeline UI |
| Companions | `companions-chat.lab.daviestechlabs.io` | AI Chat Interface |
| Voice | `voice.lab.daviestechlabs.io` | Voice Assistant |
| Gitea | `git.daviestechlabs.io` | Self-hosted Git |

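
A quick way to confirm that the endpoints in the table respond is a small smoke check. The sketch below is illustrative only: the function names (`endpoint_url`, `check_endpoint`, `check_all_endpoints`) are hypothetical, the `https://` scheme is an assumption (the table lists bare hostnames), and it presumes `curl` is installed and the hosts are reachable from your network.

```bash
# Hostnames copied from the Key Endpoints table.
ENDPOINTS="
kubeflow.lab.daviestechlabs.io
companions-chat.lab.daviestechlabs.io
voice.lab.daviestechlabs.io
git.daviestechlabs.io
"

# Build the URL to probe for one hostname.
# ASSUMPTION: services are served over HTTPS at the root path.
endpoint_url() {
  printf 'https://%s/\n' "$1"
}

# Probe one host and print its HTTP status code.
# Runs in a subshell so the helper variables stay local.
check_endpoint() (
  url="$(endpoint_url "$1")"
  # curl reports 000 when no HTTP response was received;
  # fall back to 000 as well if curl itself is missing.
  code="$(curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$url")" || code="${code:-000}"
  printf '%-42s %s\n' "$1" "$code"
)

# Probe every hostname listed in ENDPOINTS.
check_all_endpoints() {
  for host in $ENDPOINTS; do
    check_endpoint "$host"
  done
}
```

After sourcing the snippet, run `check_all_endpoints`. A status of `000` means no HTTP response was received from that host.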
## 📂 Repository Structure

```
homelab-design/
├── README.md                 # This file
├── AGENT-ONBOARDING.md       # AI agent quick-start
├── ARCHITECTURE.md           # High-level system overview
├── CONTEXT-DIAGRAM.mmd       # C4 Level 1 (Mermaid)
├── CONTAINER-DIAGRAM.mmd     # C4 Level 2
├── TECH-STACK.md             # Complete tech stack
├── DOMAIN-MODEL.md           # Core entities
├── CODING-CONVENTIONS.md     # Patterns & practices
├── GLOSSARY.md               # Terminology
├── decisions/                # ADRs
│   ├── 0000-template.md
│   ├── 0001-record-architecture-decisions.md
│   ├── 0002-use-talos-linux.md
│   └── ...
├── specs/                    # Feature specifications
└── diagrams/                 # Additional diagrams
```

## 🔗 Related Repositories

| Repository | Purpose |
|------------|---------|
| [homelab-k8s2](https://github.com/Billy-Davies-2/homelab-k8s2) | Kubernetes manifests, Flux GitOps |
| [llm-workflows](https://github.com/Billy-Davies-2/llm-workflows) | NATS handlers, Argo/KFP workflows |
| [companions-frontend](https://github.com/Billy-Davies-2/companions-frontend) | Go web server, HTMX frontend |

## 📝 Contributing

1. For architecture changes, create an ADR in `decisions/`
2. Update relevant documentation
3. Submit a PR with context

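
Step 1 can be partly scripted. The helper below is a minimal sketch, not an established part of this repository: `next_adr_path` is a hypothetical name, and it assumes ADRs keep the four-digit, lowercase, hyphen-separated naming seen in `decisions/` (for example `0002-use-talos-linux.md`) and that numbers are assigned sequentially. The title in the usage comment is a placeholder.

```bash
# Print the path for the next ADR file in a decisions directory.
# Usage: next_adr_path <decisions-dir> <title words...>
# ASSUMPTION: files are named NNNN-slug.md and numbered sequentially.
# The body runs in a subshell so its variables do not leak.
next_adr_path() (
  dir="$1"
  shift
  last=0
  for f in "$dir"/[0-9][0-9][0-9][0-9]-*.md; do
    [ -e "$f" ] || continue            # glob matched nothing
    n="$(basename "$f" | cut -c1-4)"
    # Strip leading zeros so the number is not parsed as octal.
    n="$(printf '%s' "$n" | sed 's/^0*//')"
    n="${n:-0}"
    if [ "$n" -gt "$last" ]; then
      last="$n"
    fi
  done
  # Lowercase the title and collapse non-alphanumerics to hyphens.
  slug="$(printf '%s' "$*" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')"
  printf '%s/%04d-%s.md\n' "$dir" "$((last + 1))" "$slug"
)

# Example usage (placeholder title), starting from the template:
#   cp decisions/0000-template.md "$(next_adr_path decisions Example Decision Title)"
```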
---
*Last updated: 2026-02-01*