feat: add comprehensive architecture documentation
- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of the homelab-k8s2 and llm-workflows repositories and `kubectl cluster-info dump` data.
diagrams/README.md (new file, 35 lines)
@@ -0,0 +1,35 @@
# Diagrams

This directory contains additional architecture diagrams beyond the main C4 diagrams.

## Available Diagrams

| File | Description |
|------|-------------|
| [gpu-allocation.mmd](gpu-allocation.mmd) | GPU workload distribution |
| [data-flow-chat.mmd](data-flow-chat.mmd) | Chat request data flow |
| [data-flow-voice.mmd](data-flow-voice.mmd) | Voice request data flow |

## Rendering Diagrams

### VS Code

Install the "Markdown Preview Mermaid Support" extension.

### CLI

```bash
# Using the Mermaid CLI (npx runs the package's mmdc binary directly)
npx @mermaid-js/mermaid-cli -i diagram.mmd -o diagram.png
```

### Online

Use the [Mermaid Live Editor](https://mermaid.live).

## Diagram Conventions

1. Use the `.mmd` extension for Mermaid diagrams
2. Include the title as a comment at the top of the file
3. Use consistent styling classes
4. Keep diagrams focused (one concept per diagram)
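Conventions 1 and 2 above are mechanical enough to lint. A minimal sketch in Python; the `has_title_comment` and `check_diagram` helpers are illustrative, not part of this repository:

```python
from pathlib import Path

def has_title_comment(source: str) -> bool:
    """Convention 2: a Mermaid file should open with a %% title comment."""
    lines = source.lstrip().splitlines()
    return bool(lines) and lines[0].startswith("%%")

def check_diagram(path: Path) -> list[str]:
    """Return a list of convention violations for one diagram file."""
    problems = []
    if path.suffix != ".mmd":                    # convention 1
        problems.append("not a .mmd file")
    if not has_title_comment(path.read_text()):  # convention 2
        problems.append("missing %% title comment")
    return problems
```

Running `check_diagram` over every file in this directory in CI would keep the conventions honest.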
diagrams/data-flow-chat.mmd (new file, 51 lines)
@@ -0,0 +1,51 @@
%% Chat Request Data Flow
%% Sequence diagram showing chat message processing

sequenceDiagram
    autonumber
    participant U as User
    participant W as WebApp<br/>(companions)
    participant N as NATS
    participant C as Chat Handler
    participant V as Valkey<br/>(Cache)
    participant E as BGE Embeddings
    participant M as Milvus
    participant R as Reranker
    participant L as vLLM

    U->>W: Send message
    W->>N: Publish ai.chat.user.{id}.message
    N->>C: Deliver message

    C->>V: Get session history
    V-->>C: Previous messages

    alt RAG Enabled
        C->>E: Generate query embedding
        E-->>C: Query vector
        C->>M: Search similar chunks
        M-->>C: Top-K chunks

        opt Reranker Enabled
            C->>R: Rerank chunks
            R-->>C: Reordered chunks
        end
    end

    C->>L: LLM inference (context + query)

    alt Streaming Enabled
        loop For each token
            L-->>C: Token
            C->>N: Publish ai.chat.response.stream.{id}
            N-->>W: Deliver chunk
            W-->>U: Display token
        end
    else Non-streaming
        L-->>C: Full response
        C->>N: Publish ai.chat.response.{id}
        N-->>W: Deliver response
        W-->>U: Display response
    end

    C->>V: Save to session history
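The `alt RAG Enabled` / `opt Reranker Enabled` branches in the chat flow reduce to a small piece of control logic. A hedged sketch, with the embedding, vector-search, and rerank services injected as callables (the function and parameter names are illustrative, standing in for the BGE, Milvus, and Reranker calls in the diagram):

```python
def retrieve_context(query, embed, search, rerank=None,
                     rag_enabled=True, top_k=5):
    """Mirror of the RAG branch above: embed the query, fetch the top-K
    chunks from the vector store, and optionally rerank them."""
    if not rag_enabled:          # skip the whole "alt RAG Enabled" block
        return []
    vector = embed(query)        # C->>E: Generate query embedding
    chunks = search(vector, top_k)   # C->>M: Search similar chunks
    if rerank is not None:       # "opt Reranker Enabled"
        chunks = rerank(query, chunks)
    return chunks
```

Injecting the services keeps the branch logic testable without a live Milvus or reranker deployment.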
diagrams/data-flow-voice.mmd (new file, 46 lines)
@@ -0,0 +1,46 @@
%% Voice Request Data Flow
%% Sequence diagram showing voice assistant processing

sequenceDiagram
    autonumber
    participant U as User
    participant W as Voice WebApp
    participant N as NATS
    participant VA as Voice Assistant
    participant STT as Whisper<br/>(STT)
    participant E as BGE Embeddings
    participant M as Milvus
    participant R as Reranker
    participant L as vLLM
    participant TTS as XTTS<br/>(TTS)

    U->>W: Record audio
    W->>N: Publish ai.voice.user.{id}.request<br/>(msgpack with audio bytes)
    N->>VA: Deliver voice request

    VA->>STT: Transcribe audio
    STT-->>VA: Transcription text

    alt RAG Enabled
        VA->>E: Generate query embedding
        E-->>VA: Query vector
        VA->>M: Search similar chunks
        M-->>VA: Top-K chunks

        opt Reranker Enabled
            VA->>R: Rerank chunks
            R-->>VA: Reordered chunks
        end
    end

    VA->>L: LLM inference
    L-->>VA: Response text

    VA->>TTS: Synthesize speech
    TTS-->>VA: Audio bytes

    VA->>N: Publish ai.voice.response.{id}<br/>(text + audio)
    N-->>W: Deliver response
    W-->>U: Play audio + show text

    Note over VA,TTS: Total latency target: < 3s
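Stripped of transport, the Voice Assistant's job in the sequence above is a straight pipeline: transcribe, optionally retrieve context, infer, synthesize. A minimal sketch with all services injected as callables (`handle_voice_request` and its parameter names are illustrative, standing in for the Whisper, vLLM, and XTTS calls):

```python
def handle_voice_request(audio, stt, llm, tts, retrieve=None):
    """Mirror of the voice flow above: STT -> (optional RAG) -> LLM -> TTS.
    Returns the reply text and synthesized audio bytes, as published on
    ai.voice.response.{id} in the diagram."""
    text = stt(audio)                            # VA->>STT: Transcribe audio
    context = retrieve(text) if retrieve else [] # "alt RAG Enabled" branch
    reply = llm(text, context)                   # VA->>L: LLM inference
    return reply, tts(reply)                     # VA->>TTS: Synthesize speech
```

Keeping each stage behind a callable also makes the < 3 s latency target easy to attribute: each injected service can be timed independently.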
diagrams/gpu-allocation.mmd (new file, 47 lines)
@@ -0,0 +1,47 @@
%% GPU Allocation Diagram
%% Shows how AI workloads are distributed across GPU nodes

flowchart TB
    subgraph khelben["🖥️ khelben (AMD Strix Halo 64GB)"]
        direction TB
        vllm["🧠 vLLM<br/>LLM Inference<br/>100% GPU"]
    end

    subgraph elminster["🖥️ elminster (NVIDIA RTX 2070 8GB)"]
        direction TB
        whisper["🎤 Whisper<br/>STT<br/>~50% GPU"]
        xtts["🔊 XTTS<br/>TTS<br/>~50% GPU"]
    end

    subgraph drizzt["🖥️ drizzt (AMD Radeon 680M 12GB)"]
        direction TB
        embeddings["📊 BGE Embeddings<br/>Vector Encoding<br/>~80% GPU"]
    end

    subgraph danilo["🖥️ danilo (Intel Arc)"]
        direction TB
        reranker["📋 BGE Reranker<br/>Document Ranking<br/>~80% GPU"]
    end

    subgraph workloads["Workload Routing"]
        chat["💬 Chat Request"]
        voice["🎤 Voice Request"]
    end

    chat --> embeddings
    chat --> reranker
    chat --> vllm

    voice --> whisper
    voice --> embeddings
    voice --> reranker
    voice --> vllm
    voice --> xtts

    classDef nvidia fill:#76B900,color:white
    classDef amd fill:#ED1C24,color:white
    classDef intel fill:#0071C5,color:white

    class whisper,xtts nvidia
    class vllm,embeddings amd
    class reranker intel
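The same allocation can be expressed as data, which is handy for answering "which nodes does this request type touch?" without reading the diagram. A sketch under the diagram's placement (the `GPU_NODES`/`ROUTES` tables and `nodes_for` helper are illustrative, not code from the repositories):

```python
# Placement and routing transcribed from the flowchart above.
GPU_NODES = {
    "khelben":   {"gpu": "AMD Strix Halo 64GB",  "workloads": ["vllm"]},
    "elminster": {"gpu": "NVIDIA RTX 2070 8GB",  "workloads": ["whisper", "xtts"]},
    "drizzt":    {"gpu": "AMD Radeon 680M 12GB", "workloads": ["embeddings"]},
    "danilo":    {"gpu": "Intel Arc",            "workloads": ["reranker"]},
}

ROUTES = {
    "chat":  ["embeddings", "reranker", "vllm"],
    "voice": ["whisper", "embeddings", "reranker", "vllm", "xtts"],
}

def nodes_for(request_type: str) -> list[str]:
    """Nodes a request of this type touches, in workload order."""
    placement = {w: node for node, meta in GPU_NODES.items()
                 for w in meta["workloads"]}
    return [placement[w] for w in ROUTES[request_type]]
```

Note that a voice request lands on elminster twice (Whisper then XTTS), which is why those two workloads each get ~50% of that GPU.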