feat: add comprehensive architecture documentation

- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of the homelab-k8s2 and llm-workflows repositories and kubectl cluster-info dump data.
2026-02-01 14:30:05 -05:00
parent 4d4f6f464c
commit 832cda34bd
26 changed files with 3805 additions and 2 deletions
--- a/diagrams/data-flow-voice.mmd
+++ b/diagrams/data-flow-voice.mmd
@@ -0,0 +1,46 @@
+%% Voice Request Data Flow
+%% Sequence diagram showing voice assistant processing
+
+sequenceDiagram
+    autonumber
+    participant U as User
+    participant W as Voice WebApp
+    participant N as NATS
+    participant VA as Voice Assistant
+    participant STT as Whisper<br/>(STT)
+    participant E as BGE Embeddings
+    participant M as Milvus
+    participant R as Reranker
+    participant L as vLLM
+    participant TTS as XTTS<br/>(TTS)
+
+    U->>W: Record audio
+    W->>N: Publish ai.voice.user.{id}.request<br/>(msgpack with audio bytes)
+    N->>VA: Deliver voice request
+    
+    VA->>STT: Transcribe audio
+    STT-->>VA: Transcription text
+    
+    alt RAG Enabled
+        VA->>E: Generate query embedding
+        E-->>VA: Query vector
+        VA->>M: Search similar chunks
+        M-->>VA: Top-K chunks
+        
+        opt Reranker Enabled
+            VA->>R: Rerank chunks
+            R-->>VA: Reordered chunks
+        end
+    end
+    
+    VA->>L: LLM inference
+    L-->>VA: Response text
+    
+    VA->>TTS: Synthesize speech
+    TTS-->>VA: Audio bytes
+    
+    VA->>N: Publish ai.voice.response.{id}<br/>(text + audio)
+    N-->>W: Deliver response
+    W-->>U: Play audio + show text
+
+    Note over VA,TTS: Total latency target: < 3s
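The control flow in the sequence diagram (STT, an optional RAG branch with a nested optional reranker, LLM inference, then TTS) can be sketched as a small orchestrator. This is a minimal, hypothetical sketch, not the Voice Assistant's actual code: the class name `VoicePipeline`, its fields, and the `top_k` default are assumptions, and each stage is an injected callable standing in for the real Whisper, BGE, Milvus, reranker, vLLM, and XTTS clients. NATS delivery and msgpack decoding are left out.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class VoicePipeline:
    """Hypothetical orchestration of the voice request flow in the diagram.

    Every stage is an injected callable; the real service clients
    (Whisper, BGE, Milvus, reranker, vLLM, XTTS) are assumed, not shown.
    """

    transcribe: Callable[[bytes], str]                 # Whisper (STT)
    generate: Callable[[str, Sequence[str]], str]      # vLLM inference
    synthesize: Callable[[str], bytes]                 # XTTS (TTS)
    embed: Optional[Callable[[str], list[float]]] = None           # BGE embeddings
    search: Optional[Callable[[list[float], int], list[str]]] = None  # Milvus top-K
    rerank: Optional[Callable[[str, list[str]], list[str]]] = None    # reranker
    top_k: int = 5                                     # assumed default

    def handle(self, audio: bytes) -> dict:
        start = time.perf_counter()
        text = self.transcribe(audio)

        chunks: list[str] = []
        # "alt RAG Enabled": retrieval runs only if both embedder and store are set.
        if self.embed is not None and self.search is not None:
            vector = self.embed(text)
            chunks = self.search(vector, self.top_k)
            # "opt Reranker Enabled": nested inside the RAG branch.
            if self.rerank is not None:
                chunks = self.rerank(text, chunks)

        answer = self.generate(text, chunks)
        speech = self.synthesize(answer)
        elapsed = time.perf_counter() - start

        # Response carries both text and audio, as in the final publish step.
        return {
            "transcript": text,
            "text": answer,
            "audio": speech,
            "latency_s": elapsed,
            "within_target": elapsed < 3.0,  # "< 3s" target from the diagram note
        }
```

Injecting the stages as optional callables makes the diagram's `alt` and `opt` blocks map directly onto `None` checks, so enabling RAG or the reranker is a configuration choice and each stage can be stubbed in tests.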