feat: add comprehensive architecture documentation

- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories and kubectl cluster-info dump data.
2026-02-01 14:30:05 -05:00
parent 4d4f6f464c
commit 832cda34bd
26 changed files with 3805 additions and 2 deletions
--- a/diagrams/data-flow-chat.mmd
+++ b/diagrams/data-flow-chat.mmd
@@ -0,0 +1,51 @@
+%% Chat Request Data Flow
+%% Sequence diagram showing chat message processing
+
+sequenceDiagram
+    autonumber
+    participant U as User
+    participant W as WebApp<br/>(companions)
+    participant N as NATS
+    participant C as Chat Handler
+    participant V as Valkey<br/>(Cache)
+    participant E as BGE Embeddings
+    participant M as Milvus
+    participant R as Reranker
+    participant L as vLLM
+
+    U->>W: Send message
+    W->>N: Publish ai.chat.user.{id}.message
+    N->>C: Deliver message
+    
+    C->>V: Get session history
+    V-->>C: Previous messages
+    
+    alt RAG Enabled
+        C->>E: Generate query embedding
+        E-->>C: Query vector
+        C->>M: Search similar chunks
+        M-->>C: Top-K chunks
+        
+        opt Reranker Enabled
+            C->>R: Rerank chunks
+            R-->>C: Reordered chunks
+        end
+    end
+    
+    C->>L: LLM inference (context + query)
+    
+    alt Streaming Enabled
+        loop For each token
+            L-->>C: Token
+            C->>N: Publish ai.chat.response.stream.{id}
+            N-->>W: Deliver chunk
+            W-->>U: Display token
+        end
+    else Non-streaming
+        L-->>C: Full response
+        C->>N: Publish ai.chat.response.{id}
+        N-->>W: Deliver response
+        W-->>U: Display response
+    end
+    
+    C->>V: Save to session history
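
Reviewer note: the sequence in data-flow-chat.mmd can be sketched as a small Python function. This is a minimal sketch and not the Chat Handler's actual code. All names here (`handle_chat`, `request_subject`, `response_subject`, and the injected `publish`, `generate`, `retrieve`, `rerank` callables) are hypothetical. A plain dict stands in for Valkey, `retrieve` collapses the embedding and Milvus search steps into one call, and the `{id}` placeholder in the subjects is treated as an opaque string because the diagram does not say what it identifies.

```python
from typing import Callable, Dict, Iterable, List, Optional


def request_subject(user_id: str) -> str:
    # Subject the WebApp publishes to, as named in the diagram.
    return f"ai.chat.user.{user_id}.message"


def response_subject(request_id: str, streaming: bool) -> str:
    # Streaming and non-streaming replies use different subjects in the diagram.
    if streaming:
        return f"ai.chat.response.stream.{request_id}"
    return f"ai.chat.response.{request_id}"


def handle_chat(
    request_id: str,
    query: str,
    *,
    history: Dict[str, List[str]],          # stand-in for the Valkey session cache
    publish: Callable[[str, str], None],    # stand-in for a NATS publish call
    generate: Callable[[str], Iterable[str]],  # stand-in for vLLM, yields tokens
    retrieve: Optional[Callable[[str], List[str]]] = None,  # embed + search
    rerank: Optional[Callable[[str, List[str]], List[str]]] = None,
    streaming: bool = False,
) -> str:
    # Steps 4-5: load previous messages for this session.
    past = history.get(request_id, [])

    # RAG branch: retrieval is optional, and reranking only applies to retrieved chunks.
    chunks: List[str] = []
    if retrieve is not None:
        chunks = retrieve(query)
        if rerank is not None:
            chunks = rerank(query, chunks)

    # Inference over context plus query (prompt layout is an assumption).
    prompt = "\n".join([*past, *chunks, query])
    subject = response_subject(request_id, streaming)

    parts: List[str] = []
    for token in generate(prompt):
        parts.append(token)
        if streaming:
            publish(subject, token)  # one message per token

    answer = "".join(parts)
    if not streaming:
        publish(subject, answer)     # one message with the full response

    # Final step: persist the exchange to session history.
    history[request_id] = [*past, query, answer]
    return answer
```

Passing the collaborators in as callables keeps the ordering of the diagram visible in one place and lets the flow be exercised with in-memory fakes before any NATS, Valkey, Milvus, or vLLM client is wired in.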