feat: add comprehensive architecture documentation
- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories and kubectl cluster-info dump data.
51
diagrams/data-flow-chat.mmd
Normal file
@@ -0,0 +1,51 @@
%% Chat Request Data Flow
%% Sequence diagram showing chat message processing

sequenceDiagram
    autonumber
    participant U as User
    participant W as WebApp<br/>(companions)
    participant N as NATS
    participant C as Chat Handler
    participant V as Valkey<br/>(Cache)
    participant E as BGE Embeddings
    participant M as Milvus
    participant R as Reranker
    participant L as vLLM

    U->>W: Send message
    W->>N: Publish ai.chat.user.{id}.message
    N->>C: Deliver message

    C->>V: Get session history
    V-->>C: Previous messages

    alt RAG Enabled
        C->>E: Generate query embedding
        E-->>C: Query vector
        C->>M: Search similar chunks
        M-->>C: Top-K chunks

        opt Reranker Enabled
            C->>R: Rerank chunks
            R-->>C: Reordered chunks
        end
    end

    C->>L: LLM inference (context + query)

    alt Streaming Enabled
        loop For each token
            L-->>C: Token
            C->>N: Publish ai.chat.response.stream.{id}
            N-->>W: Deliver chunk
            W-->>U: Display token
        end
    else Non-streaming
        L-->>C: Full response
        C->>N: Publish ai.chat.response.{id}
        N-->>W: Deliver response
        W-->>U: Display response
    end

    C->>V: Save to session history
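
The sequence in this diagram can be read as a single handler function. Below is a minimal Python sketch of that control flow, not the repository's implementation. `ChatDeps`, `handle_chat`, the flag names, and every callable are hypothetical stand-ins for Valkey, BGE Embeddings, Milvus, the reranker, vLLM, and NATS. The `{id}` placeholder in the subjects is assumed to be one opaque identifier, and the way history, chunks, and query are joined into a prompt is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ChatDeps:
    """Hypothetical stand-ins for the services named in the diagram."""
    history: dict[str, list[str]]                  # Valkey session cache
    embed: Callable[[str], list[float]]            # BGE Embeddings
    search: Callable[[list[float]], list[str]]     # Milvus top-K search
    rerank: Callable[[str, list[str]], list[str]]  # Reranker
    generate: Callable[[str], Iterable[str]]       # vLLM, yields tokens
    publish: Callable[[str, str], None]            # NATS publish(subject, payload)


def handle_chat(chat_id: str, text: str, deps: ChatDeps, *,
                rag: bool = True, use_reranker: bool = True,
                stream: bool = True) -> str:
    # Get session history from the cache.
    past = deps.history.get(chat_id, [])

    # Optional RAG branch: embed the query, search, then optionally rerank.
    chunks: list[str] = []
    if rag:
        chunks = deps.search(deps.embed(text))
        if use_reranker:
            chunks = deps.rerank(text, chunks)

    # LLM inference over context + query (prompt layout is an assumption).
    prompt = "\n".join([*past, *chunks, text])
    tokens: list[str] = []
    for token in deps.generate(prompt):
        tokens.append(token)
        if stream:
            # Streaming: publish each token as it arrives.
            deps.publish(f"ai.chat.response.stream.{chat_id}", token)
    reply = "".join(tokens)
    if not stream:
        # Non-streaming: publish the full response once.
        deps.publish(f"ai.chat.response.{chat_id}", reply)

    # Save the exchange to session history.
    deps.history[chat_id] = [*past, text, reply]
    return reply


# Usage with in-memory fakes in place of the real services.
published: list[tuple[str, str]] = []
deps = ChatDeps(
    history={},
    embed=lambda q: [float(len(q))],
    search=lambda v: ["chunk-b", "chunk-a"],
    rerank=lambda q, c: sorted(c),
    generate=lambda prompt: iter(["Hel", "lo"]),
    publish=lambda subject, payload: published.append((subject, payload)),
)
reply = handle_chat("42", "hi", deps)
```

Passing the services in as plain callables keeps the branch structure (RAG, reranker, streaming) testable without a running NATS or Milvus instance; a real handler would swap in actual clients.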