homelab-design/diagrams/data-flow-chat.mmd
Billy D. 832cda34bd feat: add comprehensive architecture documentation
- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories
and kubectl cluster-info dump data.
2026-02-01 14:30:05 -05:00


%% Chat Request Data Flow
%% Sequence diagram showing chat message processing
sequenceDiagram
    autonumber
    participant U as User
    participant W as WebApp<br/>(companions)
    participant N as NATS
    participant C as Chat Handler
    participant V as Valkey<br/>(Cache)
    participant E as BGE Embeddings
    participant M as Milvus
    participant R as Reranker
    participant L as vLLM

    U->>W: Send message
    W->>N: Publish ai.chat.user.{id}.message
    N->>C: Deliver message
    C->>V: Get session history
    V-->>C: Previous messages
    alt RAG Enabled
        C->>E: Generate query embedding
        E-->>C: Query vector
        C->>M: Search similar chunks
        M-->>C: Top-K chunks
        opt Reranker Enabled
            C->>R: Rerank chunks
            R-->>C: Reordered chunks
        end
    end
    C->>L: LLM inference (context + query)
    alt Streaming Enabled
        loop For each token
            L-->>C: Token
            C->>N: Publish ai.chat.response.stream.{id}
            N-->>W: Deliver chunk
            W-->>U: Display token
        end
    else Non-streaming
        L-->>C: Full response
        C->>N: Publish ai.chat.response.{id}
        N-->>W: Deliver response
        W-->>U: Display response
    end
    C->>V: Save to session history
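
The control flow the diagram describes (session lookup, optional RAG with optional reranking, then streaming or non-streaming publish) can be sketched in Python. This is a minimal illustration with stand-in objects, not the actual Chat Handler: a real deployment would use a NATS client, a Valkey client, pymilvus, and a vLLM endpoint in place of the stubs, and the `ChatHandler` class and its parameter names are hypothetical.

```python
# Hypothetical sketch of the chat-handler branching shown in the diagram.
# Every dependency is a stub: `cache` stands in for Valkey, `embed` for
# BGE embeddings, `search` for Milvus, `rerank` for the reranker, and
# `published` collects what would be NATS publishes.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ChatHandler:
    cache: dict = field(default_factory=dict)               # Valkey stand-in
    embed: Callable = staticmethod(lambda q: [0.0])         # BGE stand-in
    search: Callable = staticmethod(lambda v: [])           # Milvus stand-in
    rerank: Callable = staticmethod(lambda q, c: c)         # reranker stand-in
    published: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, session_id, message, *, llm,
               rag=True, use_reranker=False, stream=False):
        history = self.cache.get(session_id, [])            # get session history
        context = []
        if rag:                                             # alt: RAG Enabled
            vector = self.embed(message)                    # query embedding
            chunks = self.search(vector)                    # top-K similar chunks
            if use_reranker:                                # opt: Reranker Enabled
                chunks = self.rerank(message, chunks)
            context = chunks
        prompt = {"history": history, "context": context, "query": message}
        if stream:                                          # alt: Streaming Enabled
            reply = ""
            for token in llm(prompt):                       # vLLM token stream
                reply += token
                self.published.append(
                    (f"ai.chat.response.stream.{session_id}", token))
        else:                                               # else: Non-streaming
            reply = llm(prompt)
            self.published.append((f"ai.chat.response.{session_id}", reply))
        self.cache[session_id] = history + [message, reply]  # save to history
        return reply
```

The stubs keep the branch structure testable in isolation; swapping each one for its real client reproduces the sequence numbered in the diagram.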