- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories and kubectl cluster-info dump data.
47 lines
1.2 KiB
Plaintext
%% Voice Request Data Flow
%% Sequence diagram showing voice assistant processing

sequenceDiagram
    autonumber
    participant U as User
    participant W as Voice WebApp
    participant N as NATS
    participant VA as Voice Assistant
    participant STT as Whisper<br/>(STT)
    participant E as BGE Embeddings
    participant M as Milvus
    participant R as Reranker
    participant L as vLLM
    participant TTS as XTTS<br/>(TTS)

    U->>W: Record audio
    W->>N: Publish ai.voice.user.{id}.request<br/>(msgpack with audio bytes)
    N->>VA: Deliver voice request

    VA->>STT: Transcribe audio
    STT-->>VA: Transcription text

    alt RAG Enabled
        VA->>E: Generate query embedding
        E-->>VA: Query vector
        VA->>M: Search similar chunks
        M-->>VA: Top-K chunks

        opt Reranker Enabled
            VA->>R: Rerank chunks
            R-->>VA: Reordered chunks
        end
    end

    VA->>L: LLM inference
    L-->>VA: Response text

    VA->>TTS: Synthesize speech
    TTS-->>VA: Audio bytes

    VA->>N: Publish ai.voice.response.{id}<br/>(text + audio)
    N-->>W: Deliver response
    W-->>U: Play audio + show text

    Note over VA,TTS: Total latency target: < 3s
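
The call order in the diagram above can be read as a small pipeline. The Python sketch below mirrors that order with plain callables standing in for each service. It is only an illustration: the `VoicePipeline` class, its field names, the `rag_enabled` and `reranker_enabled` flags, and the two subject helpers are hypothetical and are not taken from the homelab-k8s2 or llm-workflows repositories. NATS transport and MessagePack encoding are left out on purpose.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the services shown in the sequence diagram."""

    transcribe: Callable[[bytes], str]               # Whisper (STT)
    embed: Callable[[str], Sequence[float]]          # BGE Embeddings
    search: Callable[[Sequence[float]], List[str]]   # Milvus similarity search
    rerank: Callable[[str, List[str]], List[str]]    # Reranker
    generate: Callable[[str, List[str]], str]        # vLLM inference
    synthesize: Callable[[str], bytes]               # XTTS (TTS)
    rag_enabled: bool = True                         # assumed flag ("alt RAG Enabled")
    reranker_enabled: bool = True                    # assumed flag ("opt Reranker Enabled")

    def handle(self, audio: bytes) -> Tuple[str, bytes]:
        """Run one voice request and return (response text, response audio)."""
        text = self.transcribe(audio)

        chunks: List[str] = []
        if self.rag_enabled:
            vector = self.embed(text)
            chunks = self.search(vector)
            if self.reranker_enabled:
                chunks = self.rerank(text, chunks)

        answer = self.generate(text, chunks)
        return answer, self.synthesize(answer)


def request_subject(ident: str) -> str:
    """Fill the {id} placeholder of the request subject from the diagram."""
    return f"ai.voice.user.{ident}.request"


def response_subject(ident: str) -> str:
    """Fill the {id} placeholder of the response subject from the diagram."""
    return f"ai.voice.response.{ident}"
```

Keeping each stage behind a callable makes it easy to time stages individually against the overall latency target noted in the diagram, and to swap in stubs when testing the control flow without GPUs.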