%% Voice Request Data Flow
%% Sequence diagram showing voice assistant processing
sequenceDiagram
autonumber
participant U as User
participant W as Voice WebApp
participant N as NATS
participant VA as Voice Assistant
participant STT as Whisper<br/>(STT)
participant E as BGE Embeddings
participant M as Milvus
participant R as Reranker
participant L as vLLM
participant TTS as XTTS<br/>(TTS)
U->>W: Record audio
W->>N: Publish ai.voice.user.{id}.request<br/>(msgpack with audio bytes)
N->>VA: Deliver voice request
VA->>STT: Transcribe audio
STT-->>VA: Transcription text
opt RAG Enabled
VA->>E: Generate query embedding
E-->>VA: Query vector
VA->>M: Search similar chunks
M-->>VA: Top-K chunks
opt Reranker Enabled
VA->>R: Rerank chunks
R-->>VA: Reordered chunks
end
end
VA->>L: LLM inference
L-->>VA: Response text
VA->>TTS: Synthesize speech
TTS-->>VA: Audio bytes
VA->>N: Publish ai.voice.response.{id}<br/>(text + audio)
N-->>W: Deliver response
W-->>U: Play audio + show text
Note over VA,TTS: Total latency target: < 3s