%% Voice Request Data Flow
%% Sequence diagram showing voice assistant processing

sequenceDiagram
    autonumber
    participant U as User
    participant W as Voice WebApp
    participant N as NATS
    participant VA as Voice Assistant
    participant STT as Whisper<br/>(STT)
    participant E as BGE Embeddings
    participant M as Milvus
    participant R as Reranker
    participant L as vLLM
    participant TTS as XTTS<br/>(TTS)

    U->>W: Record audio
    W->>N: Publish ai.voice.user.{id}.request<br/>(msgpack with audio bytes)
    N->>VA: Deliver voice request
    VA->>STT: Transcribe audio
    STT-->>VA: Transcription text

    alt RAG Enabled
        VA->>E: Generate query embedding
        E-->>VA: Query vector
        VA->>M: Search similar chunks
        M-->>VA: Top-K chunks
        opt Reranker Enabled
            VA->>R: Rerank chunks
            R-->>VA: Reordered chunks
        end
    end

    VA->>L: LLM inference
    L-->>VA: Response text
    VA->>TTS: Synthesize speech
    TTS-->>VA: Audio bytes
    VA->>N: Publish ai.voice.response.{id}<br/>(text + audio)
    N-->>W: Deliver response
    W-->>U: Play audio + show text

    Note over VA,TTS: Total latency target: < 3s