- Replace msgpack encoding with protobuf wire format
- Update field names to proto convention (UserId, RequestId, EnableRag, etc.)
- Use messages.EffectiveQuery() standalone function
- Cast TopK to int32 for proto compatibility
- Rewrite tests for proto round-trips
Use handler-base StreamGenerate() to publish real token-by-token
ChatStreamChunk messages to NATS as they arrive from Ray Serve,
instead of calling Generate() and splitting into 4-word chunks.
Add 8 streaming tests: happy path, system prompt, RAG context,
nil callback, timeout, HTTP error, context canceled, fallback.
Replace Python chat handler with Go for smaller container images.
Uses handler-base Go module for NATS, health, telemetry, and service clients.
- RAG pipeline: embed → Milvus → rerank → LLM
- Streaming response chunks
- Optional TTS synthesis
- Custom response_subject support for companions-frontend