Chat Handler
Text-based chat pipeline for the DaviesTechLabs AI/ML platform.
Overview
A NATS-based service that handles chat completion requests with RAG (Retrieval Augmented Generation).
Pipeline: Query → Embeddings → Milvus → Rerank → LLM → (optional TTS)
Versions
| File | Description |
|---|---|
| chat_handler.py | Standalone implementation (v1) |
| chat_handler_v2.py | Uses handler-base library (recommended) |
| Dockerfile | Standalone image |
| Dockerfile.v2 | Handler-base image |
Architecture
NATS (ai.chat.request)
          │
          ▼
┌───────────────────┐
│   Chat Handler    │
└───────────────────┘
          │
          ├──▶ BGE Embeddings (drizzt)
          │         │
          │         ▼
          ├──▶ Milvus Vector Search
          │         │
          │         ▼
          ├──▶ BGE Reranker (danilo)
          │         │
          │         ▼
          ├──▶ vLLM (khelben)
          │         │
          │         ▼ (optional)
          └──▶ XTTS TTS (elminster)
                    │
                    ▼
          NATS (ai.chat.response.{id})
NATS Message Format
Request (ai.chat.request)
{
"request_id": "uuid",
"query": "What is the capital of France?",
"collection": "knowledge_base",
"enable_tts": false,
"system_prompt": "Optional custom system prompt"
}
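Because the reply is published on a per-request subject, a client has to subscribe to `ai.chat.response.{request_id}` before it publishes the request. The sketch below shows one way to do that, assuming the nats-py client (`nats` package) is installed. `build_request`, `ask`, and the 30-second timeout are hypothetical choices for illustration, and whether the handler accepts a request without `system_prompt` is an assumption based on the field being described as optional.

```python
import json
import uuid
from typing import Optional

REQUEST_SUBJECT = "ai.chat.request"


def build_request(query: str, collection: str = "knowledge_base",
                  enable_tts: bool = False,
                  system_prompt: Optional[str] = None) -> tuple:
    """Return (request_id, response_subject, payload) for one chat request."""
    request_id = str(uuid.uuid4())
    body = {
        "request_id": request_id,
        "query": query,
        "collection": collection,
        "enable_tts": enable_tts,
    }
    if system_prompt is not None:  # optional field, omitted when unset
        body["system_prompt"] = system_prompt
    payload = json.dumps(body).encode()
    return request_id, f"ai.chat.response.{request_id}", payload


async def ask(nats_url: str, query: str, timeout: float = 30.0) -> dict:
    """Send one request and wait for its reply (assumes nats-py is installed)."""
    import nats  # imported lazily so build_request works without the client

    nc = await nats.connect(nats_url)
    try:
        _, reply_subject, payload = build_request(query)
        # Subscribe first: the reply arrives on a subject derived from request_id.
        sub = await nc.subscribe(reply_subject)
        await nc.publish(REQUEST_SUBJECT, payload)
        msg = await sub.next_msg(timeout=timeout)
        return json.loads(msg.data)
    finally:
        await nc.close()
```

From a script, this could be driven with `asyncio.run(ask("nats://nats.ai-ml.svc.cluster.local:4222", "What is the capital of France?"))`.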
Response (ai.chat.response.{request_id})
{
"request_id": "uuid",
"response": "The capital of France is Paris.",
"sources": [
{"text": "Paris is the capital...", "score": 0.95}
],
"audio": "base64-encoded-audio (if TTS enabled)"
}
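On the consuming side, the response body can be decoded into a small typed object. This is a sketch rather than part of the handler: `Source`, `ChatResponse`, and `parse_response` are hypothetical names, and it assumes `audio` is standard base64 and is missing, null, or empty when TTS is disabled. The README does not state the audio container format, so the bytes are returned undecoded.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    text: str
    score: float


@dataclass
class ChatResponse:
    request_id: str
    response: str
    sources: List[Source] = field(default_factory=list)
    audio: Optional[bytes] = None  # raw audio bytes, decoded from base64


def parse_response(data: bytes) -> ChatResponse:
    """Decode one ai.chat.response.{request_id} message body."""
    body = json.loads(data)
    audio_b64 = body.get("audio")  # assumed absent, null, or empty without TTS
    return ChatResponse(
        request_id=body["request_id"],
        response=body["response"],
        sources=[Source(s["text"], float(s["score"]))
                 for s in body.get("sources", [])],
        audio=base64.b64decode(audio_b64) if audio_b64 else None,
    )
```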
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| NATS_URL | nats://nats.ai-ml.svc.cluster.local:4222 | NATS server |
| EMBEDDINGS_URL | http://embeddings-predictor.ai-ml.svc.cluster.local | Embeddings |
| RERANKER_URL | http://reranker-predictor.ai-ml.svc.cluster.local | Reranker |
| VLLM_URL | http://llm-draft.ai-ml.svc.cluster.local:8000 | LLM service |
| TTS_URL | http://tts-predictor.ai-ml.svc.cluster.local | TTS (optional) |
| MILVUS_HOST | milvus.ai-ml.svc.cluster.local | Vector DB |
| COLLECTION_NAME | knowledge_base | Default Milvus collection |
| ENABLE_TTS | false | Enable audio responses |
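One way to read these variables with their documented defaults is a frozen dataclass populated from the environment. The sketch below is illustrative only: `Settings`, `DEFAULTS`, and `load_settings` are hypothetical names, and treating `1`, `true`, and `yes` (any case) as enabling TTS is an assumption, since the README only shows the default `false`.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Defaults copied from the configuration table.
DEFAULTS = {
    "NATS_URL": "nats://nats.ai-ml.svc.cluster.local:4222",
    "EMBEDDINGS_URL": "http://embeddings-predictor.ai-ml.svc.cluster.local",
    "RERANKER_URL": "http://reranker-predictor.ai-ml.svc.cluster.local",
    "VLLM_URL": "http://llm-draft.ai-ml.svc.cluster.local:8000",
    "TTS_URL": "http://tts-predictor.ai-ml.svc.cluster.local",
    "MILVUS_HOST": "milvus.ai-ml.svc.cluster.local",
    "COLLECTION_NAME": "knowledge_base",
    "ENABLE_TTS": "false",
}


@dataclass(frozen=True)
class Settings:
    nats_url: str
    embeddings_url: str
    reranker_url: str
    vllm_url: str
    tts_url: str
    milvus_host: str
    collection_name: str
    enable_tts: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from env, falling back to the documented defaults."""
    v = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return Settings(
        nats_url=v["NATS_URL"],
        embeddings_url=v["EMBEDDINGS_URL"],
        reranker_url=v["RERANKER_URL"],
        vllm_url=v["VLLM_URL"],
        tts_url=v["TTS_URL"],
        milvus_host=v["MILVUS_HOST"],
        collection_name=v["COLLECTION_NAME"],
        # Assumption: these spellings count as "on"; anything else is off.
        enable_tts=v["ENABLE_TTS"].strip().lower() in {"1", "true", "yes"},
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"ENABLE_TTS": "true"})`.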
Building
# Standalone image (v1)
docker build -f Dockerfile -t chat-handler:latest .
# Handler-base image (v2 - recommended)
docker build -f Dockerfile.v2 -t chat-handler:v2 .
Dependencies
The v2 handler depends on handler-base:
pip install git+https://git.daviestechlabs.io/daviestechlabs/handler-base.git
Related
- handler-base - Base handler library
- voice-assistant - Voice pipeline
- homelab-design - Architecture docs