Files

Billy D. 6ef42b3d2c feat: Add chat handler with RAG pipeline

- chat_handler.py: Standalone NATS handler with RAG
- chat_handler_v2.py: Handler-base implementation
- Dockerfiles for both versions

Pipeline: Embeddings → Milvus → Rerank → LLM → (optional TTS)

2026-02-01 20:37:34 -05:00

3.1 KiB

Raw Blame History

Chat Handler

Text-based chat pipeline for the DaviesTechLabs AI/ML platform.

Overview

A NATS-based service that handles chat completion requests with RAG (Retrieval Augmented Generation).

Pipeline: Query → Embeddings → Milvus → Rerank → LLM → (optional TTS)

Versions

File	Description
`chat_handler.py`	Standalone implementation (v1)
`chat_handler_v2.py`	Uses handler-base library (recommended)
`Dockerfile`	Standalone image
`Dockerfile.v2`	Handler-base image

Architecture

NATS (ai.chat.request)
        │
        ▼
┌───────────────────┐
│   Chat Handler    │
└───────────────────┘
        │
        ├──▶ BGE Embeddings (drizzt)
        │         │
        │         ▼
        ├──▶ Milvus Vector Search
        │         │
        │         ▼
        ├──▶ BGE Reranker (danilo)
        │         │
        │         ▼
        ├──▶ vLLM (khelben)
        │         │
        │         ▼ (optional)
        └──▶ XTTS TTS (elminster)
                  │
                  ▼
         NATS (ai.chat.response.{id})

NATS Message Format

Request (ai.chat.request)

{
  "request_id": "uuid",
  "query": "What is the capital of France?",
  "collection": "knowledge_base",
  "enable_tts": false,
  "system_prompt": "Optional custom system prompt"
}

Response (ai.chat.response.{request_id})

{
  "request_id": "uuid",
  "response": "The capital of France is Paris.",
  "sources": [
    {"text": "Paris is the capital...", "score": 0.95}
  ],
  "audio": "base64-encoded-audio (if TTS enabled)"
}

Configuration

Environment Variable	Default	Description
`NATS_URL`	`nats://nats.ai-ml.svc.cluster.local:4222`	NATS server
`EMBEDDINGS_URL`	`http://embeddings-predictor.ai-ml.svc.cluster.local`	Embeddings
`RERANKER_URL`	`http://reranker-predictor.ai-ml.svc.cluster.local`	Reranker
`VLLM_URL`	`http://llm-draft.ai-ml.svc.cluster.local:8000`	LLM service
`TTS_URL`	`http://tts-predictor.ai-ml.svc.cluster.local`	TTS (optional)
`MILVUS_HOST`	`milvus.ai-ml.svc.cluster.local`	Vector DB
`COLLECTION_NAME`	`knowledge_base`	Default Milvus collection
`ENABLE_TTS`	`false`	Enable audio responses

Building

# Standalone image (v1)
docker build -f Dockerfile -t chat-handler:latest .

# Handler-base image (v2 - recommended)
docker build -f Dockerfile.v2 -t chat-handler:v2 .

Dependencies

The v2 handler depends on handler-base:

pip install git+https://git.daviestechlabs.io/daviestechlabs/handler-base.git

handler-base - Base handler library
voice-assistant - Voice pipeline
homelab-design - Architecture docs

3.1 KiB Raw Blame History