Chat Handler

Text-based chat pipeline for the DaviesTechLabs AI/ML platform.

Overview

Chat Handler is a NATS-based service that answers chat completion requests using retrieval-augmented generation (RAG). It is built on the handler-base library, which provides standardized NATS handling, telemetry, and health checks.

Pipeline: Query → Embeddings → Milvus → Rerank → LLM → (optional TTS)

Architecture

NATS (ai.chat.request)
        │
        ▼
┌───────────────────┐
│   Chat Handler    │
└───────────────────┘
        │
        ├──▶ BGE Embeddings (drizzt)
        │         │
        │         ▼
        ├──▶ Milvus Vector Search
        │         │
        │         ▼
        ├──▶ BGE Reranker (danilo)
        │         │
        │         ▼
        ├──▶ vLLM (khelben)
        │         │
        │         ▼ (optional)
        └──▶ XTTS TTS (elminster)
                  │
                  ▼
         NATS (ai.chat.response.{id})

NATS Message Format

Request (ai.chat.request)

{
  "request_id": "uuid",
  "query": "What is the capital of France?",
  "collection": "knowledge_base",
  "enable_tts": false,
  "system_prompt": "Optional custom system prompt"
}
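
As an illustration of how a payload like the one above might be decoded in Go, the sketch below maps the fields onto a struct. The `ChatRequest` type and `ParseChatRequest` function are hypothetical names. Treating `request_id` and `query` as required, and falling back to a default collection when `collection` is absent, are assumptions; the format above does not state which fields are mandatory.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ChatRequest mirrors the JSON published on ai.chat.request.
type ChatRequest struct {
	RequestID    string `json:"request_id"`
	Query        string `json:"query"`
	Collection   string `json:"collection"`
	EnableTTS    bool   `json:"enable_tts"`
	SystemPrompt string `json:"system_prompt"`
}

// ParseChatRequest decodes a message payload and applies defaults.
// Assumption: request_id and query are required; an empty collection
// falls back to defaultCollection.
func ParseChatRequest(data []byte, defaultCollection string) (ChatRequest, error) {
	var req ChatRequest
	if err := json.Unmarshal(data, &req); err != nil {
		return ChatRequest{}, fmt.Errorf("decode chat request: %w", err)
	}
	if req.RequestID == "" {
		return ChatRequest{}, errors.New("request_id is required")
	}
	if req.Query == "" {
		return ChatRequest{}, errors.New("query is required")
	}
	if req.Collection == "" {
		req.Collection = defaultCollection
	}
	return req, nil
}
```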

Response (ai.chat.response.{request_id})

{
  "request_id": "uuid",
  "response": "The capital of France is Paris.",
  "sources": [
    {"text": "Paris is the capital...", "score": 0.95}
  ],
  "audio": "base64-encoded-audio (if TTS enabled)"
}

Configuration

Environment Variable  Default                                              Description
NATS_URL              nats://nats.ai-ml.svc.cluster.local:4222             NATS server
EMBEDDINGS_URL        http://embeddings-predictor.ai-ml.svc.cluster.local  Embeddings
RERANKER_URL          http://reranker-predictor.ai-ml.svc.cluster.local    Reranker
VLLM_URL              http://llm-draft.ai-ml.svc.cluster.local:8000        LLM service
TTS_URL               http://tts-predictor.ai-ml.svc.cluster.local         TTS (optional)
MILVUS_HOST           milvus.ai-ml.svc.cluster.local                       Vector DB
COLLECTION_NAME       knowledge_base                                       Default Milvus collection
ENABLE_TTS            false                                                Enable audio responses

Building

docker build -t chat-handler:latest .

# With specific handler-base tag
docker build --build-arg BASE_TAG=latest -t chat-handler:latest .