
# Voice Assistant

End-to-end voice assistant pipeline for the DaviesTechLabs AI/ML platform.

## Components

### Real-time Handler (NATS-based)

The voice assistant service listens on NATS for audio requests and returns synthesized speech responses. It uses the [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base) library for standardized NATS handling, telemetry, and health checks.

**Pipeline:** STT → Embeddings → Milvus RAG → Rerank → LLM → TTS

### Kubeflow Pipeline (Batch)

Pipeline definitions for batch processing or asynchronous workflows that run on Kubeflow Pipelines.

| Pipeline | Description |
|----------|-------------|
| `voice_pipeline.yaml` | Full STT → RAG → TTS pipeline |
| `rag_pipeline.yaml` | Text-only RAG pipeline |
| `tts_pipeline.yaml` | Simple text-to-speech |

```bash
# Install the KFP SDK and compile the voice pipeline to YAML
cd pipelines
pip install kfp==2.12.1
python voice_pipeline.py
```

## Architecture

```
NATS (voice.request)
        │
        ▼
┌───────────────────┐
│  Voice Assistant  │
│    Handler        │
└───────────────────┘
        │
        ├──▶ Whisper STT (elminster)
        │         │
        │         ▼
        ├──▶ BGE Embeddings (drizzt)
        │         │
        │         ▼
        ├──▶ Milvus Vector Search
        │         │
        │         ▼
        ├──▶ BGE Reranker (danilo)
        │         │
        │         ▼
        ├──▶ vLLM (khelben)
        │         │
        │         ▼
        └──▶ XTTS TTS (elminster)
                  │
                  ▼
         NATS (voice.response.{request_id})
```
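The handler's flow in the diagram can be sketched as plain function composition. This is a minimal sketch, not the handler-base API: the `Stages` container, `run_pipeline`, and the `search_k`/`context_k` defaults are hypothetical, and each stage stands in for a network call to the corresponding service.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stage signatures; in the real handler each one is a call to a
# remote service (Whisper, BGE embeddings, Milvus, BGE reranker, vLLM, XTTS).
@dataclass
class Stages:
    transcribe: Callable[[bytes, str], str]               # audio, language -> text
    embed: Callable[[str], List[float]]                   # text -> query vector
    search: Callable[[List[float], str, int], List[str]]  # vector, collection, k -> passages
    rerank: Callable[[str, List[str]], List[str]]         # query, passages -> reordered passages
    generate: Callable[[str, List[str]], str]             # question, context -> answer
    synthesize: Callable[[str, str], bytes]               # answer, language -> audio


def run_pipeline(stages: Stages, audio: bytes, language: str, collection: str,
                 search_k: int = 20, context_k: int = 5) -> dict:
    """Run STT -> embeddings -> vector search -> rerank -> LLM -> TTS in order."""
    question = stages.transcribe(audio, language)
    vector = stages.embed(question)
    candidates = stages.search(vector, collection, search_k)
    # Keep only the top passages after reranking as LLM context (hypothetical cutoff).
    context = stages.rerank(question, candidates)[:context_k]
    answer = stages.generate(question, context)
    speech = stages.synthesize(answer, language)
    # Field names mirror the voice.response payload (minus request_id).
    return {"transcription": question, "response": answer, "audio": speech}
```

Injecting the stages as callables keeps the ordering logic testable with stubs, independently of the cluster services.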

## Configuration

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `NATS_URL` | `nats://nats.ai-ml.svc.cluster.local:4222` | NATS server URL |
| `WHISPER_URL` | `http://whisper-predictor.ai-ml.svc.cluster.local` | Whisper STT service |
| `EMBEDDINGS_URL` | `http://embeddings-predictor.ai-ml.svc.cluster.local` | BGE embeddings service |
| `RERANKER_URL` | `http://reranker-predictor.ai-ml.svc.cluster.local` | BGE reranker service |
| `VLLM_URL` | `http://llm-draft.ai-ml.svc.cluster.local:8000` | vLLM (LLM) service |
| `TTS_URL` | `http://tts-predictor.ai-ml.svc.cluster.local` | XTTS TTS service |
| `MILVUS_HOST` | `milvus.ai-ml.svc.cluster.local` | Milvus vector database host |
| `COLLECTION_NAME` | `knowledge_base` | Milvus collection to search |
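One way to load these settings is a frozen dataclass whose field names match the variables in lower case. This is a hypothetical sketch (`VoiceAssistantConfig` is not part of handler-base, which may provide its own settings mechanism); the defaults are copied from the table.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceAssistantConfig:
    # Hypothetical config class; each field is overridden by the environment
    # variable with the same name in upper case (e.g. vllm_url <- VLLM_URL).
    nats_url: str = "nats://nats.ai-ml.svc.cluster.local:4222"
    whisper_url: str = "http://whisper-predictor.ai-ml.svc.cluster.local"
    embeddings_url: str = "http://embeddings-predictor.ai-ml.svc.cluster.local"
    reranker_url: str = "http://reranker-predictor.ai-ml.svc.cluster.local"
    vllm_url: str = "http://llm-draft.ai-ml.svc.cluster.local:8000"
    tts_url: str = "http://tts-predictor.ai-ml.svc.cluster.local"
    milvus_host: str = "milvus.ai-ml.svc.cluster.local"
    collection_name: str = "knowledge_base"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "VoiceAssistantConfig":
        """Build a config from os.environ (or the given mapping), keeping defaults for unset vars."""
        env = os.environ if env is None else env
        overrides = {f.name: env[f.name.upper()] for f in fields(cls) if f.name.upper() in env}
        return cls(**overrides)
```

For example, `VoiceAssistantConfig.from_env({"VLLM_URL": "http://localhost:8000"})` points the LLM stage at a local server while every other setting keeps its in-cluster default.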

## NATS Message Format

### Request (voice.request)

```json
{
  "request_id": "uuid",
  "audio": "base64-encoded-audio",
  "language": "en",
  "collection": "knowledge_base"
}
```
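A client builds this request by base64-encoding the audio and attaching a fresh request ID. The helper below is a hypothetical standard-library sketch; the audio bytes are passed through unchanged, so any format constraints of the STT service still apply (an assumption here).

```python
import base64
import json
import uuid


def build_voice_request(audio: bytes, language: str = "en",
                        collection: str = "knowledge_base") -> tuple[str, bytes]:
    """Return (request_id, JSON payload) for publishing on voice.request (hypothetical helper)."""
    request_id = str(uuid.uuid4())
    payload = {
        "request_id": request_id,
        # JSON cannot carry raw bytes, so the audio travels as base64 text.
        "audio": base64.b64encode(audio).decode("ascii"),
        "language": language,
        "collection": collection,
    }
    return request_id, json.dumps(payload).encode("utf-8")
```

With the nats-py client, the returned payload could then be published as `await nc.publish("voice.request", payload)`.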

### Response (voice.response.{request_id})

```json
{
  "request_id": "uuid",
  "transcription": "user question",
  "response": "assistant answer",
  "audio": "base64-encoded-audio"
}
```
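On the receiving side, the reply subject is derived from the request ID and the audio field is base64-decoded. `response_subject` and `parse_voice_response` are hypothetical helper names: a sketch of the documented message format, not part of handler-base.

```python
import base64
import json


def response_subject(request_id: str) -> str:
    """Subject on which the handler publishes the reply for a given request."""
    return f"voice.response.{request_id}"


def parse_voice_response(data: bytes) -> dict:
    """Decode a voice.response.* payload, returning the audio as raw bytes."""
    msg = json.loads(data)
    return {
        "request_id": msg["request_id"],
        "transcription": msg["transcription"],
        "response": msg["response"],
        "audio": base64.b64decode(msg["audio"]),
    }
```

With core NATS, messages published before a subscription exists are not delivered to it, so a client should subscribe to `response_subject(request_id)` before publishing the request.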

## Building

```bash
docker build -t voice-assistant:latest .

# Build against a specific handler-base image tag (BASE_TAG)
docker build --build-arg BASE_TAG=latest -t voice-assistant:latest .
```

## Related

- [homelab-design](https://git.daviestechlabs.io/daviestechlabs/homelab-design) - Architecture docs
- [kuberay-images](https://git.daviestechlabs.io/daviestechlabs/kuberay-images) - Ray worker images
- [handler-base](https://github.com/Billy-Davies-2/llm-workflows/tree/main/handler-base) - Base handler library