- Add GET /health endpoint returning model name and GPU status
- Don't pass language/speaker args to Coqui TTS when the model doesn't
  support multilingual/multi-speaker synthesis (fixes 500 on
  ljspeech/tacotron2-DDC)
- Applied to all three endpoints: POST /, GET /api/tts, POST /stream
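The guard shared by the three endpoints can be sketched as a small kwargs builder. This is a minimal sketch, not the repo's code: the capability flags are passed in explicitly here (in Coqui TTS they would come from the loaded model, e.g. its multilingual/multi-speaker properties), and the function name is hypothetical.

```python
def build_tts_kwargs(text, is_multi_lingual, is_multi_speaker,
                     language=None, speaker=None):
    """Build kwargs for the synthesis call, passing language/speaker
    only when the loaded model actually supports them."""
    kwargs = {"text": text}
    if is_multi_lingual and language:
        kwargs["language"] = language
    if is_multi_speaker and speaker:
        kwargs["speaker"] = speaker
    return kwargs

# Single-speaker model (e.g. ljspeech/tacotron2-DDC): the extra args are
# dropped instead of being forwarded, avoiding the error that surfaced
# as a 500.
```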
- Add POST /stream SSE endpoint that splits text into sentences,
synthesizes each individually, and streams base64 WAV via SSE events
- Add _split_sentences() helper for robust sentence boundary detection
- Enables progressive audio playback for lower time-to-first-audio
- Add FastAPI ingress to TTSDeployment with two routes:
POST / — JSON API with base64 audio (backward compat)
GET /api/tts?text=&language_id= — raw WAV bytes (no base64/JSON overhead)
- GET /speakers endpoint for speaker listing
- Uses _fastapi attribute naming to avoid collision with the Ray Serve binding
- app = TTSDeployment.bind() for rayservice.yaml compatibility
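The `app = TTSDeployment.bind()` module attribute is what a KubeRay RayService imports by path. A sketch of the matching serveConfigV2 fragment, assuming an application module named `tts_app` (the module name and pip package pin here are illustrative, not taken from the repo):

```yaml
serveConfigV2: |
  applications:
    - name: tts
      # "<module>:<bound app attribute>"; module name is an assumption
      import_path: tts_app:app
      runtime_env:
        pip: ["ray-serve-apps"]
```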
The strixhalo LLM worker uses py_executable pointing to the Docker
image's venv, which doesn't include the updated ray-serve-apps package.
Wrap all InferenceLogger imports in try/except and guard usage with
None checks so the apps degrade gracefully when MLflow logging is
unavailable.
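The degrade-gracefully pattern looks like the sketch below. The module name `inference_logger` and the `log()` call are placeholders for the real ray-serve-apps import; the point is the try/except plus the None check at every use site.

```python
# Optional-dependency guard: if the package shipped in the venv predates
# InferenceLogger, fall back to no-op logging instead of crashing on import.
try:
    from inference_logger import InferenceLogger  # placeholder module name
except ImportError:
    InferenceLogger = None

logger = InferenceLogger() if InferenceLogger is not None else None

def handle_request(prompt):
    result = f"echo: {prompt}"  # stand-in for the actual inference call
    if logger is not None:      # None check: skip MLflow logging when absent
        logger.log(prompt=prompt, response=result)
    return result
```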
Implements ADR-0024: Ray Repository Structure
- Ray Serve deployments for GPU-shared AI inference
- Published as PyPI package for dynamic code loading
- Deployments: LLM, embeddings, reranker, whisper, TTS
- CI/CD workflow publishes to Gitea PyPI on push to main
Extracted from kuberay-images repo per ADR-0024
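A minimal publish job of the kind described, assuming Gitea Actions and a repository secret holding the registry token (workflow path, owner, host, and secret name are illustrative; the `/api/packages/<owner>/pypi` endpoint is Gitea's PyPI registry URL):

```yaml
# .gitea/workflows/publish.yaml (illustrative names)
name: publish
on:
  push:
    branches: [main]
jobs:
  pypi:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install build twine
      - run: python -m build
      - run: >
          twine upload
          --repository-url https://gitea.example.com/api/packages/OWNER/pypi
          -u OWNER -p ${{ secrets.GITEA_TOKEN }} dist/*
```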