- Add GET /health endpoint returning model name and GPU status
- Don't pass language/speaker args to Coqui TTS when the model doesn't
  support multilingual/multi-speaker synthesis (fixes 500 on
  ljspeech/tacotron2-DDC)
- Applied to all three endpoints: POST /, GET /api/tts, POST /stream
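The guard shared by the three endpoints can be sketched as a small kwargs builder. This is a minimal sketch, not the repo's code: the capability flags are passed in explicitly here (in Coqui TTS they would come from the loaded model, e.g. its multilingual/multi-speaker properties), and the function name is hypothetical.

```python
def build_tts_kwargs(text, is_multi_lingual, is_multi_speaker,
                     language=None, speaker=None):
    """Build kwargs for the synthesis call, passing language/speaker
    only when the loaded model actually supports them."""
    kwargs = {"text": text}
    if is_multi_lingual and language:
        kwargs["language"] = language
    if is_multi_speaker and speaker:
        kwargs["speaker"] = speaker
    return kwargs

# Single-speaker model (e.g. ljspeech/tacotron2-DDC): the extra args are
# dropped instead of being forwarded, avoiding the error that surfaced
# as a 500.
```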
- Add POST /stream SSE endpoint that splits text into sentences,
synthesizes each individually, and streams base64 WAV via SSE events
- Add _split_sentences() helper for robust sentence boundary detection
- Enables progressive audio playback for lower time-to-first-audio
- Add FastAPI ingress to TTSDeployment with two routes:
POST / — JSON API with base64 audio (backward compat)
GET /api/tts?text=&language_id= — raw WAV bytes (no base64/JSON overhead)
- GET /speakers endpoint for speaker listing
- Uses _fastapi attribute naming to avoid collision with the Ray Serve binding
- app = TTSDeployment.bind() for rayservice.yaml compatibility
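The `app = TTSDeployment.bind()` module attribute is what a KubeRay RayService imports by path. A sketch of the matching serveConfigV2 fragment, assuming an application module named `tts_app` (the module name and pip package pin here are illustrative, not taken from the repo):

```yaml
serveConfigV2: |
  applications:
    - name: tts
      # "<module>:<bound app attribute>"; module name is an assumption
      import_path: tts_app:app
      runtime_env:
        pip: ["ray-serve-apps"]
```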
The strixhalo LLM worker uses py_executable pointing to the Docker
image's venv, which doesn't include the updated ray-serve-apps package.
Wrap all InferenceLogger imports in try/except and guard usage with
None checks so the apps degrade gracefully when MLflow logging is
unavailable.
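The degrade-gracefully pattern looks like the sketch below. The module name `inference_logger` and the `log()` call are placeholders for the real ray-serve-apps import; the point is the try/except plus the None check at every use site.

```python
# Optional-dependency guard: if the package shipped in the venv predates
# InferenceLogger, fall back to no-op logging instead of crashing on import.
try:
    from inference_logger import InferenceLogger  # placeholder module name
except ImportError:
    InferenceLogger = None

logger = InferenceLogger() if InferenceLogger is not None else None

def handle_request(prompt):
    result = f"echo: {prompt}"  # stand-in for the actual inference call
    if logger is not None:      # None check: skip MLflow logging when absent
        logger.log(prompt=prompt, response=result)
    return result
```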
Implements ADR-0024: Ray Repository Structure
- Ray Serve deployments for GPU-shared AI inference
- Published as PyPI package for dynamic code loading
- Deployments: LLM, embeddings, reranker, whisper, TTS
- CI/CD workflow publishes to Gitea PyPI on push to main
Extracted from kuberay-images repo per ADR-0024
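A minimal publish job of the kind described, assuming Gitea Actions and a repository secret holding the registry token (workflow path, owner, host, and secret name are illustrative; the `/api/packages/<owner>/pypi` endpoint is Gitea's PyPI registry URL):

```yaml
# .gitea/workflows/publish.yaml (illustrative names)
name: publish
on:
  push:
    branches: [main]
jobs:
  pypi:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install build twine
      - run: python -m build
      - run: >
          twine upload
          --repository-url https://gitea.example.com/api/packages/OWNER/pypi
          -u OWNER -p ${{ secrets.GITEA_TOKEN }} dist/*
```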