# ray-serve-apps

Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package to enable dynamic code loading by Ray clusters.
## Architecture

This repo contains application code only; Docker images live in `kuberay-images` and Kubernetes manifests in `homelab-k8s2`.
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ kuberay-images  │     │ ray-serve       │     │ homelab-k8s2    │
│                 │     │                 │     │                 │
│ Docker images   │     │ PyPI package    │     │ K8s manifests   │
│ (GPU runtimes)  │     │ (this repo)     │     │ (deployment)    │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         ▼                       ▼                       ▼
 Container Registry         PyPI Registry          GitOps (Flux)
```
## Deployments

| Module | Purpose | Hardware Target |
|---|---|---|
| `serve_llm` | vLLM OpenAI-compatible API | Strix Halo (ROCm) |
| `serve_embeddings` | Sentence Transformers | Any GPU |
| `serve_reranker` | Cross-encoder reranking | Any GPU |
| `serve_whisper` | Faster Whisper STT | NVIDIA/Intel |
| `serve_tts` | Coqui TTS | Any GPU |
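For example, `serve_llm` exposes an OpenAI-compatible API, so any OpenAI-style client can talk to it. A minimal request-building sketch (the host, port, and model name below are placeholders, not values defined by this repo):

```python
# Sketch of a chat-completion request against the serve_llm deployment.
# The base URL and model name are assumptions; adjust for your cluster.
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible POST /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://ray-serve.example.internal:8000", "my-model", "Hello")
# urllib.request.urlopen(req) would send it; omitted here since the URL is a placeholder.
```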
## Installation

```bash
# From Gitea PyPI
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple

# With optional dependencies (quoted so shells don't glob the brackets)
pip install "ray-serve-apps[llm]"         # vLLM support
pip install "ray-serve-apps[embeddings]"  # Sentence Transformers
pip install "ray-serve-apps[stt]"         # Faster Whisper
pip install "ray-serve-apps[tts]"         # Coqui TTS
```
## Usage

Ray clusters pull this package at runtime:

```yaml
# In RayService spec
rayClusterConfig:
  headGroupSpec:
    template:
      spec:
        containers:
        - name: ray-head
          command:
          - /bin/bash
          - -c
          - |
            pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
            ray start --head --dashboard-host=0.0.0.0
            serve run ray_serve.serve_llm:app
```
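Alternatively, the same app can be declared in the RayService's `serveConfigV2` block so Ray Serve installs the package per application via `runtime_env`. A sketch, assuming the container's pip is already configured to use the Gitea index (e.g. via `PIP_INDEX_URL` in the image); the `llm` name and `/llm` route prefix are illustrative:

```yaml
# Declarative alternative: runtime_env installs the package per application.
serveConfigV2: |
  applications:
  - name: llm
    route_prefix: /llm
    import_path: ray_serve.serve_llm:app
    runtime_env:
      pip:
      - ray-serve-apps==1.0.0
```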
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Lint
ruff check .
ruff format .

# Test
pytest
```
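Unit tests for a package like this typically exercise pure helper logic without starting a Ray cluster. A hypothetical pytest-style example (the `chunk_text` helper below is illustrative, not an actual function in ray-serve-apps):

```python
# tests/test_chunking.py -- illustrative; chunk_text is a hypothetical helper.

def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split text into pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def test_chunk_text_respects_limit():
    chunks = chunk_text("abcdefghij", 4)
    assert chunks == ["abcd", "efgh", "ij"]
    assert all(len(c) <= 4 for c in chunks)
```

Keeping deployment classes thin wrappers around plain functions like this makes them testable without GPU hardware or a running cluster.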
## Publishing

Pushes to `main` automatically publish to Gitea PyPI via CI/CD.

To bump the version, edit `pyproject.toml`:

```toml
[project]
version = "1.1.0"
```