ray-serve/README.md
Billy D. 8ef914ec12
feat: initial ray-serve-apps PyPI package
Implements ADR-0024: Ray Repository Structure

- Ray Serve deployments for GPU-shared AI inference
- Published as PyPI package for dynamic code loading
- Deployments: LLM, embeddings, reranker, whisper, TTS
- CI/CD workflow publishes to Gitea PyPI on push to main

Extracted from kuberay-images repo per ADR-0024
2026-02-03 07:03:39 -05:00


# ray-serve-apps
Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package to enable dynamic code loading by Ray clusters.
## Architecture
This repo contains **application code only** - no Docker images or Kubernetes manifests.
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ kuberay-images  │     │ ray-serve       │     │ homelab-k8s2    │
│                 │     │                 │     │                 │
│ Docker images   │     │ PyPI package    │     │ K8s manifests   │
│ (GPU runtimes)  │     │ (this repo)     │     │ (deployment)    │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         ▼                       ▼                       ▼
Container Registry         PyPI Registry           GitOps (Flux)
```
## Deployments
| Module | Purpose | Hardware Target |
|--------|---------|-----------------|
| `serve_llm` | vLLM OpenAI-compatible API | Strix Halo (ROCm) |
| `serve_embeddings` | Sentence Transformers | Any GPU |
| `serve_reranker` | Cross-encoder reranking | Any GPU |
| `serve_whisper` | Faster Whisper STT | NVIDIA/Intel |
| `serve_tts` | Coqui TTS | Any GPU |
## Installation
```bash
# From Gitea PyPI
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
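# With optional dependencies (quoted so shells like zsh do not glob the brackets).
# Pass the same --index-url unless pip is already configured for the Gitea index.
pip install "ray-serve-apps[llm]"         # vLLM support
pip install "ray-serve-apps[embeddings]"  # Sentence Transformers
pip install "ray-serve-apps[stt]"         # Faster Whisper
pip install "ray-serve-apps[tts]"         # Coqui TTS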
```
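
When the install is scripted (for example from a bootstrap script), the command can be assembled as an argument list so that no shell quoting is needed. The following is a minimal sketch: the index URL comes from this README, while `pip_install_args` is a hypothetical helper that is not part of this package.

```python
import sys

# Index URL taken from this README; everything else is illustrative.
GITEA_INDEX_URL = "https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple"


def pip_install_args(extras=(), version=None, index_url=GITEA_INDEX_URL):
    """Build the argument list for installing ray-serve-apps (hypothetical helper)."""
    requirement = "ray-serve-apps"
    if extras:
        # Several extras share one bracket group, e.g. ray-serve-apps[llm,tts].
        requirement += "[" + ",".join(extras) + "]"
    if version:
        requirement += "==" + version
    return [sys.executable, "-m", "pip", "install", requirement, "--index-url", index_url]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute.
    print(" ".join(pip_install_args(extras=["llm", "embeddings"], version="1.0.0")))
```

Passing the list directly to `subprocess.run` avoids the bracket-globbing problem, because no shell interprets the requirement string.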
## Usage
Ray clusters install this package when the head container starts, then launch the Serve application:
```yaml
# In RayService spec
rayClusterConfig:
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            command:
              - /bin/bash
              - -c
              - |
                pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
                ray start --head --dashboard-host=0.0.0.0
                serve run ray_serve.serve_llm:app
```
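
The argument `ray_serve.serve_llm:app` is an import path: the part before the colon names a module and the part after it names an attribute in that module. The sketch below approximates how such a path can be resolved with the standard library. It is not Ray's actual implementation, and `resolve_import_path` is a hypothetical name used only for illustration.

```python
import importlib


def resolve_import_path(import_path):
    """Split 'package.module:attribute' and return the named attribute."""
    module_name, sep, attr_name = import_path.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {import_path!r}")
    # The module must be importable, which is why the package is installed first.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


if __name__ == "__main__":
    # A standard-library target stands in so the sketch runs without Ray;
    # on a cluster the path would be "ray_serve.serve_llm:app".
    dumps = resolve_import_path("json:dumps")
    print(dumps({"ok": True}))
```

This is why the `pip install` line has to run before `serve run`: the import fails if the package is not yet present in the container's Python environment.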
## Development
```bash
# Install dev dependencies
pip install -e ".[dev]"
# Lint
ruff check .
ruff format .
# Test
pytest
```
## Publishing
Pushes to `main` automatically publish to Gitea PyPI via CI/CD.
To bump the version, edit the `version` field in `pyproject.toml`:
```toml
[project]
version = "1.1.0"
```