Implements ADR-0024: Ray Repository Structure - Ray Serve deployments for GPU-shared AI inference - Published as PyPI package for dynamic code loading - Deployments: LLM, embeddings, reranker, whisper, TTS - CI/CD workflow publishes to Gitea PyPI on push to main Extracted from kuberay-images repo per ADR-0024
88 lines
2.7 KiB
Markdown
88 lines
2.7 KiB
Markdown
# ray-serve-apps
|
|
|
|
Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package to enable dynamic code loading by Ray clusters.
|
|
|
|
## Architecture
|
|
|
|
This repo contains **application code only** - no Docker images or Kubernetes manifests.
|
|
|
|
```
|
|
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
|
│ kuberay-images │ │ ray-serve │ │ homelab-k8s2 │
|
|
│ │ │ │ │ │
|
|
│ Docker images │ │ PyPI package │ │ K8s manifests │
|
|
│ (GPU runtimes) │ │ (this repo) │ │ (deployment) │
|
|
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
|
|
│ │ │
|
|
▼ ▼ ▼
|
|
Container Registry PyPI Registry GitOps (Flux)
|
|
```
|
|
|
|
## Deployments
|
|
|
|
| Module | Purpose | Hardware Target |
|
|
|--------|---------|-----------------|
|
|
| `serve_llm` | vLLM OpenAI-compatible API | Strix Halo (ROCm) |
|
|
| `serve_embeddings` | Sentence Transformers | Any GPU |
|
|
| `serve_reranker` | Cross-encoder reranking | Any GPU |
|
|
| `serve_whisper` | Faster Whisper STT | NVIDIA/Intel |
|
|
| `serve_tts` | Coqui TTS | Any GPU |
|
|
|
|
## Installation
|
|
|
|
```bash
|
|
# From Gitea PyPI
|
|
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
|
|
|
|
# With optional dependencies
|
|
pip install ray-serve-apps[llm] # vLLM support
|
|
pip install ray-serve-apps[embeddings] # Sentence Transformers
|
|
pip install ray-serve-apps[stt] # Faster Whisper
|
|
pip install ray-serve-apps[tts] # Coqui TTS
|
|
```
|
|
|
|
## Usage
|
|
|
|
Ray clusters pull this package at runtime:
|
|
|
|
```yaml
|
|
# In RayService spec
|
|
rayClusterConfig:
|
|
headGroupSpec:
|
|
template:
|
|
spec:
|
|
containers:
|
|
- name: ray-head
|
|
command:
|
|
- /bin/bash
|
|
- -c
|
|
- |
|
|
pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
|
|
ray start --head --dashboard-host=0.0.0.0
|
|
serve run ray_serve.serve_llm:app
|
|
```
|
|
|
|
## Development
|
|
|
|
```bash
|
|
# Install dev dependencies
|
|
pip install -e ".[dev]"
|
|
|
|
# Lint
|
|
ruff check .
|
|
ruff format .
|
|
|
|
# Test
|
|
pytest
|
|
```
|
|
|
|
## Publishing
|
|
|
|
Pushes to `main` automatically publish to Gitea PyPI via CI/CD.
|
|
|
|
To bump version, edit `pyproject.toml`:
|
|
```toml
|
|
[project]
|
|
version = "1.1.0"
|
|
```
|