feat: initial ray-serve-apps PyPI package
Implements ADR-0024: Ray Repository Structure.

- Ray Serve deployments for GPU-shared AI inference
- Published as a PyPI package for dynamic code loading
- Deployments: LLM, embeddings, reranker, whisper, TTS
- CI/CD workflow publishes to Gitea PyPI on push to main

Extracted from the kuberay-images repo per ADR-0024.
# ray-serve-apps

Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package to enable dynamic code loading by Ray clusters.

## Architecture

This repo contains **application code only**: no Docker images or Kubernetes manifests.

```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ kuberay-images  │   │ ray-serve       │   │ homelab-k8s2    │
│                 │   │                 │   │                 │
│ Docker images   │   │ PyPI package    │   │ K8s manifests   │
│ (GPU runtimes)  │   │ (this repo)     │   │ (deployment)    │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         ▼                     ▼                     ▼
 Container Registry      PyPI Registry         GitOps (Flux)
```

## Deployments

| Module | Purpose | Hardware Target |
|--------|---------|-----------------|
| `serve_llm` | vLLM OpenAI-compatible API | Strix Halo (ROCm) |
| `serve_embeddings` | Sentence Transformers | Any GPU |
| `serve_reranker` | Cross-encoder reranking | Any GPU |
| `serve_whisper` | Faster Whisper STT | NVIDIA/Intel |
| `serve_tts` | Coqui TTS | Any GPU |

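Each module is expected to expose a bound Serve application at module scope, which is what the `serve run ray_serve.serve_llm:app` invocation in the Usage section relies on. A minimal sketch of the pattern, using `serve_embeddings` as the example (the class name, model, and resource values here are illustrative assumptions, not the actual implementation):

```python
# Illustrative sketch only; requires ray[serve] and sentence-transformers.
from ray import serve


@serve.deployment(ray_actor_options={"num_gpus": 0.25})  # fractional GPU sharing
class Embedder:
    def __init__(self):
        # Model name is a placeholder assumption.
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    async def __call__(self, request):
        texts = (await request.json())["texts"]
        return {"embeddings": self.model.encode(texts).tolist()}


# `serve run <module>:app` looks for a bound application at module scope.
app = Embedder.bind()
```

The fractional `num_gpus` value is how Ray expresses the GPU sharing this package is built around: several deployments can be scheduled onto one physical GPU.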
## Installation

```bash
# From Gitea PyPI
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple

# With optional dependencies (quoted so the brackets survive shells like zsh)
pip install "ray-serve-apps[llm]"         # vLLM support
pip install "ray-serve-apps[embeddings]"  # Sentence Transformers
pip install "ray-serve-apps[stt]"         # Faster Whisper
pip install "ray-serve-apps[tts]"         # Coqui TTS
```

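The extras above map to optional dependency groups in `pyproject.toml`. A sketch of how they might be declared there (the exact package lists are assumptions, not a copy of the actual file):

```toml
[project.optional-dependencies]
llm = ["vllm"]
embeddings = ["sentence-transformers"]
stt = ["faster-whisper"]
tts = ["TTS"]
dev = ["ruff", "pytest"]
```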
## Usage

Ray clusters pull this package at runtime:

```yaml
# In RayService spec
rayClusterConfig:
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            command:
              - /bin/bash
              - -c
              - |
                pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
                ray start --head --dashboard-host=0.0.0.0
                serve run ray_serve.serve_llm:app
```

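Once `serve_llm` is up, clients talk to it like any OpenAI-compatible endpoint, so plain HTTP works. A hedged sketch using only the standard library; the endpoint URL and default model name below are placeholder assumptions for this cluster:

```python
import json
from urllib import request

# Placeholder in-cluster address; substitute the real Serve HTTP endpoint.
ENDPOINT = "http://ray-serve-head:8000/v1/chat/completions"


def chat_request(prompt: str, model: str = "qwen") -> bytes:
    """Build an OpenAI-style chat completion request body."""
    payload = {
        "model": model,  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return json.dumps(payload).encode()


body = chat_request("Say hello")
req = request.Request(ENDPOINT, data=body,
                      headers={"Content-Type": "application/json"})
# resp = request.urlopen(req)  # uncomment when running inside the cluster
```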
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Lint
ruff check .
ruff format .

# Test
pytest
```

## Publishing

Pushes to `main` automatically publish to Gitea PyPI via CI/CD.

To bump the version, edit `pyproject.toml`:

```toml
[project]
version = "1.1.0"
```
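The publishing workflow itself lives in this repo. A sketch of what it might look like as a Gitea Actions workflow (the file path, action versions, and secret names are assumptions; Gitea Actions uses GitHub Actions-compatible syntax):

```yaml
# .gitea/workflows/publish.yaml (path and secret names are assumptions)
name: publish
on:
  push:
    branches: [main]
jobs:
  pypi:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build twine
      - run: python -m build
      - run: >
          twine upload
          --repository-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi
          dist/*
        env:
          TWINE_USERNAME: ${{ secrets.PYPI_USER }}
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
```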