feat: initial ray-serve-apps PyPI package

Implements ADR-0024: Ray Repository Structure

- Ray Serve deployments for GPU-shared AI inference
- Published as PyPI package for dynamic code loading
- Deployments: LLM, embeddings, reranker, whisper, TTS
- CI/CD workflow publishes to Gitea PyPI on push to main

Extracted from kuberay-images repo per ADR-0024
2026-02-03 07:03:39 -05:00
parent eac8f27f2e
commit 8ef914ec12
11 changed files with 887 additions and 1 deletions


@@ -1,2 +1,87 @@
# ray-serve
# ray-serve-apps
Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package to enable dynamic code loading by Ray clusters.
## Architecture
This repo contains **application code only** - no Docker images or Kubernetes manifests.
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ kuberay-images  │    │ ray-serve       │    │ homelab-k8s2    │
│                 │    │                 │    │                 │
│ Docker images   │    │ PyPI package    │    │ K8s manifests   │
│ (GPU runtimes)  │    │ (this repo)     │    │ (deployment)    │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
         ▼                      ▼                      ▼
Container Registry        PyPI Registry          GitOps (Flux)
```
## Deployments
| Module | Purpose | Hardware Target |
|--------|---------|-----------------|
| `serve_llm` | vLLM OpenAI-compatible API | Strix Halo (ROCm) |
| `serve_embeddings` | Sentence Transformers | Any GPU |
| `serve_reranker` | Cross-encoder reranking | Any GPU |
| `serve_whisper` | Faster Whisper STT | NVIDIA/Intel |
| `serve_tts` | Coqui TTS | Any GPU |
## Installation
```bash
# From Gitea PyPI
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple

# With optional dependencies (quoted so shells such as zsh do not glob the brackets;
# add the same --index-url unless pip is already configured for the Gitea index)
pip install "ray-serve-apps[llm]"         # vLLM support
pip install "ray-serve-apps[embeddings]"  # Sentence Transformers
pip install "ray-serve-apps[stt]"         # Faster Whisper
pip install "ray-serve-apps[tts]"         # Coqui TTS
```
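For scripted installs, the same command can be assembled in Python. This is a minimal sketch and not part of the package: `build_pip_command` is a hypothetical helper, and only the package name, the extras, and the index URL come from the instructions above.

```python
import sys

# Index URL copied from the installation instructions above.
GITEA_INDEX = "https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple"


def build_pip_command(extras=(), version=None, index_url=GITEA_INDEX):
    """Return the argv list for installing ray-serve-apps (hypothetical helper)."""
    requirement = "ray-serve-apps"
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    if version:
        requirement += "==" + version
    # Use the running interpreter's pip so the package lands in the same environment.
    return [sys.executable, "-m", "pip", "install", requirement, "--index-url", index_url]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to install.
    print(" ".join(build_pip_command(extras=["llm", "stt"], version="1.0.0")))
```

Passing the list form to `subprocess.run` avoids any shell quoting of the square brackets in the extras.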
## Usage
Ray clusters pull this package at runtime:
```yaml
# In RayService spec
rayClusterConfig:
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            command:
              - /bin/bash
              - -c
              - |
                pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
                ray start --head --dashboard-host=0.0.0.0
                serve run ray_serve.serve_llm:app
```
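The final command hands `serve run` an import path of the form `module:attribute`. The sketch below shows how such a string can be resolved with the standard library. It illustrates the convention only and is not Ray Serve's actual loader; `resolve_import_path` is a hypothetical name, and the demo resolves a standard-library target so it runs without the package installed.

```python
import importlib


def resolve_import_path(path):
    """Split 'package.module:attribute' and return the named attribute."""
    module_name, sep, attr_name = path.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


if __name__ == "__main__":
    # With the package installed, "ray_serve.serve_llm:app" would be looked up
    # the same way; a stdlib target is used here so the demo runs anywhere.
    dumps = resolve_import_path("json:dumps")
    print(dumps({"ok": True}))
```

One consequence of this convention is that the module must be importable inside the container before `serve run` starts, which is why the `pip install` line comes first.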
## Development
```bash
# Install dev dependencies
pip install -e ".[dev]"
# Lint
ruff check .
ruff format .
# Test
pytest
```
## Publishing
Pushes to `main` automatically publish to Gitea PyPI via CI/CD.
To bump the version, edit the `version` field in `pyproject.toml`:
```toml
[project]
version = "1.1.0"
```
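A minor bump such as 1.0.0 to 1.1.0 can also be scripted. The sketch below is an assumption about workflow, not part of the repo's tooling: `bump_minor` and `bump_pyproject` are hypothetical helpers, and the regex assumes a single `version = "X.Y.Z"` line like the one shown above.

```python
import re

# Matches a line such as: version = "1.0.0"
VERSION_LINE = re.compile(r'^(version\s*=\s*")(\d+)\.(\d+)\.(\d+)(")', re.MULTILINE)


def bump_minor(version):
    """Increment the minor component and reset the patch component."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def bump_pyproject(text):
    """Return pyproject.toml text with the first version line bumped."""
    def _replace(match):
        old = ".".join(match.group(2, 3, 4))
        return match.group(1) + bump_minor(old) + match.group(5)

    new_text, count = VERSION_LINE.subn(_replace, text, count=1)
    if count == 0:
        raise ValueError("no version line found")
    return new_text


if __name__ == "__main__":
    sample = '[project]\nname = "ray-serve-apps"\nversion = "1.0.0"\n'
    print(bump_pyproject(sample))
```

Any consumer that pins an exact version, such as the `pip install ray-serve-apps==1.0.0` line in the RayService spec, has to be updated separately to pick up the new release.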