feat: initial ray-serve-apps PyPI package
Implements ADR-0024: Ray Repository Structure.

- Ray Serve deployments for GPU-shared AI inference
- Published as a PyPI package for dynamic code loading
- Deployments: LLM, embeddings, reranker, whisper, TTS
- CI/CD workflow publishes to Gitea PyPI on push to main

Extracted from the kuberay-images repo per ADR-0024.
# ray-serve-apps

Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package to enable dynamic code loading by Ray clusters.

## Architecture

This repo contains **application code only**: no Docker images or Kubernetes manifests.

```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ kuberay-images  │   │ ray-serve       │   │ homelab-k8s2    │
│                 │   │                 │   │                 │
│ Docker images   │   │ PyPI package    │   │ K8s manifests   │
│ (GPU runtimes)  │   │ (this repo)     │   │ (deployment)    │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         ▼                     ▼                     ▼
 Container Registry      PyPI Registry         GitOps (Flux)
```

## Deployments

| Module | Purpose | Hardware Target |
|--------|---------|-----------------|
| `serve_llm` | vLLM OpenAI-compatible API | Strix Halo (ROCm) |
| `serve_embeddings` | Sentence Transformers | Any GPU |
| `serve_reranker` | Cross-encoder reranking | Any GPU |
| `serve_whisper` | Faster Whisper STT | NVIDIA/Intel |
| `serve_tts` | Coqui TTS | Any GPU |

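Each module is expected to expose a bound Serve application at module scope, which is what the `serve run ray_serve.serve_llm:app` invocation in the Usage section relies on. A minimal sketch of the pattern, using `serve_embeddings` as the example (the class name, model, and resource values here are illustrative assumptions, not the actual implementation):

```python
# Illustrative sketch only; requires ray[serve] and sentence-transformers.
from ray import serve


@serve.deployment(ray_actor_options={"num_gpus": 0.25})  # fractional GPU sharing
class Embedder:
    def __init__(self):
        # Model name is a placeholder assumption.
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    async def __call__(self, request):
        texts = (await request.json())["texts"]
        return {"embeddings": self.model.encode(texts).tolist()}


# `serve run <module>:app` looks for a bound application at module scope.
app = Embedder.bind()
```

The fractional `num_gpus` value is how Ray expresses the GPU sharing this package is built around: several deployments can be scheduled onto one physical GPU.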
## Installation

```bash
# From Gitea PyPI
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple

# With optional dependencies (quoted so the brackets survive shells like zsh)
pip install "ray-serve-apps[llm]"         # vLLM support
pip install "ray-serve-apps[embeddings]"  # Sentence Transformers
pip install "ray-serve-apps[stt]"         # Faster Whisper
pip install "ray-serve-apps[tts]"         # Coqui TTS
```

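The extras above map to optional dependency groups in `pyproject.toml`. A sketch of how they might be declared there (the exact package lists are assumptions, not a copy of the actual file):

```toml
[project.optional-dependencies]
llm = ["vllm"]
embeddings = ["sentence-transformers"]
stt = ["faster-whisper"]
tts = ["TTS"]
dev = ["ruff", "pytest"]
```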
## Usage

Ray clusters pull this package at runtime:

```yaml
# In RayService spec
rayClusterConfig:
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            command:
              - /bin/bash
              - -c
              - |
                pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
                ray start --head --dashboard-host=0.0.0.0
                serve run ray_serve.serve_llm:app
```

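Once `serve_llm` is up, clients talk to it like any OpenAI-compatible endpoint, so plain HTTP works. A hedged sketch using only the standard library; the endpoint URL and default model name below are placeholder assumptions for this cluster:

```python
import json
from urllib import request

# Placeholder in-cluster address; substitute the real Serve HTTP endpoint.
ENDPOINT = "http://ray-serve-head:8000/v1/chat/completions"


def chat_request(prompt: str, model: str = "qwen") -> bytes:
    """Build an OpenAI-style chat completion request body."""
    payload = {
        "model": model,  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return json.dumps(payload).encode()


body = chat_request("Say hello")
req = request.Request(ENDPOINT, data=body,
                      headers={"Content-Type": "application/json"})
# resp = request.urlopen(req)  # uncomment when running inside the cluster
```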
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Lint
ruff check .
ruff format .

# Test
pytest
```

## Publishing

Pushes to `main` automatically publish to Gitea PyPI via CI/CD.

To bump the version, edit `pyproject.toml`:

```toml
[project]
version = "1.1.0"
```
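The publishing workflow itself lives in this repo. A sketch of what it might look like as a Gitea Actions workflow (the file path, action versions, and secret names are assumptions; Gitea Actions uses GitHub Actions-compatible syntax):

```yaml
# .gitea/workflows/publish.yaml (path and secret names are assumptions)
name: publish
on:
  push:
    branches: [main]
jobs:
  pypi:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build twine
      - run: python -m build
      - run: >
          twine upload
          --repository-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi
          dist/*
        env:
          TWINE_USERNAME: ${{ secrets.PYPI_USER }}
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
```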