# ray-serve-apps

Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package so that Ray clusters can install the application code at runtime.

## Architecture

This repo contains **application code only** - no Docker images or Kubernetes manifests.

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  kuberay-images │     │   ray-serve     │     │  homelab-k8s2   │
│                 │     │                 │     │                 │
│  Docker images  │     │  PyPI package   │     │  K8s manifests  │
│  (GPU runtimes) │     │  (this repo)    │     │  (deployment)   │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         ▼                       ▼                       ▼
   Container Registry      PyPI Registry           GitOps (Flux)
```

## Deployments

| Module | Purpose | Hardware Target |
|--------|---------|-----------------|
| `serve_llm` | vLLM OpenAI-compatible API | Strix Halo (ROCm) |
| `serve_embeddings` | Sentence Transformers | Any GPU |
| `serve_reranker` | Cross-encoder reranking | Any GPU |
| `serve_whisper` | Faster Whisper STT | NVIDIA/Intel |
| `serve_tts` | Coqui TTS | Any GPU |

## Installation

```bash
# From Gitea PyPI
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple

# With optional dependencies (use the same --index-url as above;
# the quotes keep shells such as zsh from globbing the brackets)
pip install "ray-serve-apps[llm]"        # vLLM support
pip install "ray-serve-apps[embeddings]" # Sentence Transformers
pip install "ray-serve-apps[stt]"        # Faster Whisper
pip install "ray-serve-apps[tts]"        # Coqui TTS
```
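
To confirm which version ended up in an environment (for example, inside a Ray head or worker container), a small check with the standard library's `importlib.metadata` can help. This is a minimal sketch: the distribution name `ray-serve-apps` comes from the install command above, while the helper name `installed_version` is hypothetical and not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "ray-serve-apps") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this interpreter's environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "ray-serve-apps is not installed")
```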

## Usage

Ray clusters pull this package at runtime:

```yaml
# In RayService spec
rayClusterConfig:
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            command:
              - /bin/bash
              - -c
              - |
                pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
                ray start --head --dashboard-host=0.0.0.0
                serve run ray_serve.serve_llm:app
```
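
If the same startup sequence is needed for several modules or versions, the three shell lines above could be rendered from a small helper instead of being edited by hand. The sketch below is an illustration only: `head_startup_script` and `MODULES` are hypothetical names, and it assumes that every module in the Deployments table exposes an `app` object the way `serve_llm` does in the example above.

```python
import re

# Index URL copied from the install command in this README.
INDEX_URL = "https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple"

# Module names taken from the Deployments table.
MODULES = {
    "serve_llm",
    "serve_embeddings",
    "serve_reranker",
    "serve_whisper",
    "serve_tts",
}


def head_startup_script(version: str, module: str = "serve_llm") -> str:
    """Render the head container script for a pinned package version."""
    # Require an exact X.Y.Z pin so a cluster restart cannot pull a new release.
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected an X.Y.Z version, got {version!r}")
    if module not in MODULES:
        raise ValueError(f"unknown module {module!r}")
    return "\n".join(
        [
            f"pip install ray-serve-apps=={version} --index-url {INDEX_URL}",
            "ray start --head --dashboard-host=0.0.0.0",
            # Assumption: each module exposes `app`, as serve_llm does above.
            f"serve run ray_serve.{module}:app",
        ]
    )


if __name__ == "__main__":
    print(head_startup_script("1.0.0"))
```

Pinning an exact version, as the YAML above does with `==1.0.0`, keeps a pod restart from picking up a newer release by accident.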

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Lint and format
ruff check .
ruff format .

# Test
pytest
```

## Publishing

Pushes to `main` automatically publish to Gitea PyPI via CI/CD.

To bump the version, edit the `version` field in `pyproject.toml`:

```toml
[project]
version = "1.1.0"
```
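
The version edit can also be scripted. The following is a minimal sketch, not part of this repo's tooling: `bump_version` is a hypothetical helper, and it assumes the first line of `pyproject.toml` that starts with `version =` is the one under `[project]` and holds a plain `X.Y.Z` string.

```python
import re

# Matches a whole line such as: version = "1.0.0"
VERSION_RE = re.compile(
    r'^version[ \t]*=[ \t]*"(\d+)\.(\d+)\.(\d+)"[ \t]*$', re.MULTILINE
)


def bump_version(pyproject_text: str, part: str = "minor") -> str:
    """Return the pyproject text with the first X.Y.Z version line bumped."""
    match = VERSION_RE.search(pyproject_text)
    if match is None:
        raise ValueError('no line of the form version = "X.Y.Z" found')
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"part must be major, minor, or patch, got {part!r}")
    new_line = f'version = "{major}.{minor}.{patch}"'
    # Replace only the matched line and leave the rest of the file untouched.
    return pyproject_text[: match.start()] + new_line + pyproject_text[match.end():]


if __name__ == "__main__":
    sample = '[project]\nname = "ray-serve-apps"\nversion = "1.0.0"\n'
    print(bump_version(sample, "minor"))
```

To apply it to the real file, read `pyproject.toml` with `pathlib.Path.read_text()`, pass the text through `bump_version`, and write the result back before pushing to `main`.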