# ray-serve-apps

Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package to enable dynamic code loading by Ray clusters.

## Architecture

This repo contains **application code only** - no Docker images or Kubernetes manifests.

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  kuberay-images │     │   ray-serve     │     │  homelab-k8s2   │
│                 │     │                 │     │                 │
│  Docker images  │     │  PyPI package   │     │  K8s manifests  │
│  (GPU runtimes) │     │  (this repo)    │     │  (deployment)   │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         ▼                       ▼                       ▼
 Container Registry         PyPI Registry          GitOps (Flux)
```

## Deployments

| Module | Purpose | Hardware Target |
|--------|---------|-----------------|
| `serve_llm` | vLLM OpenAI-compatible API | Strix Halo (ROCm) |
| `serve_embeddings` | Sentence Transformers | Any GPU |
| `serve_reranker` | Cross-encoder reranking | Any GPU |
| `serve_whisper` | Faster Whisper STT | NVIDIA/Intel |
| `serve_tts` | Coqui TTS | Any GPU |

## Installation

```bash
# From Gitea PyPI
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple

# With optional dependencies (quoted so shells like zsh don't expand the brackets)
pip install "ray-serve-apps[llm]"         # vLLM support
pip install "ray-serve-apps[embeddings]"  # Sentence Transformers
pip install "ray-serve-apps[stt]"         # Faster Whisper
pip install "ray-serve-apps[tts]"         # Coqui TTS
```

## Usage

Ray clusters pull this package at runtime:

```yaml
# In RayService spec
rayClusterConfig:
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            command:
              - /bin/bash
              - -c
              - |
                pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
                ray start --head --dashboard-host=0.0.0.0
                serve run ray_serve.serve_llm:app
```

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Lint
ruff check .
ruff format .

# Test
pytest
```

## Publishing

Pushes to `main` automatically publish to Gitea PyPI via CI/CD.
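Installing in the container start command is one option; Ray can also pull the package per application through a Serve config `runtime_env`. A sketch, assuming pip-style requirements entries (the application name, extra, and version pin below are illustrative):

```yaml
# Hypothetical Serve config alternative: Ray installs the package into the
# application's runtime environment instead of the container command doing it.
# runtime_env pip entries are written to a requirements file, so an
# --index-url line can point pip at the Gitea registry.
applications:
  - name: llm
    import_path: ray_serve.serve_llm:app
    runtime_env:
      pip:
        - "--index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple"
        - "ray-serve-apps[llm]==1.0.0"
```

This keeps the container spec generic and moves the version pin into the Serve config, at the cost of a pip install on every application (re)deploy.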
To bump the version, edit `pyproject.toml`:

```toml
[project]
version = "1.1.0"
```