7ec2107e0c34d82fa5d88bf420c778ff3d1b9b9f
All checks were successful
Build and Publish ray-serve-apps / build-and-publish (push) Successful in 16s
- Add mlflow_logger.py: lightweight REST-based MLflow logger (no mlflow dep) - Instrument serve_llm.py with latency, token counts, tokens/sec metrics - Instrument serve_embeddings.py with latency, batch_size, total_tokens - Instrument serve_whisper.py with latency, audio_duration, realtime_factor - Instrument serve_tts.py with latency, audio_duration, text_chars - Instrument serve_reranker.py with latency, num_pairs, top_k
ray-serve-apps
Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package to enable dynamic code loading by Ray clusters.
Architecture
This repo contains application code only - no Docker images or Kubernetes manifests.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ kuberay-images │ │ ray-serve │ │ homelab-k8s2 │
│ │ │ │ │ │
│ Docker images │ │ PyPI package │ │ K8s manifests │
│ (GPU runtimes) │ │ (this repo) │ │ (deployment) │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
▼ ▼ ▼
Container Registry PyPI Registry GitOps (Flux)
Deployments
| Module | Purpose | Hardware Target |
|---|---|---|
serve_llm |
vLLM OpenAI-compatible API | Strix Halo (ROCm) |
serve_embeddings |
Sentence Transformers | Any GPU |
serve_reranker |
Cross-encoder reranking | Any GPU |
serve_whisper |
Faster Whisper STT | NVIDIA/Intel |
serve_tts |
Coqui TTS | Any GPU |
Installation
# From Gitea PyPI
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
# With optional dependencies
pip install ray-serve-apps[llm] # vLLM support
pip install ray-serve-apps[embeddings] # Sentence Transformers
pip install ray-serve-apps[stt] # Faster Whisper
pip install ray-serve-apps[tts] # Coqui TTS
Usage
Ray clusters pull this package at runtime:
# In RayService spec
rayClusterConfig:
headGroupSpec:
template:
spec:
containers:
- name: ray-head
command:
- /bin/bash
- -c
- |
pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
ray start --head --dashboard-host=0.0.0.0
serve run ray_serve.serve_llm:app
Development
# Install dev dependencies
pip install -e ".[dev]"
# Lint
ruff check .
ruff format .
# Test
pytest
Publishing
Pushes to main automatically publish to Gitea PyPI via CI/CD.
To bump version, edit pyproject.toml:
[project]
version = "1.1.0"
Languages
Python
100%