Go to file

Build and Publish ray-serve-apps / build-and-publish (push) Successful in 16s

Details

feat: add MLflow inference logging to all Ray Serve apps

- Add mlflow_logger.py: lightweight REST-based MLflow logger (no mlflow dep)
- Instrument serve_llm.py with latency, token counts, tokens/sec metrics
- Instrument serve_embeddings.py with latency, batch_size, total_tokens
- Instrument serve_whisper.py with latency, audio_duration, realtime_factor
- Instrument serve_tts.py with latency, audio_duration, text_chars
- Instrument serve_reranker.py with latency, num_pairs, top_k

2026-02-12 06:14:30 -05:00

.gitea/workflows

ci: semver based on commit message keywords

2026-02-03 15:25:15 -05:00

ray_serve

feat: add MLflow inference logging to all Ray Serve apps

2026-02-12 06:14:30 -05:00

LICENSE

Initial commit

2026-02-03 11:59:56 +00:00

pyproject.toml

fixing coqui

2026-02-09 09:14:30 -05:00

README.md

feat: initial ray-serve-apps PyPI package

2026-02-03 07:03:39 -05:00

requirements.txt

fixing coqui

2026-02-09 09:14:30 -05:00

README.md

ray-serve-apps

Ray Serve deployments for GPU-shared AI inference. Published as a PyPI package to enable dynamic code loading by Ray clusters.

Architecture

This repo contains application code only - no Docker images or Kubernetes manifests.

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  kuberay-images │     │   ray-serve     │     │  homelab-k8s2   │
│                 │     │                 │     │                 │
│  Docker images  │     │  PyPI package   │     │  K8s manifests  │
│  (GPU runtimes) │     │  (this repo)    │     │  (deployment)   │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         ▼                       ▼                       ▼
   Container Registry      PyPI Registry           GitOps (Flux)

Deployments

Module	Purpose	Hardware Target
`serve_llm`	vLLM OpenAI-compatible API	Strix Halo (ROCm)
`serve_embeddings`	Sentence Transformers	Any GPU
`serve_reranker`	Cross-encoder reranking	Any GPU
`serve_whisper`	Faster Whisper STT	NVIDIA/Intel
`serve_tts`	Coqui TTS	Any GPU

Installation

# From Gitea PyPI
pip install ray-serve-apps --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple

# With optional dependencies
pip install ray-serve-apps[llm]        # vLLM support
pip install ray-serve-apps[embeddings] # Sentence Transformers
pip install ray-serve-apps[stt]        # Faster Whisper
pip install ray-serve-apps[tts]        # Coqui TTS

Usage

Ray clusters pull this package at runtime:

# In RayService spec
rayClusterConfig:
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            command:
              - /bin/bash
              - -c
              - |
                pip install ray-serve-apps==1.0.0 --index-url https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple
                ray start --head --dashboard-host=0.0.0.0
                serve run ray_serve.serve_llm:app

Development

# Install dev dependencies
pip install -e ".[dev]"

# Lint
ruff check .
ruff format .

# Test
pytest

Publishing

Pushes to main automatically publish to Gitea PyPI via CI/CD.

To bump version, edit pyproject.toml:

[project]
version = "1.1.0"