Ray Repository Structure

  • Status: accepted
  • Date: 2026-02-03
  • Deciders: Billy
  • Technical Story: Document repository layout for Ray Serve and KubeRay image components

Context

Factor         Details
Problem        Need to document the Ray-specific repository structure
Impact         Clarity on where Ray components live post-migration
Current State  kuberay-images standalone, ray-serve needs extraction
Goal           Clean separation with independent release cycles

Historical Context

llm-workflows was the original monolithic repository containing all ML/AI infrastructure code. It has been archived after being fully decomposed into focused, independent repositories:

Repository            Purpose
ai-apps               Gradio applications (STT, TTS, embeddings UIs)
ai-pipelines          Kubeflow pipeline definitions
ai-services           Core ML service implementations
chat-handler          Chat orchestration and routing
handler-base          Base handler framework
pipeline-bridge       Bridge between pipelines and services
stt-module            Speech-to-text service
tts-module            Text-to-speech service
voice-assistant       Voice assistant integration
gradio-ui             Shared Gradio UI components
kuberay-images        GPU-specific Ray worker base images
ntfy-discord          Notification bridge
spark-analytics-jobs  Spark batch analytics
flink-analytics-jobs  Flink streaming analytics

Ray Component Repositories

Both Ray repositories now exist as standalone repos in the Gitea daviestechlabs organization:

Component       Location                           Purpose
kuberay-images  kuberay-images/ (standalone repo)  Docker images for Ray workers (NVIDIA, AMD, Intel)
ray-serve       ray-serve/ (standalone repo)       Ray Serve inference services

Problems with Monolithic Structure (Historical)

These were the problems with the original monolithic llm-workflows structure (now resolved):

  1. Tight Coupling: ray-serve changes required llm-workflows repo access
  2. CI/CD Complexity: Building ray-serve images triggered unrelated workflow steps
  3. Version Management: Couldn't independently version ray-serve deployments
  4. Team Access: Contributors to ray-serve needed access to entire llm-workflows repo
  5. Build Times: Changes to unrelated code could trigger ray-serve rebuilds

Decision

Establish two dedicated Ray repositories with distinct purposes:

Repository      Type           Contents                                Release Cycle
kuberay-images  Docker images  Ray worker base images (GPU-specific)   On dependency updates
ray-serve       PyPI package   Ray Serve application code              Per model/feature update

Key Design: Dynamic Code Loading

Ray Serve applications are deployed as PyPI packages, not baked into Docker images. This enables:

  • Dynamic Decoupling: Update model serving logic without rebuilding containers
  • Runtime Flexibility: Ray cluster pulls code via pip install at runtime
  • Faster Iteration: Code changes don't require image rebuilds or pod restarts
  • Version Pinning: Kubernetes manifests specify package versions independently
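
For illustration, a minimal sketch of what a serve_apps.py entry point might look like under this design, assuming Ray Serve 2.x; the deployment name, model handle, and resource options are hypothetical:

# serve_apps.py (hypothetical sketch) - packaged and published to the internal
# PyPI registry, then pulled by the Ray cluster at runtime instead of being
# baked into the worker image.
from ray import serve


@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class EmbeddingService:
    def __init__(self) -> None:
        # Model loading would happen here; the model name is an assumption.
        self.model_name = "example-embedding-model"

    async def __call__(self, request) -> dict:
        payload = await request.json()
        # Real inference would run here; echoing the input keeps the sketch runnable.
        return {"model": self.model_name, "input": payload}


# Bound application object that deployment manifests reference by import path
# (e.g. ray_serve.serve_apps:app - the exact path is an assumption).
app = EmbeddingService.bind()

Because the package, not the image, carries this code, publishing a new version and bumping the pin in the deployment manifest is enough to roll out a change.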

Repository Structure

kuberay-images/                    # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
    └── build-push.yaml            # Builds & pushes to container registry

ray-serve/                         # PyPI package - application code
├── src/
│   └── ray_serve/
│       ├── __init__.py
│       ├── model_configs.py
│       └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
    └── publish-ray-serve.yaml     # Publishes to PyPI registry
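
As an illustration of the application side of this layout, a hypothetical sketch of what model_configs.py might hold; every model name, weights reference, and resource value below is an assumption:

# model_configs.py (hypothetical sketch) - a declarative registry of models that
# the Serve applications can load; values are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    name: str        # identifier used by serve_apps
    weights: str     # model repository or path to weights
    num_gpus: float  # GPUs requested per replica


MODEL_CONFIGS = {
    "stt-default": ModelConfig("stt-default", "openai/whisper-small", 0.5),
    "embeddings": ModelConfig("embeddings", "BAAI/bge-small-en-v1.5", 0.25),
}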

Note: Kubernetes deployment manifests live in homelab-k8s2, not in either Ray repo. This maintains separation between:

  • Infrastructure (kuberay-images) - How to run Ray workers
  • Application (ray-serve) - What code to run
  • Orchestration (homelab-k8s2) - Where and when to deploy

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        RAY INFRASTRUCTURE                            │
└─────────────────────────────────────────────────────────────────────┘
                                  │
              ┌───────────────────┴───────────────────┐
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ kuberay-images│                       │   ray-serve   │
      │               │                       │               │
      │ Base worker   │                       │ PyPI package  │
      │ Docker images │                       │ Ray Serve     │
      │               │                       │ application   │
      │ NVIDIA/AMD/   │                       │               │
      │ Intel GPUs    │                       │ Model configs │
      └───────────────┘                       └───────────────┘
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ Container     │                       │ PyPI          │
      │ Registry      │                       │ Registry      │
      │ registry.lab/ │                       │ registry.lab/ │
      │ kuberay/*     │                       │ pypi/ray-serve│
      └───────────────┘                       └───────────────┘
              │                                       │
              └───────────────────┬───────────────────┘
                                  │
                                  ▼
                        ┌───────────────────┐
                        │   Ray Cluster     │
                        │                   │
                        │ 1. Pull container │
                        │ 2. pip install    │
                        │    ray-serve      │
                        │ 3. Run serve app  │
                        └───────────────────┘
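
The pull-then-install flow in the diagram can be sketched from the Python side; a hedged example where the head address, package version, and registry setup are assumptions rather than actual cluster values:

import ray

# Step 2 of the diagram: the runtime environment the cluster resolves before
# starting the Serve application. The version pin mirrors what the RayService
# manifest in homelab-k8s2 would specify; the internal registry
# (registry.lab/pypi) is assumed to be the workers' default pip index.
runtime_env = {"pip": ["ray-serve==1.0.0"]}

# Step 3 is normally driven by the RayService controller via an import path such
# as ray_serve.serve_apps:app; for ad-hoc testing, the same runtime_env can be
# supplied through the Ray client (the head address is an assumption).
ray.init(address="ray://raycluster-head-svc:10001", runtime_env=runtime_env)

Because the package is resolved at startup, swapping model-serving code is a matter of changing the pinned version rather than rebuilding the worker image.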

Consequences

Positive

  • Dynamic Updates: Deploy new model serving code without rebuilding images
  • Independent Releases: Containers and application code versioned separately
  • Faster Iteration: A PyPI publish takes seconds, versus minutes for a Docker image build
  • Clear Separation: Infrastructure (images) vs Application (code) vs Orchestration (k8s)
  • Runtime Flexibility: Same container can run different ray-serve versions

Negative

  • Runtime Dependencies: Pod startup requires pip install (cached in practice)
  • Version Coordination: Must track compatible versions between kuberay-images and ray-serve

Migration Steps

  1. kuberay-images already exists as a standalone repo
  2. llm-workflows archived; all components extracted to dedicated repos
  3. ray-serve repo created on Gitea (git.daviestechlabs.io/daviestechlabs/ray-serve)
  4. CI workflows moved to new repo
  5. pyproject.toml configured for PyPI publishing
  6. Update RayService manifests to pip install ray-serve==X.Y.Z
  7. Verify the Ray cluster pulls the package correctly at runtime (a smoke-test sketch follows this list)
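
A hypothetical smoke test for step 7; the Serve endpoint address, request shape, and response contents are assumptions about the deployed application:

# smoke_test.py (hypothetical) - confirm the cluster installed the pinned
# ray-serve package and the Serve application is answering HTTP requests.
import requests

SERVE_URL = "http://raycluster-head-svc:8000/"  # assumed Serve proxy address and route

resp = requests.post(SERVE_URL, json={"text": "hello"})
resp.raise_for_status()
print("Serve application responded:", resp.json())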

Version Compatibility Matrix

kuberay-images  ray-serve  Notes
1.0.0           1.0.0      Initial structure

References