homelab-design/docs/adr/ADR-0024-ray-repository-structure.md

# ADR-0024: Ray Repository Structure

## Status

Accepted

## Date

2026-02-03

## Context

| Factor | Details |
|---|---|
| Problem | Need to document the Ray-specific repository structure |
| Impact | Clarity on where Ray components live post-migration |
| Current State | kuberay-images standalone, ray-serve needs extraction |
| Goal | Clean separation with independent release cycles |

### Historical Context

llm-workflows was the original monolithic repository containing all ML/AI infrastructure code. It has been archived after being decomposed into focused, independent repositories:

| Repository | Purpose |
|---|---|
| ai-apps | Gradio applications (STT, TTS, embeddings UIs) |
| ai-pipelines | Kubeflow pipeline definitions |
| ai-services | Core ML service implementations |
| chat-handler | Chat orchestration and routing |
| handler-base | Base handler framework |
| pipeline-bridge | Bridge between pipelines and services |
| stt-module | Speech-to-text service |
| tts-module | Text-to-speech service |
| voice-assistant | Voice assistant integration |
| gradio-ui | Shared Gradio UI components |
| kuberay-images | GPU-specific Ray worker base images |
| ntfy-discord | Notification bridge |
| spark-analytics-jobs | Spark batch analytics |
| flink-analytics-jobs | Flink streaming analytics |

### Remaining Ray Component

The ray-serve code is the last component not yet extracted: it still needs a dedicated repository for Ray Serve model inference services.

| Component | Current Location | Purpose |
|---|---|---|
| kuberay-images | kuberay-images/ (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
| ray-serve | llm-workflows/ray-serve/ | Ray Serve inference services |
| llm-workflows | llm-workflows/ | Pipelines, handlers, STT/TTS, embeddings |

### Problems with Current Structure

  1. Tight Coupling: ray-serve changes require llm-workflows repo access
  2. CI/CD Complexity: Building ray-serve images triggers unrelated workflow steps
  3. Version Management: Can't independently version ray-serve deployments
  4. Team Access: Contributors to ray-serve need access to entire llm-workflows repo
  5. Build Times: Changes to unrelated code can trigger ray-serve rebuilds

## Decision

Establish two dedicated Ray repositories with distinct purposes:

| Repository | Type | Contents | Release Cycle |
|---|---|---|---|
| kuberay-images | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
| ray-serve | PyPI package | Ray Serve application code | Per model/feature update |

### Key Design: Dynamic Code Loading

Ray Serve applications are deployed as PyPI packages, not baked into Docker images. This enables:

  • Dynamic Decoupling: Update model serving logic without rebuilding containers
  • Runtime Flexibility: Ray cluster pulls code via pip install at runtime
  • Faster Iteration: Code changes don't require image rebuilds or pod restarts
  • Version Pinning: Kubernetes manifests specify package versions independently
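As a sketch of the version-pinning idea: Ray's `runtime_env` accepts a `"pip"` key listing requirement strings, which the cluster installs when the deployment starts. The helper below is hypothetical (not part of either repo), but the dict shape is what Ray consumes:

```python
def build_runtime_env(version: str) -> dict:
    """Build a Ray runtime_env dict that pins the ray-serve package.

    Ray accepts {"pip": [<requirement strings>]} and runs the pip
    install on the worker when the serve app starts, so the same base
    container can load any published package version.
    """
    return {"pip": [f"ray-serve=={version}"]}
```

A Kubernetes manifest would carry only the version string; bumping a model deployment becomes a one-line manifest change rather than an image rebuild.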

### Repository Structure

```
kuberay-images/                    # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
    └── build-push.yaml            # Builds & pushes to container registry
```

```
ray-serve/                         # PyPI package - application code
├── src/
│   └── ray_serve/
│       ├── __init__.py
│       ├── model_configs.py
│       └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
    └── publish-ray-serve.yaml     # Publishes to PyPI registry
```
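The package layout above leaves the module contents open. A minimal sketch of what `model_configs.py` could hold — the class name, fields, and entries are illustrative assumptions, not the actual schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical per-model config consumed by serve_apps.py."""

    name: str
    accelerator: str   # matches a worker image flavor, e.g. "nvidia", "rdna2", "strixhalo", "intel"
    num_replicas: int = 1


# Registry keyed by serve-app name; serve_apps.py would look entries up here.
MODELS = {
    "embeddings": ModelConfig(name="embeddings", accelerator="nvidia"),
}
```

Keeping configs as plain data in the package means a model change is a PyPI release, which is exactly what the dynamic-loading design optimizes for.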

Note: Kubernetes deployment manifests live in homelab-k8s2, not in either Ray repo. This maintains separation between:

  • Infrastructure (kuberay-images) - How to run Ray workers
  • Application (ray-serve) - What code to run
  • Orchestration (homelab-k8s2) - Where and when to deploy

### Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                        RAY INFRASTRUCTURE                            │
└─────────────────────────────────────────────────────────────────────┘
                                  │
              ┌───────────────────┴───────────────────┐
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ kuberay-images│                       │   ray-serve   │
      │               │                       │               │
      │ Base worker   │                       │ PyPI package  │
      │ Docker images │                       │ Ray Serve     │
      │               │                       │ application   │
      │ NVIDIA/AMD/   │                       │               │
      │ Intel GPUs    │                       │ Model configs │
      └───────────────┘                       └───────────────┘
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ Container     │                       │ PyPI          │
      │ Registry      │                       │ Registry      │
      │ registry.lab/ │                       │ registry.lab/ │
      │ kuberay/*     │                       │ pypi/ray-serve│
      └───────────────┘                       └───────────────┘
              │                                       │
              └───────────────────┬───────────────────┘
                                  │
                                  ▼
                        ┌───────────────────┐
                        │   Ray Cluster     │
                        │                   │
                        │ 1. Pull container │
                        │ 2. pip install    │
                        │    ray-serve      │
                        │ 3. Run serve app  │
                        └───────────────────┘
```

## Consequences

### Positive

  • Dynamic Updates: Deploy new model serving code without rebuilding images
  • Independent Releases: Containers and application code versioned separately
  • Faster Iteration: PyPI publish is seconds vs minutes for Docker builds
  • Clear Separation: Infrastructure (images) vs Application (code) vs Orchestration (k8s)
  • Runtime Flexibility: Same container can run different ray-serve versions

### Negative

  • Runtime Dependencies: Pod startup requires pip install (cached in practice)
  • Version Coordination: Must track compatible versions between kuberay-images and ray-serve
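The version-coordination risk can be contained with a small guard run in CI before a manifest change is merged. A sketch, assuming the known-good pairs are maintained as data alongside the compatibility matrix at the end of this document (the helper itself is hypothetical):

```python
# Known-good pairs: kuberay-images tag -> compatible ray-serve releases.
# Mirrors the Version Compatibility Matrix; extend as new pairs are verified.
COMPAT_MATRIX = {
    "1.0.0": {"1.0.0"},
}


def check_compat(image_version: str, package_version: str) -> bool:
    """Return True only if the image/package pair is a verified combination."""
    return package_version in COMPAT_MATRIX.get(image_version, set())
```

Failing fast on an unverified pair turns a runtime incompatibility into a CI error.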

## Migration Steps

  1. kuberay-images already exists as a standalone repo
  2. llm-workflows archived - all other components extracted to dedicated repos
  3. Create the ray-serve repo on Gitea
  4. Move `.gitea/workflows/publish-ray-serve.yaml` to the new repo
  5. Set up `pyproject.toml` for PyPI publishing
  6. Update RayService manifests to `pip install ray-serve==X.Y.Z`
  7. Verify the Ray cluster pulls the package correctly at runtime
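Step 7 can be checked from inside a worker pod with a stdlib-only snippet; the helper name is illustrative, and `importlib.metadata` reports whatever pip installed into the runtime env:

```python
from importlib import metadata


def installed_matches(package: str, expected: str) -> bool:
    """True if the package is installed at exactly the expected version."""
    try:
        return metadata.version(package) == expected
    except metadata.PackageNotFoundError:
        return False
```

Running this against the pinned version from the RayService manifest confirms the cluster resolved the package from the internal PyPI registry rather than silently falling back to a stale cache.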

## Version Compatibility Matrix

| kuberay-images | ray-serve | Notes |
|---|---|---|
| 1.0.0 | 1.0.0 | Initial structure |
