# ADR-0024: Ray Repository Structure

## Status

Accepted

## Date

2026-02-03

## Context
| Factor | Details |
|---|---|
| Problem | Need to document the Ray-specific repository structure |
| Impact | Clarity on where Ray components live post-migration |
| Current State | kuberay-images standalone, ray-serve needs extraction |
| Goal | Clean separation with independent release cycles |
### Historical Context
llm-workflows was the original monolithic repository containing all ML/AI infrastructure code. It has been archived after being fully decomposed into focused, independent repositories:
| Repository | Purpose |
|---|---|
| `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
| `ai-pipelines` | Kubeflow pipeline definitions |
| `ai-services` | Core ML service implementations |
| `chat-handler` | Chat orchestration and routing |
| `handler-base` | Base handler framework |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `voice-assistant` | Voice assistant integration |
| `gradio-ui` | Shared Gradio UI components |
| `kuberay-images` | GPU-specific Ray worker base images |
| `ntfy-discord` | Notification bridge |
| `spark-analytics-jobs` | Spark batch analytics |
| `flink-analytics-jobs` | Flink streaming analytics |
### Remaining Ray Component
The ray-serve code still needs a dedicated repository for Ray Serve model inference services.
| Component | Current Location | Purpose |
|---|---|---|
| `kuberay-images` | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
| `ray-serve` | `llm-workflows/ray-serve/` | Ray Serve inference services |
| `llm-workflows` | `llm-workflows/` | Pipelines, handlers, STT/TTS, embeddings |
### Problems with Current Structure

- **Tight Coupling**: ray-serve changes require llm-workflows repo access
- **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
- **Version Management**: Can't independently version ray-serve deployments
- **Team Access**: Contributors to ray-serve need access to the entire llm-workflows repo
- **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds
## Decision
Establish two dedicated Ray repositories with distinct purposes:
| Repository | Type | Contents | Release Cycle |
|---|---|---|---|
| `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
| `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |
### Key Design: Dynamic Code Loading
Ray Serve applications are deployed as PyPI packages, not baked into Docker images. This enables:
- **Dynamic Decoupling**: Update model serving logic without rebuilding containers
- **Runtime Flexibility**: Ray cluster pulls code via `pip install` at runtime
- **Faster Iteration**: Code changes don't require image rebuilds or pod restarts
- **Version Pinning**: Kubernetes manifests specify package versions independently
Repository Structure
kuberay-images/ # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
└── build-push.yaml # Builds & pushes to container registry
ray-serve/ # PyPI package - application code
├── src/
│ └── ray_serve/
│ ├── __init__.py
│ ├── model_configs.py
│ └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
└── publish-ray-serve.yaml # Publishes to PyPI registry
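A minimal `pyproject.toml` for the package could look like the following sketch; the build backend and the dependency pin are assumptions, since the ADR does not specify them:

```toml
# Hypothetical minimal pyproject.toml; backend choice is an assumption
[project]
name = "ray-serve"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = ["ray[serve]"]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```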
> **Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:

- **Infrastructure** (`kuberay-images`): how to run Ray workers
- **Application** (`ray-serve`): what code to run
- **Orchestration** (`homelab-k8s2`): where and when to deploy
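In `homelab-k8s2`, the pinned package version would then surface in the RayService manifest, roughly like this hypothetical fragment (application name and import path are illustrative):

```yaml
# Hypothetical RayService fragment in homelab-k8s2; names are illustrative
serveConfigV2: |
  applications:
    - name: inference
      import_path: ray_serve.serve_apps:app
      runtime_env:
        pip:
          - ray-serve==1.0.0
```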
## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                          RAY INFRASTRUCTURE                         │
└─────────────────────────────────────────────────────────────────────┘
                               │
           ┌───────────────────┴───────────────────┐
           │                                       │
           ▼                                       ▼
   ┌───────────────┐                       ┌───────────────┐
   │ kuberay-images│                       │   ray-serve   │
   │               │                       │               │
   │ Base worker   │                       │ PyPI package  │
   │ Docker images │                       │ Ray Serve     │
   │               │                       │ application   │
   │ NVIDIA/AMD/   │                       │               │
   │ Intel GPUs    │                       │ Model configs │
   └───────────────┘                       └───────────────┘
           │                                       │
           ▼                                       ▼
   ┌───────────────┐                       ┌───────────────┐
   │  Container    │                       │     PyPI      │
   │  Registry     │                       │   Registry    │
   │ registry.lab/ │                       │ registry.lab/ │
   │ kuberay/*     │                       │ pypi/ray-serve│
   └───────────────┘                       └───────────────┘
           │                                       │
           └───────────────────┬───────────────────┘
                               │
                               ▼
                   ┌───────────────────┐
                   │    Ray Cluster    │
                   │                   │
                   │ 1. Pull container │
                   │ 2. pip install    │
                   │    ray-serve      │
                   │ 3. Run serve app  │
                   └───────────────────┘
```
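To make the `ray-serve` side of the diagram concrete, `model_configs.py` might hold a declarative registry along these lines. The names and fields are hypothetical, chosen only to match the GPU image variants above:

```python
# Hypothetical sketch of ray_serve/model_configs.py: a declarative model
# registry. Names and fields are illustrative, not from this ADR.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    name: str
    accelerator: str      # matches a kuberay-images variant: nvidia/rdna2/strixhalo/intel
    num_replicas: int = 1

MODELS = {
    "embeddings": ModelConfig("embeddings", accelerator="nvidia"),
    "stt": ModelConfig("stt", accelerator="rdna2", num_replicas=2),
}

def configs_for(accelerator: str) -> list[ModelConfig]:
    """Select the models that should run on a given worker type."""
    return [m for m in MODELS.values() if m.accelerator == accelerator]
```

Keeping this registry in the PyPI package (rather than baked into images) is what lets a model be added or rescaled with a package release alone.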
## Consequences

### Positive

- **Dynamic Updates**: Deploy new model serving code without rebuilding images
- **Independent Releases**: Containers and application code versioned separately
- **Faster Iteration**: A PyPI publish takes seconds vs. minutes for Docker builds
- **Clear Separation**: Infrastructure (images) vs. application (code) vs. orchestration (k8s)
- **Runtime Flexibility**: The same container can run different ray-serve versions
### Negative

- **Runtime Dependencies**: Pod startup requires `pip install` (cached in practice)
- **Version Coordination**: Must track compatible versions between kuberay-images and ray-serve
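The version-coordination cost can be softened with an explicit fail-fast check at deploy time. A minimal sketch, where the helper name is hypothetical and the matrix contents mirror the table at the end of this ADR:

```python
# Illustrative compatibility guard between kuberay-images and ray-serve
# versions; the validated pairs mirror this ADR's compatibility matrix.
COMPATIBLE = {
    ("1.0.0", "1.0.0"),  # initial structure
}

def check_compatible(images_version: str, serve_version: str) -> None:
    """Raise at deploy time if this pair was never validated together."""
    if (images_version, serve_version) not in COMPATIBLE:
        raise ValueError(
            f"kuberay-images {images_version} and ray-serve {serve_version} "
            "are not a validated pair"
        )

check_compatible("1.0.0", "1.0.0")  # passes silently
```

Run in CI before applying manifests, this turns a silent runtime mismatch into a loud deploy-time error.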
## Migration Steps

1. ✅ `kuberay-images` already exists as a standalone repo
2. ✅ `llm-workflows` archived - all components extracted to dedicated repos
3. Create `ray-serve` repo on Gitea
4. Move `.gitea/workflows/publish-ray-serve.yaml` to the new repo
5. Set up `pyproject.toml` for PyPI publishing
6. Update RayService manifests to `pip install ray-serve==X.Y.Z`
7. Verify the Ray cluster pulls the package correctly at runtime
## Version Compatibility Matrix
| kuberay-images | ray-serve | Notes |
|---|---|---|
| 1.0.0 | 1.0.0 | Initial structure |