# ADR-0024: Ray Repository Structure

## Status

Accepted

## Date

2026-02-03

## Context
| Factor | Details |
|---|---|
| Problem | Need to document the Ray-specific repository structure |
| Impact | Clarity on where Ray components live post-migration |
| Current State | kuberay-images standalone, ray-serve needs extraction |
| Goal | Clean separation with independent release cycles |
### Historical Context
llm-workflows was the original monolithic repository containing all ML/AI infrastructure code. It has been archived after being fully decomposed into focused, independent repositories:
| Repository | Purpose |
|---|---|
| `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
| `ai-pipelines` | Kubeflow pipeline definitions |
| `ai-services` | Core ML service implementations |
| `chat-handler` | Chat orchestration and routing |
| `handler-base` | Base handler framework |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `voice-assistant` | Voice assistant integration |
| `gradio-ui` | Shared Gradio UI components |
| `kuberay-images` | GPU-specific Ray worker base images |
| `ntfy-discord` | Notification bridge |
| `spark-analytics-jobs` | Spark batch analytics |
| `flink-analytics-jobs` | Flink streaming analytics |
### Remaining Ray Component
The ray-serve code still needs a dedicated repository for Ray Serve model inference services.
| Component | Current Location | Purpose |
|---|---|---|
| `kuberay-images` | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
| `ray-serve` | `llm-workflows/ray-serve/` | Ray Serve inference services |
| `llm-workflows` | `llm-workflows/` | Pipelines, handlers, STT/TTS, embeddings |
### Problems with Current Structure

- **Tight Coupling**: ray-serve changes require llm-workflows repo access
- **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
- **Version Management**: Can't independently version ray-serve deployments
- **Team Access**: Contributors to ray-serve need access to the entire llm-workflows repo
- **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds
## Decision
Establish two dedicated Ray repositories with distinct purposes:
| Repository | Type | Contents | Release Cycle |
|---|---|---|---|
| `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
| `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |
### Key Design: Dynamic Code Loading
Ray Serve applications are deployed as PyPI packages, not baked into Docker images. This enables:
- **Dynamic Decoupling**: Update model serving logic without rebuilding containers
- **Runtime Flexibility**: Ray cluster pulls code via `pip install` at runtime
- **Faster Iteration**: Code changes don't require image rebuilds or pod restarts
- **Version Pinning**: Kubernetes manifests specify package versions independently
Repository Structure
kuberay-images/ # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
└── build-push.yaml # Builds & pushes to container registry
ray-serve/ # PyPI package - application code
├── src/
│ └── ray_serve/
│ ├── __init__.py
│ ├── model_configs.py
│ └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
└── publish-ray-serve.yaml # Publishes to PyPI registry
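A minimal `pyproject.toml` for the package could look like the following sketch; the build backend and the dependency pin are assumptions, since the ADR does not specify them:

```toml
# Hypothetical minimal pyproject.toml; backend choice is an assumption
[project]
name = "ray-serve"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = ["ray[serve]"]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```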
> **Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:

- **Infrastructure** (`kuberay-images`): how to run Ray workers
- **Application** (`ray-serve`): what code to run
- **Orchestration** (`homelab-k8s2`): where and when to deploy
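In `homelab-k8s2`, the pinned package version would then surface in the RayService manifest, roughly like this hypothetical fragment (application name and import path are illustrative):

```yaml
# Hypothetical RayService fragment in homelab-k8s2; names are illustrative
serveConfigV2: |
  applications:
    - name: inference
      import_path: ray_serve.serve_apps:app
      runtime_env:
        pip:
          - ray-serve==1.0.0
```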
## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                          RAY INFRASTRUCTURE                         │
└─────────────────────────────────────────────────────────────────────┘
                               │
           ┌───────────────────┴───────────────────┐
           │                                       │
           ▼                                       ▼
   ┌───────────────┐                       ┌───────────────┐
   │ kuberay-images│                       │   ray-serve   │
   │               │                       │               │
   │ Base worker   │                       │ PyPI package  │
   │ Docker images │                       │ Ray Serve     │
   │               │                       │ application   │
   │ NVIDIA/AMD/   │                       │               │
   │ Intel GPUs    │                       │ Model configs │
   └───────────────┘                       └───────────────┘
           │                                       │
           ▼                                       ▼
   ┌───────────────┐                       ┌───────────────┐
   │  Container    │                       │     PyPI      │
   │  Registry     │                       │   Registry    │
   │ registry.lab/ │                       │ registry.lab/ │
   │ kuberay/*     │                       │ pypi/ray-serve│
   └───────────────┘                       └───────────────┘
           │                                       │
           └───────────────────┬───────────────────┘
                               │
                               ▼
                   ┌───────────────────┐
                   │    Ray Cluster    │
                   │                   │
                   │ 1. Pull container │
                   │ 2. pip install    │
                   │    ray-serve      │
                   │ 3. Run serve app  │
                   └───────────────────┘
```
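To make the `ray-serve` side of the diagram concrete, `model_configs.py` might hold a declarative registry along these lines. The names and fields are hypothetical, chosen only to match the GPU image variants above:

```python
# Hypothetical sketch of ray_serve/model_configs.py: a declarative model
# registry. Names and fields are illustrative, not from this ADR.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    name: str
    accelerator: str      # matches a kuberay-images variant: nvidia/rdna2/strixhalo/intel
    num_replicas: int = 1

MODELS = {
    "embeddings": ModelConfig("embeddings", accelerator="nvidia"),
    "stt": ModelConfig("stt", accelerator="rdna2", num_replicas=2),
}

def configs_for(accelerator: str) -> list[ModelConfig]:
    """Select the models that should run on a given worker type."""
    return [m for m in MODELS.values() if m.accelerator == accelerator]
```

Keeping this registry in the PyPI package (rather than baked into images) is what lets a model be added or rescaled with a package release alone.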
## Consequences

### Positive

- **Dynamic Updates**: Deploy new model serving code without rebuilding images
- **Independent Releases**: Containers and application code versioned separately
- **Faster Iteration**: A PyPI publish takes seconds vs. minutes for Docker builds
- **Clear Separation**: Infrastructure (images) vs. application (code) vs. orchestration (k8s)
- **Runtime Flexibility**: The same container can run different ray-serve versions
### Negative

- **Runtime Dependencies**: Pod startup requires `pip install` (cached in practice)
- **Version Coordination**: Must track compatible versions between kuberay-images and ray-serve
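The version-coordination cost can be softened with an explicit fail-fast check at deploy time. A minimal sketch, where the helper name is hypothetical and the matrix contents mirror the table at the end of this ADR:

```python
# Illustrative compatibility guard between kuberay-images and ray-serve
# versions; the validated pairs mirror this ADR's compatibility matrix.
COMPATIBLE = {
    ("1.0.0", "1.0.0"),  # initial structure
}

def check_compatible(images_version: str, serve_version: str) -> None:
    """Raise at deploy time if this pair was never validated together."""
    if (images_version, serve_version) not in COMPATIBLE:
        raise ValueError(
            f"kuberay-images {images_version} and ray-serve {serve_version} "
            "are not a validated pair"
        )

check_compatible("1.0.0", "1.0.0")  # passes silently
```

Run in CI before applying manifests, this turns a silent runtime mismatch into a loud deploy-time error.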
## Migration Steps

1. ✅ `kuberay-images` already exists as a standalone repo
2. ✅ `llm-workflows` archived - all components extracted to dedicated repos
3. Create `ray-serve` repo on Gitea
4. Move `.gitea/workflows/publish-ray-serve.yaml` to the new repo
5. Set up `pyproject.toml` for PyPI publishing
6. Update RayService manifests to `pip install ray-serve==X.Y.Z`
7. Verify the Ray cluster pulls the package correctly at runtime
## Version Compatibility Matrix
| kuberay-images | ray-serve | Notes |
|---|---|---|
| 1.0.0 | 1.0.0 | Initial structure |