# ADR-0024: Ray Repository Structure

## Status

Accepted

## Date

2026-02-03

## Context

| Factor | Details |
|--------|---------|
| Problem | Need to document the Ray-specific repository structure |
| Impact | Clarity on where Ray components live post-migration |
| Current State | kuberay-images standalone, ray-serve needs extraction |
| Goal | Clean separation with independent release cycles |

### Historical Context

`llm-workflows` was the original monolithic repository containing all ML/AI infrastructure code. It has been **archived** after being decomposed into focused, independent repositories:

| Repository | Purpose |
|------------|---------|
| `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
| `ai-pipelines` | Kubeflow pipeline definitions |
| `ai-services` | Core ML service implementations |
| `chat-handler` | Chat orchestration and routing |
| `handler-base` | Base handler framework |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `voice-assistant` | Voice assistant integration |
| `gradio-ui` | Shared Gradio UI components |
| `kuberay-images` | GPU-specific Ray worker base images |
| `ntfy-discord` | Notification bridge |
| `spark-analytics-jobs` | Spark batch analytics |
| `flink-analytics-jobs` | Flink streaming analytics |

### Remaining Ray Component

One piece is still outstanding: the `ray-serve` code needs a dedicated repository for Ray Serve model inference services.

| Component | Current Location | Purpose |
|-----------|------------------|---------|
| kuberay-images | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
| ray-serve | `llm-workflows/ray-serve/` (archived) | Ray Serve inference services |
| llm-workflows | `llm-workflows/` (archived) | Pipelines, handlers, STT/TTS, embeddings |

### Problems with Current Structure

1. **Tight Coupling**: ray-serve changes require llm-workflows repo access
2. **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
3. **Version Management**: Can't independently version ray-serve deployments
4. **Team Access**: Contributors to ray-serve need access to the entire llm-workflows repo
5. **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds

## Decision

**Establish two dedicated Ray repositories with distinct purposes:**

| Repository | Type | Contents | Release Cycle |
|------------|------|----------|---------------|
| `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
| `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |

### Key Design: Dynamic Code Loading

Ray Serve applications are deployed as **PyPI packages**, not baked into Docker images. This enables (see the sketch after this list):

- **Dynamic Decoupling**: Update model serving logic without rebuilding containers
- **Runtime Flexibility**: The Ray cluster pulls code via `pip install` at runtime
- **Faster Iteration**: Code changes don't require image rebuilds or pod restarts
- **Version Pinning**: Kubernetes manifests specify package versions independently
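To make the design concrete, here is a minimal, hypothetical sketch of what the package's `serve_apps.py` entry point could look like. The file name comes from the repository layout below; the `ModelServer` class, its resource options, and the echo behavior are illustrative assumptions, not the actual implementation.

```python
# src/ray_serve/serve_apps.py -- illustrative sketch only
from starlette.requests import Request

from ray import serve


@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class ModelServer:
    """Hypothetical inference deployment; real model loading would be
    driven by the package's configuration (e.g. model_configs.py)."""

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        # Placeholder: a real deployment would run model inference here.
        return {"echo": payload}


# The bound application object is what a deployment manifest references,
# e.g. via an import path such as "ray_serve.serve_apps:app".
app = ModelServer.bind()
```

Because this module ships in a package instead of a container layer, publishing a new version is enough to roll out new serving logic.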
### Repository Structure

```
kuberay-images/                        # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
    └── build-push.yaml                # Builds & pushes to container registry

ray-serve/                             # PyPI package - application code
├── src/
│   └── ray_serve/
│       ├── __init__.py
│       ├── model_configs.py
│       └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
    └── publish-ray-serve.yaml         # Publishes to PyPI registry
```

**Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:

- **Infrastructure** (kuberay-images): How to run Ray workers
- **Application** (ray-serve): What code to run
- **Orchestration** (homelab-k8s2): Where and when to deploy

## Architecture

```
┌───────────────────────────────────────┐
│          RAY INFRASTRUCTURE           │
└───────────────────────────────────────┘
                    │
        ┌───────────┴───────────┐
        │                       │
        ▼                       ▼
┌───────────────┐       ┌───────────────┐
│ kuberay-images│       │   ray-serve   │
│               │       │               │
│ Base worker   │       │ PyPI package  │
│ Docker images │       │ Ray Serve     │
│               │       │ application   │
│ NVIDIA/AMD/   │       │               │
│ Intel GPUs    │       │ Model configs │
└───────────────┘       └───────────────┘
        │                       │
        ▼                       ▼
┌───────────────┐       ┌───────────────┐
│ Container     │       │ PyPI          │
│ Registry      │       │ Registry      │
│ registry.lab/ │       │ registry.lab/ │
│ kuberay/*     │       │ pypi/ray-serve│
└───────────────┘       └───────────────┘
        │                       │
        └───────────┬───────────┘
                    │
                    ▼
          ┌───────────────────┐
          │    Ray Cluster    │
          │                   │
          │ 1. Pull container │
          │ 2. pip install    │
          │    ray-serve      │
          │ 3. Run serve app  │
          └───────────────────┘
```

## Consequences

### Positive

- **Dynamic Updates**: Deploy new model serving code without rebuilding images
- **Independent Releases**: Containers and application code are versioned separately
- **Faster Iteration**: A PyPI publish takes seconds versus minutes for a Docker build
- **Clear Separation**: Infrastructure (images) vs application (code) vs orchestration (k8s)
- **Runtime Flexibility**: The same container can run different ray-serve versions

### Negative

- **Runtime Dependencies**: Pod startup requires `pip install` (cached in practice)
- **Version Coordination**: Must track compatible versions between kuberay-images and ray-serve

### Migration Steps

1. ✅ `kuberay-images` already exists as a standalone repo
2. ✅ `llm-workflows` archived; all other components extracted to dedicated repos
3. [ ] Create the `ray-serve` repo on Gitea
4. [ ] Move `.gitea/workflows/publish-ray-serve.yaml` to the new repo
5. [ ] Set up `pyproject.toml` for PyPI publishing
6. [ ] Update RayService manifests to `pip install ray-serve==X.Y.Z`
7. [ ] Verify the Ray cluster pulls the package correctly at runtime (see the sketch after this list)
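Steps 6 and 7 revolve around the runtime `pip install`. The following is a minimal verification sketch, assuming the cluster's pip is already configured to resolve `ray-serve` from the internal registry and that the package exposes a `__version__` attribute; both are assumptions, and a real deployment would carry the version pin in the RayService manifest's runtime environment rather than in an ad-hoc script.

```python
import ray

# Connect to the cluster and pin the application package in the runtime
# environment; workers pip-install it on demand instead of finding it
# baked into the kuberay-images container.
ray.init(runtime_env={"pip": ["ray-serve==1.0.0"]})


@ray.remote
def installed_version() -> str:
    # Runs on a worker whose environment includes the runtime pip install.
    import ray_serve  # import name assumed from the src/ layout above

    return getattr(ray_serve, "__version__", "unknown")


# Printing the pinned version confirms the cluster pulled the package
# correctly at runtime (migration step 7).
print(ray.get(installed_version.remote()))
```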
## Version Compatibility Matrix

| kuberay-images | ray-serve | Notes |
|----------------|-----------|-------|
| 1.0.0 | 1.0.0 | Initial structure |

## References

- [ADR-0020: Internal Registry for CI/CD](./ADR-0020-internal-registry-for-cicd.md)
- [KubeRay Documentation](https://ray-project.github.io/kuberay/)
- [Ray Serve Documentation](https://docs.ray.io/en/latest/serve/index.html)