diff --git a/docs/adr/ADR-0024-ray-repository-structure.md b/docs/adr/ADR-0024-ray-repository-structure.md
new file mode 100644
index 0000000..918277d
--- /dev/null
+++ b/docs/adr/ADR-0024-ray-repository-structure.md
@@ -0,0 +1,184 @@
+# ADR-0024: Ray Repository Structure
+
+## Status
+
+Accepted
+
+## Date
+
+2026-02-03
+
+## Context
+
+| Factor | Details |
+|--------|---------|
+| Problem | Need to document the Ray-specific repository structure |
+| Impact | Clarity on where Ray components live post-migration |
+| Current State | kuberay-images standalone, ray-serve needs extraction |
+| Goal | Clean separation with independent release cycles |
+
+### Historical Context
+
+`llm-workflows` was the original monolithic repository containing all ML/AI infrastructure code. It has been **archived** after being fully decomposed into focused, independent repositories:
+
+| Repository | Purpose |
+|------------|---------|
+| `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
+| `ai-pipelines` | Kubeflow pipeline definitions |
+| `ai-services` | Core ML service implementations |
+| `chat-handler` | Chat orchestration and routing |
+| `handler-base` | Base handler framework |
+| `pipeline-bridge` | Bridge between pipelines and services |
+| `stt-module` | Speech-to-text service |
+| `tts-module` | Text-to-speech service |
+| `voice-assistant` | Voice assistant integration |
+| `gradio-ui` | Shared Gradio UI components |
+| `kuberay-images` | GPU-specific Ray worker base images |
+| `ntfy-discord` | Notification bridge |
+| `spark-analytics-jobs` | Spark batch analytics |
+| `flink-analytics-jobs` | Flink streaming analytics |
+
+### Remaining Ray Component
+
+The `ray-serve` code still needs a dedicated repository for Ray Serve model inference services.
+
+| Component | Current Location | Purpose |
+|-----------|------------------|---------|
+| kuberay-images | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
+| ray-serve | `llm-workflows/ray-serve/` | Ray Serve inference services |
+| llm-workflows | `llm-workflows/` | Pipelines, handlers, STT/TTS, embeddings |
+
+### Problems with Current Structure
+
+1. **Tight Coupling**: ray-serve changes require llm-workflows repo access
+2. **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
+3. **Version Management**: Can't independently version ray-serve deployments
+4. **Team Access**: Contributors to ray-serve need access to entire llm-workflows repo
+5. **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds
+
+## Decision
+
+**Establish two dedicated Ray repositories with distinct purposes:**
+
+| Repository | Type | Contents | Release Cycle |
+|------------|------|----------|---------------|
+| `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
+| `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |
+
+### Key Design: Dynamic Code Loading
+
+Ray Serve applications are deployed as **PyPI packages**, not baked into Docker images. This enables:
+
+- **Dynamic Decoupling**: Update model serving logic without rebuilding containers
+- **Runtime Flexibility**: Ray cluster pulls code via `pip install` at runtime
+- **Faster Iteration**: Code changes don't require image rebuilds or pod restarts
+- **Version Pinning**: Kubernetes manifests specify package versions independently
+
+### Repository Structure
+
+```
+kuberay-images/                      # Docker images - GPU runtime environments
+├── Dockerfile.ray-worker-nvidia
+├── Dockerfile.ray-worker-rdna2
+├── Dockerfile.ray-worker-strixhalo
+├── Dockerfile.ray-worker-intel
+├── Makefile
+└── .gitea/workflows/
+    └── build-push.yaml              # Builds & pushes to container registry
+
+ray-serve/                           # PyPI package - application code
+├── src/
+│   └── ray_serve/
+│       ├── __init__.py
+│       ├── model_configs.py
+│       └── serve_apps.py
+├── pyproject.toml
+├── README.md
+└── .gitea/workflows/
+    └── publish-ray-serve.yaml       # Publishes to PyPI registry
+```
+
+**Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:
+- **Infrastructure** (kuberay-images) - How to run Ray workers
+- **Application** (ray-serve) - What code to run
+- **Orchestration** (homelab-k8s2) - Where and when to deploy
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                         RAY INFRASTRUCTURE                          │
+└─────────────────────────────────────────────────────────────────────┘
+                                   │
+               ┌───────────────────┴───────────────────┐
+               │                                       │
+               ▼                                       ▼
+       ┌───────────────┐                       ┌───────────────┐
+       │ kuberay-images│                       │   ray-serve   │
+       │               │                       │               │
+       │ Base worker   │                       │ PyPI package  │
+       │ Docker images │                       │ Ray Serve     │
+       │               │                       │ application   │
+       │ NVIDIA/AMD/   │                       │               │
+       │ Intel GPUs    │                       │ Model configs │
+       └───────────────┘                       └───────────────┘
+               │                                       │
+               ▼                                       ▼
+       ┌───────────────┐                       ┌───────────────┐
+       │ Container     │                       │ PyPI          │
+       │ Registry      │                       │ Registry      │
+       │ registry.lab/ │                       │ registry.lab/ │
+       │ kuberay/*     │                       │ pypi/ray-serve│
+       └───────────────┘                       └───────────────┘
+               │                                       │
+               └───────────────────┬───────────────────┘
+                                   │
+                                   ▼
+                         ┌───────────────────┐
+                         │    Ray Cluster    │
+                         │                   │
+                         │ 1. Pull container │
+                         │ 2. pip install    │
+                         │    ray-serve      │
+                         │ 3. Run serve app  │
+                         └───────────────────┘
+```
+
+## Consequences
+
+### Positive
+
+- **Dynamic Updates**: Deploy new model serving code without rebuilding images
+- **Independent Releases**: Containers and application code versioned separately
+- **Faster Iteration**: PyPI publish is seconds vs minutes for Docker builds
+- **Clear Separation**: Infrastructure (images) vs Application (code) vs Orchestration (k8s)
+- **Runtime Flexibility**: Same container can run different ray-serve versions
+
+### Negative
+
+- **Runtime Dependencies**: Pod startup requires `pip install` (cached in practice)
+- **Version Coordination**: Must track compatible versions between kuberay-images and ray-serve
+
+### Migration Steps
+
+1. ✅ `kuberay-images` already exists as standalone repo
+2. ✅ `llm-workflows` archived - all components extracted to dedicated repos
+3. [ ] Create `ray-serve` repo on Gitea
+4. [ ] Move `.gitea/workflows/publish-ray-serve.yaml` to new repo
+5. [ ] Set up pyproject.toml for PyPI publishing
+6. [ ] Update RayService manifests to `pip install ray-serve==X.Y.Z`
+7. [ ] Verify Ray cluster pulls package correctly at runtime
+
+## Version Compatibility Matrix
+
+| kuberay-images | ray-serve | Notes |
+|----------------|-----------|-------|
+| 1.0.0 | 1.0.0 | Initial structure |
+
+## References
+
+- [ADR-0020: Internal Registry for CI/CD](./ADR-0020-internal-registry-for-cicd.md)
+- [KubeRay Documentation](https://ray-project.github.io/kuberay/)
+- [Ray Serve Documentation](https://docs.ray.io/en/latest/serve/index.html)