# ADR-0024: Ray Repository Structure

## Status

Accepted

## Date

2026-02-03

## Context

| Factor | Details |
|--------|---------|
| Problem | Need to document the Ray-specific repository structure |
| Impact | Clarity on where Ray components live post-migration |
| Current State | kuberay-images standalone, ray-serve needs extraction |
| Goal | Clean separation with independent release cycles |

### Historical Context

`llm-workflows` was the original monolithic repository containing all ML/AI infrastructure code. It has been **archived** after being decomposed into the focused, independent repositories below (the Ray Serve code is the one component still awaiting extraction):

| Repository | Purpose |
|------------|---------|
| `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
| `ai-pipelines` | Kubeflow pipeline definitions |
| `ai-services` | Core ML service implementations |
| `chat-handler` | Chat orchestration and routing |
| `handler-base` | Base handler framework |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `voice-assistant` | Voice assistant integration |
| `gradio-ui` | Shared Gradio UI components |
| `kuberay-images` | GPU-specific Ray worker base images |
| `ntfy-discord` | Notification bridge |
| `spark-analytics-jobs` | Spark batch analytics |
| `flink-analytics-jobs` | Flink streaming analytics |

### Remaining Ray Component

The `ray-serve` code, which implements the Ray Serve model inference services, still lives in `llm-workflows` and needs a dedicated repository.

| Component | Current Location | Purpose |
|-----------|------------------|---------|
| kuberay-images | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
| ray-serve | `llm-workflows/ray-serve/` | Ray Serve inference services |
| llm-workflows | `llm-workflows/` (archived) | Pipelines, handlers, STT/TTS, embeddings |

### Problems with Current Structure

1. **Tight Coupling**: ray-serve changes require llm-workflows repo access
2. **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
3. **Version Management**: Can't independently version ray-serve deployments
4. **Team Access**: Contributors to ray-serve need access to entire llm-workflows repo
5. **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds

## Decision

**Establish two dedicated Ray repositories with distinct purposes:**

| Repository | Type | Contents | Release Cycle |
|------------|------|----------|---------------|
| `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
| `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |

### Key Design: Dynamic Code Loading

Ray Serve applications are deployed as **PyPI packages**, not baked into Docker images. This enables:

- **Dynamic Decoupling**: Update model serving logic without rebuilding containers
- **Runtime Flexibility**: Ray cluster pulls code via `pip install` at runtime
- **Faster Iteration**: Code changes don't require image rebuilds or pod restarts
- **Version Pinning**: Kubernetes manifests specify package versions independently
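
Because an exact package version is what makes a deployment reproducible, a CI check in `homelab-k8s2` could refuse anything other than an exact pin. The following is a minimal sketch, assuming `ray-serve` releases use plain `X.Y.Z` version numbers; the helper name `pinned_requirement` is hypothetical and not part of any existing tooling.

```python
import re

# Assumption: ray-serve releases use plain MAJOR.MINOR.PATCH version numbers.
_EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def pinned_requirement(version: str, package: str = "ray-serve") -> str:
    """Return an exact pip pin such as 'ray-serve==1.0.0'.

    Raises ValueError for ranges, wildcards, or empty versions, so a manifest
    cannot silently float to whatever the registry currently serves.
    """
    version = version.strip()
    if not _EXACT_VERSION.match(version):
        raise ValueError(f"expected an exact X.Y.Z version, got {version!r}")
    return f"{package}=={version}"


if __name__ == "__main__":
    print(pinned_requirement("1.0.0"))  # ray-serve==1.0.0
    for bad in ("", ">=1.0", "1.*", "latest"):
        try:
            pinned_requirement(bad)
        except ValueError as err:
            print("rejected:", err)
```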

### Repository Structure

```
kuberay-images/                  # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
    └── build-push.yaml          # Builds & pushes to container registry

ray-serve/                       # PyPI package - application code
├── src/
│   └── ray_serve/
│       ├── __init__.py
│       ├── model_configs.py
│       └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
    └── publish-ray-serve.yaml   # Publishes to PyPI registry
```

**Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:

- **Infrastructure** (kuberay-images) - How to run Ray workers
- **Application** (ray-serve) - What code to run
- **Orchestration** (homelab-k8s2) - Where and when to deploy

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                         RAY INFRASTRUCTURE                          │
└─────────────────────────────────────────────────────────────────────┘
                                   │
               ┌───────────────────┴───────────────────┐
               │                                       │
               ▼                                       ▼
       ┌───────────────┐                       ┌───────────────┐
       │ kuberay-images│                       │   ray-serve   │
       │               │                       │               │
       │ Base worker   │                       │ PyPI package  │
       │ Docker images │                       │ Ray Serve     │
       │               │                       │ application   │
       │ NVIDIA/AMD/   │                       │               │
       │ Intel GPUs    │                       │ Model configs │
       └───────────────┘                       └───────────────┘
               │                                       │
               ▼                                       ▼
       ┌───────────────┐                       ┌───────────────┐
       │ Container     │                       │ PyPI          │
       │ Registry      │                       │ Registry      │
       │ registry.lab/ │                       │ registry.lab/ │
       │ kuberay/*     │                       │ pypi/ray-serve│
       └───────────────┘                       └───────────────┘
               │                                       │
               └───────────────────┬───────────────────┘
                                   │
                                   ▼
                         ┌───────────────────┐
                         │    Ray Cluster    │
                         │                   │
                         │ 1. Pull container │
                         │ 2. pip install    │
                         │    ray-serve      │
                         │ 3. Run serve app  │
                         └───────────────────┘
```
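
The three steps in the Ray Cluster box correspond to three fields of a KubeRay `RayService` object, which is the kind of manifest `homelab-k8s2` would own. The sketch below builds such an object as a plain Python dict. The image tag, pip pin, application name, and `import_path` (`ray_serve.serve_apps:app`) are hypothetical placeholders, and worker groups, resources, and GPU settings are omitted for brevity.

```python
import json

# Hypothetical values; the real ones would live in homelab-k8s2.
IMAGE = "registry.lab/kuberay/ray-worker-nvidia:1.0.0"  # 1. pull container
PIP_PIN = "ray-serve==1.0.0"                             # 2. pip install ray-serve
IMPORT_PATH = "ray_serve.serve_apps:app"                 # 3. run serve app


def build_rayservice(name: str) -> dict:
    """Return a minimal RayService manifest (as a dict) wiring the three steps."""
    serve_config = {
        "applications": [
            {
                "name": "inference",
                "route_prefix": "/",
                "import_path": IMPORT_PATH,
                # Ray installs these packages into a runtime environment on the cluster.
                "runtime_env": {"pip": [PIP_PIN]},
            }
        ]
    }
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayService",
        "metadata": {"name": name},
        "spec": {
            # serveConfigV2 holds the Serve config as a YAML string; JSON is valid YAML.
            "serveConfigV2": json.dumps(serve_config, indent=2),
            "rayClusterConfig": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": {
                        "spec": {"containers": [{"name": "ray-head", "image": IMAGE}]}
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_rayservice("ray-serve"), indent=2))
```

Bumping only `PIP_PIN` changes just the `serveConfigV2` string, which KubeRay can apply to the running cluster in place; bumping `IMAGE` changes `rayClusterConfig`, which typically means bringing up a new cluster. That asymmetry is the cost difference this decision relies on.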

## Consequences

### Positive

- **Dynamic Updates**: Deploy new model serving code without rebuilding images
- **Independent Releases**: Containers and application code versioned separately
- **Faster Iteration**: PyPI publish is seconds vs minutes for Docker builds
- **Clear Separation**: Infrastructure (images) vs Application (code) vs Orchestration (k8s)
- **Runtime Flexibility**: Same container can run different ray-serve versions

### Negative

- **Runtime Dependencies**: Pod startup requires `pip install` (cached in practice)
- **Version Coordination**: Must track compatible versions between kuberay-images and ray-serve

### Migration Steps

1. ✅ `kuberay-images` already exists as standalone repo
2. ✅ `llm-workflows` archived - all components except `ray-serve` extracted to dedicated repos
3. [ ] Create `ray-serve` repo on Gitea
4. [ ] Move `.gitea/workflows/publish-ray-serve.yaml` to new repo
5. [ ] Set up `pyproject.toml` for PyPI publishing
6. [ ] Update RayService manifests to `pip install ray-serve==X.Y.Z`
7. [ ] Verify Ray cluster pulls package correctly at runtime
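
For step 7, one lightweight check is to ask the Python environment that Ray created which `ray-serve` version it actually resolved, and compare it with the `ray-serve==X.Y.Z` pin from step 6. The sketch uses only `importlib.metadata`; where to call it (for example, from a Serve deployment's constructor or a one-off Ray task submitted with the same `runtime_env`) is an assumption, and the function name `check_installed` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def check_installed(dist: str, expected: str) -> Tuple[bool, Optional[str]]:
    """Report whether distribution `dist` is installed at exactly `expected`.

    Returns (matches, installed_version). installed_version is None when the
    distribution is missing, e.g. because the runtime_env pip install failed.
    Versions are compared as exact strings, matching an '==' pin.
    """
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return False, None
    return installed == expected, installed


if __name__ == "__main__":
    ok, found = check_installed("ray-serve", "1.0.0")
    print(f"ray-serve: expected 1.0.0, found {found}, ok={ok}")
```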

## Version Compatibility Matrix

| kuberay-images | ray-serve | Notes |
|----------------|-----------|-------|
| 1.0.0 | 1.0.0 | Initial structure |
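
To keep the Version Coordination cost manageable, the matrix can also be kept as data that a deployment pipeline consults before applying a manifest. A minimal sketch, assuming one entry per verified pairing; `COMPATIBLE`, `is_compatible`, and `require_compatible` are hypothetical names, and only the `1.0.0`/`1.0.0` pairing comes from the table.

```python
# The compatibility matrix as data: kuberay-images version -> ray-serve versions.
# Only the 1.0.0 row is real; add entries as paired releases are verified.
COMPATIBLE: dict[str, set[str]] = {
    "1.0.0": {"1.0.0"},  # Initial structure
}


def is_compatible(image_version: str, serve_version: str) -> bool:
    """True if this ray-serve release has been recorded against this image."""
    return serve_version in COMPATIBLE.get(image_version, set())


def require_compatible(image_version: str, serve_version: str) -> None:
    """Raise with a helpful message instead of deploying an untested pairing."""
    if not is_compatible(image_version, serve_version):
        known = sorted(COMPATIBLE.get(image_version, set()))
        raise ValueError(
            f"ray-serve {serve_version} is not recorded as compatible with "
            f"kuberay-images {image_version}; known: {known or 'none'}"
        )


if __name__ == "__main__":
    require_compatible("1.0.0", "1.0.0")
    print(is_compatible("1.0.0", "1.1.0"))  # False
```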

## References

- [ADR-0020: Internal Registry for CI/CD](./ADR-0020-internal-registry-for-cicd.md)
- [KubeRay Documentation](https://ray-project.github.io/kuberay/)
- [Ray Serve Documentation](https://docs.ray.io/en/latest/serve/index.html)