feat: adr-0024

2026-02-03 06:46:50 -05:00
parent 5e9f589311
commit 85b1a9019b
1 changed files with 184 additions and 0 deletions
--- a/docs/adr/ADR-0024-ray-repository-structure.md
+++ b/docs/adr/ADR-0024-ray-repository-structure.md
@@ -0,0 +1,184 @@
 # ADR-0024: Ray Repository Structure
 ## Status
 Accepted
 ## Date
 2026-02-03
 ## Context
 | Factor | Details |
 |--------|---------|
 | Problem | Need to document the Ray-specific repository structure |
 | Impact | Clarity on where Ray components live post-migration |
 | Current State | `kuberay-images` is already standalone; `ray-serve` still needs to be extracted into its own repository |
 | Goal | Clean separation with independent release cycles |
 ### Historical Context
 `llm-workflows` was the original monolithic repository containing all ML/AI infrastructure code. It has been **archived** after being fully decomposed into focused, independent repositories:
 | Repository | Purpose |
 |------------|---------|
 | `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
 | `ai-pipelines` | Kubeflow pipeline definitions |
 | `ai-services` | Core ML service implementations |
 | `chat-handler` | Chat orchestration and routing |
 | `handler-base` | Base handler framework |
 | `pipeline-bridge` | Bridge between pipelines and services |
 | `stt-module` | Speech-to-text service |
 | `tts-module` | Text-to-speech service |
 | `voice-assistant` | Voice assistant integration |
 | `gradio-ui` | Shared Gradio UI components |
 | `kuberay-images` | GPU-specific Ray worker base images |
 | `ntfy-discord` | Notification bridge |
 | `spark-analytics-jobs` | Spark batch analytics |
 | `flink-analytics-jobs` | Flink streaming analytics |
 ### Remaining Ray Component
 The `ray-serve` code still needs a dedicated repository for Ray Serve model inference services.
 | Component | Current Location | Purpose |
 |-----------|------------------|---------|
 | kuberay-images | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
 | ray-serve | `llm-workflows/ray-serve/` | Ray Serve inference services |
 | llm-workflows | `llm-workflows/` (archived) | Pipelines, handlers, STT/TTS, embeddings (since extracted to dedicated repositories) |
 ### Problems with Current Structure
 1. **Tight Coupling**: ray-serve changes require llm-workflows repo access
 2. **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
 3. **Version Management**: Can't independently version ray-serve deployments
 4. **Team Access**: Contributors to ray-serve need access to entire llm-workflows repo
 5. **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds
 ## Decision
 **Establish two dedicated Ray repositories with distinct purposes:**
 | Repository | Type | Contents | Release Cycle |
 |------------|------|----------|---------------|
 | `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
 | `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |
 ### Key Design: Dynamic Code Loading
 Ray Serve applications are deployed as **PyPI packages**, not baked into Docker images. This enables:
 - **Dynamic Decoupling**: Update model serving logic without rebuilding containers
 - **Runtime Flexibility**: Ray cluster pulls code via `pip install` at runtime
 - **Faster Iteration**: Code changes don't require image rebuilds or pod restarts
 - **Version Pinning**: Kubernetes manifests specify package versions independently
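
One way to express the version pin is through the `pip` field of Ray's `runtime_env` in a Serve application entry. The sketch below builds such an entry as a plain dictionary and prints it as JSON. The application name `llm`, the import path `ray_serve.serve_apps:app`, and the version `1.0.0` are assumptions for illustration; the actual manifests in `homelab-k8s2` may be laid out differently.

```python
import json


def build_serve_application(name: str, import_path: str, package: str, version: str) -> dict:
    """Return one Serve application entry that pins the code package via runtime_env."""
    return {
        "name": name,
        "import_path": import_path,  # hypothetical module:attribute inside the package
        "runtime_env": {"pip": [f"{package}=={version}"]},  # exact version pin
    }


if __name__ == "__main__":
    # All argument values here are illustrative placeholders.
    app = build_serve_application("llm", "ray_serve.serve_apps:app", "ray-serve", "1.0.0")
    print(json.dumps({"applications": [app]}, indent=2))
```

Keeping the pin in one place means a rollout of new serving code is a one-line version change in the manifest, with no change to the container image reference.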
 ### Repository Structure
 ```
 kuberay-images/                    # Docker images - GPU runtime environments
 ├── Dockerfile.ray-worker-nvidia
 ├── Dockerfile.ray-worker-rdna2
 ├── Dockerfile.ray-worker-strixhalo
 ├── Dockerfile.ray-worker-intel
 ├── Makefile
 └── .gitea/workflows/
    └── build-push.yaml            # Builds & pushes to container registry
 ray-serve/                         # PyPI package - application code
 ├── src/
 │   └── ray_serve/
 │       ├── __init__.py
 │       ├── model_configs.py
 │       └── serve_apps.py
 ├── pyproject.toml
 ├── README.md
 └── .gitea/workflows/
    └── publish-ray-serve.yaml     # Publishes to PyPI registry
 ```
 **Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:
 - **Infrastructure** (kuberay-images) - How to run Ray workers
 - **Application** (ray-serve) - What code to run
 - **Orchestration** (homelab-k8s2) - Where and when to deploy
 ## Architecture
 ```
 ┌─────────────────────────────────────────────────────────────────────┐
 │                        RAY INFRASTRUCTURE                            │
 └─────────────────────────────────────────────────────────────────────┘
                                  │
              ┌───────────────────┴───────────────────┐
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ kuberay-images│                       │   ray-serve   │
      │               │                       │               │
      │ Base worker   │                       │ PyPI package  │
      │ Docker images │                       │ Ray Serve     │
      │               │                       │ application   │
      │ NVIDIA/AMD/   │                       │               │
      │ Intel GPUs    │                       │ Model configs │
      └───────────────┘                       └───────────────┘
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ Container     │                       │ PyPI          │
      │ Registry      │                       │ Registry      │
      │ registry.lab/ │                       │ registry.lab/ │
      │ kuberay/*     │                       │ pypi/ray-serve│
      └───────────────┘                       └───────────────┘
              │                                       │
              └───────────────────┬───────────────────┘
                                  │
                                  ▼
                        ┌───────────────────┐
                        │   Ray Cluster     │
                        │                   │
                        │ 1. Pull container │
                        │ 2. pip install    │
                        │    ray-serve      │
                        │ 3. Run serve app  │
                        └───────────────────┘
 ```
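
Step 2 of the cluster startup sequence in the diagram can be sketched as a small command builder. This is a minimal illustration, not the deployed mechanism: the function name is hypothetical, and the index URL is an optional parameter because the exact URL of the internal PyPI registry is not stated here.

```python
import sys
from typing import List, Optional


def pip_install_command(package: str, version: str, index_url: Optional[str] = None) -> List[str]:
    """Build the argument list for installing one pinned package version."""
    cmd = [sys.executable, "-m", "pip", "install", f"{package}=={version}"]
    if index_url:
        # Point pip at a private index instead of the default one.
        cmd += ["--index-url", index_url]
    return cmd


if __name__ == "__main__":
    # Prints the command only; nothing is installed.
    print(" ".join(pip_install_command("ray-serve", "1.0.0")))
```

Returning a list rather than a shell string lets the command be passed directly to `subprocess.run` without quoting issues.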
 ## Consequences
 ### Positive
 - **Dynamic Updates**: Deploy new model serving code without rebuilding images
 - **Independent Releases**: Containers and application code versioned separately
 - **Faster Iteration**: Publishing a package to PyPI takes seconds, versus minutes for a Docker image build
 - **Clear Separation**: Infrastructure (images) vs Application (code) vs Orchestration (k8s)
 - **Runtime Flexibility**: Same container can run different ray-serve versions
 ### Negative
 - **Runtime Dependencies**: Pod startup runs `pip install`, so it depends on the PyPI registry being reachable and adds startup latency (the install is cached in practice)
 - **Version Coordination**: Must track compatible versions between kuberay-images and ray-serve
 ### Migration Steps
 1. ✅ `kuberay-images` already exists as standalone repo
 2. ✅ `llm-workflows` archived - all components extracted to dedicated repos
 3. [ ] Create `ray-serve` repo on Gitea
 4. [ ] Move `.gitea/workflows/publish-ray-serve.yaml` to new repo
 5. [ ] Set up pyproject.toml for PyPI publishing
 6. [ ] Update RayService manifests to install a pinned package version (`pip install ray-serve==X.Y.Z`)
 7. [ ] Verify Ray cluster pulls package correctly at runtime
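
The verification in the last step could be scripted roughly as follows, using only the standard library. The function name is hypothetical, and the check assumes it runs inside the same Python environment that the Ray workers use.

```python
from importlib import metadata


def installed_version_matches(package: str, expected: str) -> bool:
    """Return True only if `package` is installed at exactly the `expected` version."""
    try:
        return metadata.version(package) == expected
    except metadata.PackageNotFoundError:
        # The package was never installed in this environment.
        return False


if __name__ == "__main__":
    # "1.0.0" is an illustrative version, taken from the compatibility matrix.
    ok = installed_version_matches("ray-serve", "1.0.0")
    print("ray-serve pinned version present:", ok)
```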
 ## Version Compatibility Matrix
 | kuberay-images | ray-serve | Notes |
 |----------------|-----------|-------|
 | 1.0.0 | 1.0.0 | Initial structure |
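
As the matrix grows, the same information could be kept in a machine-readable form and checked before a deployment. The following is a sketch under that assumption; the `COMPATIBILITY` mapping and `is_compatible` helper are hypothetical and mirror only the single row recorded above.

```python
from typing import Dict, Set

# kuberay-images version -> ray-serve versions known to work with it.
# Mirrors the table above; extend it as new pairs are validated.
COMPATIBILITY: Dict[str, Set[str]] = {
    "1.0.0": {"1.0.0"},
}


def is_compatible(image_version: str, package_version: str) -> bool:
    """Return True if the image/package pair appears in the matrix."""
    return package_version in COMPATIBILITY.get(image_version, set())


if __name__ == "__main__":
    print(is_compatible("1.0.0", "1.0.0"))
```

Unknown image versions are treated as incompatible, so an unvalidated pair fails the check instead of passing silently.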
 ## References
 - [ADR-0020: Internal Registry for CI/CD](./ADR-0020-internal-registry-for-cicd.md)
 - [KubeRay Documentation](https://ray-project.github.io/kuberay/)
 - [Ray Serve Documentation](https://docs.ray.io/en/latest/serve/index.html)