# Ray Repository Structure

* Status: accepted
* Date: 2026-02-03
* Deciders: Billy
* Technical Story: Document repository layout for Ray Serve and KubeRay image components

## Context

| Factor | Details |
|--------|---------|
| Problem | Need to document the Ray-specific repository structure |
| Impact | Clarity on where Ray components live post-migration |
| Current State | `kuberay-images` and `ray-serve` both exist as standalone repos; RayService manifests still need updating |
| Goal | Clean separation with independent release cycles |

### Historical Context

`llm-workflows` was the original monolithic repository containing all ML/AI infrastructure code. It has been **archived** after being fully decomposed into focused, independent repositories:

| Repository | Purpose |
|------------|---------|
| `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
| `ai-pipelines` | Kubeflow pipeline definitions |
| `ai-services` | Core ML service implementations |
| `chat-handler` | Chat orchestration and routing |
| `handler-base` | Base handler framework |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `voice-assistant` | Voice assistant integration |
| `gradio-ui` | Shared Gradio UI components |
| `kuberay-images` | GPU-specific Ray worker base images |
| `ntfy-discord` | Notification bridge |
| `spark-analytics-jobs` | Spark batch analytics |
| `flink-analytics-jobs` | Flink streaming analytics |

### Ray Component Repositories

Both Ray repositories now exist as standalone repos in the Gitea `daviestechlabs` organization:

| Component | Location | Purpose |
|-----------|----------|---------|
| kuberay-images | `kuberay-images/` (standalone repo) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
| ray-serve | `ray-serve/` (standalone repo) | Ray Serve inference services |

### Problems with Monolithic Structure (Historical)

These were the problems with the original monolithic `llm-workflows` structure (now resolved):

1. **Tight Coupling**: ray-serve changes required llm-workflows repo access
2. **CI/CD Complexity**: Building ray-serve images triggered unrelated workflow steps
3. **Version Management**: Couldn't independently version ray-serve deployments
4. **Team Access**: Contributors to ray-serve needed access to entire llm-workflows repo
5. **Build Times**: Changes to unrelated code could trigger ray-serve rebuilds

## Decision

**Establish two dedicated Ray repositories with distinct purposes:**

| Repository | Type | Contents | Release Cycle |
|------------|------|----------|---------------|
| `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
| `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |

### Key Design: Dynamic Code Loading

Ray Serve applications are deployed as **PyPI packages**, not baked into Docker images. This enables:

- **Dynamic Decoupling**: Update model serving logic without rebuilding containers
- **Runtime Flexibility**: Ray cluster pulls code via `pip install` at runtime
- **Faster Iteration**: Code changes don't require image rebuilds or pod restarts
- **Version Pinning**: Kubernetes manifests specify package versions independently
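
As a rough illustration of the version-pinning idea, the sketch below builds a Ray `runtime_env` dictionary that pins `ray-serve` to an exact version. The helper name `build_runtime_env` is hypothetical, and the version `1.0.0` is simply the one listed in this ADR's compatibility matrix. How the dictionary reaches the cluster (a RayService manifest, a Serve config file, or `ray.init`) is an assumption and not something this ADR prescribes.

```python
def build_runtime_env(package: str, version: str) -> dict:
    """Return a runtime_env-style dict pinning one pip package to an exact version.

    Hypothetical helper: it only constructs the dictionary and does not
    contact a Ray cluster or a package registry.
    """
    if not package or not version:
        raise ValueError("package and version must be non-empty")
    # Ray's runtime_env accepts a "pip" key holding a list of requirement strings.
    return {"pip": [f"{package}=={version}"]}


if __name__ == "__main__":
    # Version taken from the compatibility matrix; adjust per release.
    print(build_runtime_env("ray-serve", "1.0.0"))
```

Bumping the pinned version then becomes a one-line change on the orchestration side, with no change to the worker image.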

### Repository Structure

```
kuberay-images/                    # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
    └── build-push.yaml            # Builds & pushes to container registry

ray-serve/                         # PyPI package - application code
├── src/
│   └── ray_serve/
│       ├── __init__.py
│       ├── model_configs.py
│       └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
    └── publish-ray-serve.yaml     # Publishes to PyPI registry
```

**Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:
- **Infrastructure** (kuberay-images) - How to run Ray workers
- **Application** (ray-serve) - What code to run
- **Orchestration** (homelab-k8s2) - Where and when to deploy

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                        RAY INFRASTRUCTURE                            │
└─────────────────────────────────────────────────────────────────────┘
                                  │
              ┌───────────────────┴───────────────────┐
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ kuberay-images│                       │   ray-serve   │
      │               │                       │               │
      │ Base worker   │                       │ PyPI package  │
      │ Docker images │                       │ Ray Serve     │
      │               │                       │ application   │
      │ NVIDIA/AMD/   │                       │               │
      │ Intel GPUs    │                       │ Model configs │
      └───────────────┘                       └───────────────┘
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ Container     │                       │ PyPI          │
      │ Registry      │                       │ Registry      │
      │ registry.lab/ │                       │ registry.lab/ │
      │ kuberay/*     │                       │ pypi/ray-serve│
      └───────────────┘                       └───────────────┘
              │                                       │
              └───────────────────┬───────────────────┘
                                  │
                                  ▼
                        ┌───────────────────┐
                        │   Ray Cluster     │
                        │                   │
                        │ 1. Pull container │
                        │ 2. pip install    │
                        │    ray-serve      │
                        │ 3. Run serve app  │
                        └───────────────────┘
```

## Consequences

### Positive

- **Dynamic Updates**: Deploy new model serving code without rebuilding images
- **Independent Releases**: Containers and application code versioned separately
- **Faster Iteration**: Publishing to PyPI takes seconds, versus minutes for a Docker build
- **Clear Separation**: Infrastructure (images) vs Application (code) vs Orchestration (k8s)
- **Runtime Flexibility**: Same container can run different ray-serve versions

### Negative

- **Runtime Dependencies**: Pod startup requires `pip install` (cached in practice)
- **Version Coordination**: Must track compatible versions between kuberay-images and ray-serve

### Migration Steps

1. ✅ `kuberay-images` already exists as standalone repo
2. ✅ `llm-workflows` archived - all components extracted to dedicated repos
3. ✅ `ray-serve` repo created on Gitea (`git.daviestechlabs.io/daviestechlabs/ray-serve`)
4. ✅ CI workflows moved to new repo
5. ✅ pyproject.toml configured for PyPI publishing
6. [ ] Update RayService manifests to `pip install ray-serve==X.Y.Z`
7. [ ] Verify Ray cluster pulls package correctly at runtime
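
Step 7 could be checked from inside a worker with something like the following sketch. It reads the installed distribution version through the standard-library `importlib.metadata`. The helper names (`installed_version`, `check_pinned`) and the choice to run the check on a worker are assumptions made for illustration, not part of the existing repos.

```python
from importlib import metadata
from typing import Callable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pinned(
    dist_name: str,
    expected: str,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> bool:
    """True when the installed version equals the pinned one.

    `lookup` is injectable so the check can be exercised without the package installed.
    """
    return lookup(dist_name) == expected


if __name__ == "__main__":
    # Hypothetical pin; use whatever version the manifest specifies.
    print(check_pinned("ray-serve", "1.0.0"))
```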

## Version Compatibility Matrix

| kuberay-images | ray-serve | Notes |
|----------------|-----------|-------|
| 1.0.0 | 1.0.0 | Initial structure |
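
The matrix could also be kept in a machine-checkable form so that a CI step rejects untested pairings. The sketch below is one possible shape, assuming compatibility is recorded as exact version pairs. The names `COMPATIBLE` and `is_compatible` are hypothetical, and the only entry is the row listed above.

```python
# Known-good (kuberay-images version, ray-serve version) pairs.
# Only the pair documented in the matrix is included.
COMPATIBLE = {
    ("1.0.0", "1.0.0"),
}


def is_compatible(image_version: str, package_version: str, matrix=COMPATIBLE) -> bool:
    """Return True if the image/package pair appears in the matrix."""
    return (image_version, package_version) in matrix


if __name__ == "__main__":
    print(is_compatible("1.0.0", "1.0.0"))  # True
    print(is_compatible("1.0.0", "2.0.0"))  # False: pair not listed
```

Exact pairs are the simplest representation. Version ranges would need a real version parser and are left out here.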

## References

- [ADR-0020: Internal Registry for CI/CD](./ADR-0020-internal-registry-for-cicd.md)
- [KubeRay Documentation](https://ray-project.github.io/kuberay/)
- [Ray Serve Documentation](https://docs.ray.io/en/latest/serve/index.html)