diff --git a/docs/adr/ADR-0024-ray-repository-structure.md b/docs/adr/ADR-0024-ray-repository-structure.md
new file mode 100644
index 0000000..918277d
--- /dev/null
+++ b/docs/adr/ADR-0024-ray-repository-structure.md
@@ -0,0 +1,184 @@
+# ADR-0024: Ray Repository Structure
+
+## Status
+
+Accepted
+
+## Date
+
+2026-02-03
+
+## Context
+
+| Factor | Details |
+|--------|---------|
+| Problem | Need to document the Ray-specific repository structure |
+| Impact | Clarity on where Ray components live post-migration |
+| Current State | kuberay-images standalone, ray-serve needs extraction |
+| Goal | Clean separation with independent release cycles |
+
+### Historical Context
+
+`llm-workflows` was the original monolithic repository containing all ML/AI infrastructure code. It has been **archived** after being fully decomposed into focused, independent repositories:
+
+| Repository | Purpose |
+|------------|---------|
+| `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
+| `ai-pipelines` | Kubeflow pipeline definitions |
+| `ai-services` | Core ML service implementations |
+| `chat-handler` | Chat orchestration and routing |
+| `handler-base` | Base handler framework |
+| `pipeline-bridge` | Bridge between pipelines and services |
+| `stt-module` | Speech-to-text service |
+| `tts-module` | Text-to-speech service |
+| `voice-assistant` | Voice assistant integration |
+| `gradio-ui` | Shared Gradio UI components |
+| `kuberay-images` | GPU-specific Ray worker base images |
+| `ntfy-discord` | Notification bridge |
+| `spark-analytics-jobs` | Spark batch analytics |
+| `flink-analytics-jobs` | Flink streaming analytics |
+
+### Remaining Ray Component
+
+The `ray-serve` code, which provides the Ray Serve model inference services, is the one component that still needs a dedicated repository.
+
+| Component | Current Location | Purpose |
+|-----------|------------------|---------|
+| kuberay-images | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
+| ray-serve | `llm-workflows/ray-serve/` | Ray Serve inference services |
+| llm-workflows | `llm-workflows/` | Pipelines, handlers, STT/TTS, embeddings |
+
+### Problems with Current Structure
+
+1. **Tight Coupling**: ray-serve changes require llm-workflows repo access
+2. **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
+3. **Version Management**: Can't independently version ray-serve deployments
+4. **Team Access**: Contributors to ray-serve need access to entire llm-workflows repo
+5. **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds
+
+## Decision
+
+**Establish two dedicated Ray repositories with distinct purposes:**
+
+| Repository | Type | Contents | Release Cycle |
+|------------|------|----------|---------------|
+| `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
+| `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |
+
+### Key Design: Dynamic Code Loading
+
+Ray Serve applications are deployed as **PyPI packages**, not baked into Docker images. This enables:
+
+- **Dynamic Decoupling**: Update model serving logic without rebuilding containers
+- **Runtime Flexibility**: The Ray cluster pulls the code via `pip install` at runtime
+- **Faster Iteration**: Code changes don't require image rebuilds or pod restarts
+- **Version Pinning**: Kubernetes manifests specify package versions independently
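+
+As a rough illustration of the version pinning described above, the sketch below builds a `runtime_env`-style mapping whose `pip` list pins one exact package version. The helper name `build_runtime_env` and the X.Y.Z version check are assumptions made for this example; the package name `ray-serve` comes from this ADR. How such a mapping is wired into the RayService manifests in `homelab-k8s2` is not shown here.
+
+```python
+import re
+
+# Simple X.Y.Z check; an assumption about how ray-serve releases are numbered.
+_VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")
+
+
+def build_runtime_env(version: str, package: str = "ray-serve") -> dict:
+    """Return a runtime_env-style mapping that pins one exact package version."""
+    if not _VERSION_RE.match(version):
+        raise ValueError(f"expected X.Y.Z, got {version!r}")
+    # Ray's runtime_env accepts a "pip" list of requirement strings.
+    return {"pip": [f"{package}=={version}"]}
+
+
+if __name__ == "__main__":
+    print(build_runtime_env("1.0.0"))
+```
+
+Rejecting anything other than an exact `X.Y.Z` pin keeps floating versions out of the manifests, which is one way to make the deployed code reproducible.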
+
+### Repository Structure
+
+```
+kuberay-images/                    # Docker images - GPU runtime environments
+├── Dockerfile.ray-worker-nvidia
+├── Dockerfile.ray-worker-rdna2
+├── Dockerfile.ray-worker-strixhalo
+├── Dockerfile.ray-worker-intel
+├── Makefile
+└── .gitea/workflows/
+    └── build-push.yaml            # Builds & pushes to container registry
+
+ray-serve/                         # PyPI package - application code
+├── src/
+│   └── ray_serve/
+│       ├── __init__.py
+│       ├── model_configs.py
+│       └── serve_apps.py
+├── pyproject.toml
+├── README.md
+└── .gitea/workflows/
+    └── publish-ray-serve.yaml     # Publishes to PyPI registry
+```
+
+**Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:
+- **Infrastructure** (kuberay-images) - How to run Ray workers
+- **Application** (ray-serve) - What code to run
+- **Orchestration** (homelab-k8s2) - Where and when to deploy
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                        RAY INFRASTRUCTURE                            │
+└─────────────────────────────────────────────────────────────────────┘
+                                  │
+              ┌───────────────────┴───────────────────┐
+              │                                       │
+              ▼                                       ▼
+      ┌───────────────┐                       ┌───────────────┐
+      │ kuberay-images│                       │   ray-serve   │
+      │               │                       │               │
+      │ Base worker   │                       │ PyPI package  │
+      │ Docker images │                       │ Ray Serve     │
+      │               │                       │ application   │
+      │ NVIDIA/AMD/   │                       │               │
+      │ Intel GPUs    │                       │ Model configs │
+      └───────────────┘                       └───────────────┘
+              │                                       │
+              ▼                                       ▼
+      ┌───────────────┐                       ┌───────────────┐
+      │ Container     │                       │ PyPI          │
+      │ Registry      │                       │ Registry      │
+      │ registry.lab/ │                       │ registry.lab/ │
+      │ kuberay/*     │                       │ pypi/ray-serve│
+      └───────────────┘                       └───────────────┘
+              │                                       │
+              └───────────────────┬───────────────────┘
+                                  │
+                                  ▼
+                        ┌───────────────────┐
+                        │   Ray Cluster     │
+                        │                   │
+                        │ 1. Pull container │
+                        │ 2. pip install    │
+                        │    ray-serve      │
+                        │ 3. Run serve app  │
+                        └───────────────────┘
+```
+
+## Consequences
+
+### Positive
+
+- **Dynamic Updates**: Deploy new model serving code without rebuilding images
+- **Independent Releases**: Containers and application code versioned separately
+- **Faster Iteration**: Publishing to PyPI takes seconds, versus minutes for a Docker build
+- **Clear Separation**: Infrastructure (images) vs Application (code) vs Orchestration (k8s)
+- **Runtime Flexibility**: Same container can run different ray-serve versions
+
+### Negative
+
+- **Runtime Dependencies**: Pod startup requires a `pip install` step (the package is cached in practice)
+- **Version Coordination**: Must track compatible versions between kuberay-images and ray-serve
+
+### Migration Steps
+
+1. ✅ `kuberay-images` already exists as standalone repo
+2. ✅ `llm-workflows` archived - all components extracted to dedicated repos
+3. [ ] Create `ray-serve` repo on Gitea
+4. [ ] Move `.gitea/workflows/publish-ray-serve.yaml` to new repo
+5. [ ] Set up `pyproject.toml` for PyPI publishing
+6. [ ] Update RayService manifests to `pip install ray-serve==X.Y.Z`
+7. [ ] Verify Ray cluster pulls package correctly at runtime
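+
+One possible way to approach step 7 is a small check run inside the Ray environment that reports which version of the package was actually installed. This is a minimal sketch using only the standard library; the helper names `installed_version` and `matches_pin` are hypothetical, and the distribution name and version simply mirror the ones used in this ADR.
+
+```python
+from importlib import metadata
+from typing import Optional
+
+
+def installed_version(dist_name: str) -> Optional[str]:
+    """Return the installed version of a distribution, or None if absent."""
+    try:
+        return metadata.version(dist_name)
+    except metadata.PackageNotFoundError:
+        return None
+
+
+def matches_pin(dist_name: str, expected: str) -> bool:
+    """True only when the distribution is installed at exactly `expected`."""
+    return installed_version(dist_name) == expected
+
+
+if __name__ == "__main__":
+    # "ray-serve" and "1.0.0" mirror the names used in this ADR.
+    print(installed_version("ray-serve"), matches_pin("ray-serve", "1.0.0"))
+```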
+
+## Version Compatibility Matrix
+
+| kuberay-images | ray-serve | Notes |
+|----------------|-----------|-------|
+| 1.0.0 | 1.0.0 | Initial structure |
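+
+The matrix above could also be kept in machine-readable form so that a deployment check can refuse an untested pairing. The following is only a sketch: the `COMPATIBLE` set and `is_compatible` helper are hypothetical names, and the single entry is the one row recorded in this ADR.
+
+```python
+# Pairs of (kuberay-images version, ray-serve version) recorded as compatible.
+# Only the 1.0.0 / 1.0.0 row comes from the matrix in this ADR.
+COMPATIBLE = {("1.0.0", "1.0.0")}
+
+
+def is_compatible(image_version: str, package_version: str) -> bool:
+    """Return True when the pairing appears in the compatibility matrix."""
+    return (image_version, package_version) in COMPATIBLE
+
+
+if __name__ == "__main__":
+    print(is_compatible("1.0.0", "1.0.0"))
+```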
+
+## References
+
+- [ADR-0020: Internal Registry for CI/CD](./ADR-0020-internal-registry-for-cicd.md)
+- [KubeRay Documentation](https://ray-project.github.io/kuberay/)
+- [Ray Serve Documentation](https://docs.ray.io/en/latest/serve/index.html)