docs: add ADR-0038/0039 and replace llm-workflows references with decomposed repos

- ADR-0038: Infrastructure metrics collection (smartctl, SNMP, blackbox, unpoller)
- ADR-0039: Alerting and notification pipeline (Alertmanager → ntfy → Discord)
- Replace llm-workflows GitHub links with Gitea daviestechlabs org repos
- Update AGENT-ONBOARDING.md: remove llm-workflows from file tree, add missing repos
- Update ADR-0006: fix multi-repo reference
- Update ADR-0009: fix broken llm-workflows link
- Update ADR-0024: mark ray-serve repo as created, update historical context
- Update README: fix ADR-0016 status, add 0038/0039 to table, update badges
2026-02-09 18:10:56 -05:00
parent 6682fe271f
commit 8e3e2043c3
7 changed files with 392 additions and 24 deletions


@@ -35,23 +35,24 @@
 | `spark-analytics-jobs` | Spark batch analytics |
 | `flink-analytics-jobs` | Flink streaming analytics |
-### Remaining Ray Component
+### Ray Component Repositories
-The `ray-serve` code still needs a dedicated repository for Ray Serve model inference services.
+Both Ray repositories now exist as standalone repos in the Gitea `daviestechlabs` organization:
-| Component | Current Location | Purpose |
-|-----------|------------------|---------|
-| kuberay-images | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
-| ray-serve | `llm-workflows/ray-serve/` | Ray Serve inference services |
-| llm-workflows | `llm-workflows/` | Pipelines, handlers, STT/TTS, embeddings |
+| Component | Location | Purpose |
+|-----------|----------|---------|
+| kuberay-images | `kuberay-images/` (standalone repo) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
+| ray-serve | `ray-serve/` (standalone repo) | Ray Serve inference services |
-### Problems with Current Structure
+### Problems with Monolithic Structure (Historical)
-1. **Tight Coupling**: ray-serve changes require llm-workflows repo access
-2. **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
-3. **Version Management**: Can't independently version ray-serve deployments
-4. **Team Access**: Contributors to ray-serve need access to entire llm-workflows repo
-5. **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds
+These were the problems with the original monolithic `llm-workflows` structure (now resolved):
+1. **Tight Coupling**: ray-serve changes required llm-workflows repo access
+2. **CI/CD Complexity**: Building ray-serve images triggered unrelated workflow steps
+3. **Version Management**: Couldn't independently version ray-serve deployments
+4. **Team Access**: Contributors to ray-serve needed access to entire llm-workflows repo
+5. **Build Times**: Changes to unrelated code could trigger ray-serve rebuilds
 ## Decision
@@ -160,9 +161,9 @@ ray-serve/ # PyPI package - application code
 1. ✅ `kuberay-images` already exists as standalone repo
 2. ✅ `llm-workflows` archived - all components extracted to dedicated repos
-3. [ ] Create `ray-serve` repo on Gitea
-4. [ ] Move `.gitea/workflows/publish-ray-serve.yaml` to new repo
-5. [ ] Set up pyproject.toml for PyPI publishing
+3. ✅ `ray-serve` repo created on Gitea (`git.daviestechlabs.io/daviestechlabs/ray-serve`)
+4. ✅ CI workflows moved to new repo
+5. ✅ pyproject.toml configured for PyPI publishing
 6. [ ] Update RayService manifests to `pip install ray-serve==X.Y.Z`
 7. [ ] Verify Ray cluster pulls package correctly at runtime
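Remaining step 6 could look like the following sketch of a KubeRay `RayService` manifest fragment, installing the published package via `runtime_env`. This is a minimal illustration, not the actual manifest: the application name, `import_path`, and namespace are hypothetical, and `X.Y.Z` stands for the pinned release as in the checklist above.

```yaml
# Hypothetical RayService fragment (illustrative names; X.Y.Z = pinned version).
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: ray-serve-inference
spec:
  serveConfigV2: |
    applications:
      - name: inference          # hypothetical application name
        import_path: app.main:deployment   # hypothetical module path
        runtime_env:
          pip:
            - ray-serve==X.Y.Z   # pull the published package at runtime (step 7 verifies this)
```

With the package pinned in `runtime_env`, Ray workers install it when the Serve application starts, which is what step 7's runtime verification would confirm.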