docs: add ADR-0038/0039 and replace llm-workflows references with decomposed repos

- ADR-0038: Infrastructure metrics collection (smartctl, SNMP, blackbox, unpoller)
- ADR-0039: Alerting and notification pipeline (Alertmanager → ntfy → Discord)
- Replace llm-workflows GitHub links with Gitea daviestechlabs org repos
- Update AGENT-ONBOARDING.md: remove llm-workflows from file tree, add missing repos
- Update ADR-0006: fix multi-repo reference
- Update ADR-0009: fix broken llm-workflows link
- Update ADR-0024: mark ray-serve repo as created, update historical context
- Update README: fix ADR-0016 status, add 0038/0039 to table, update badges
2026-02-09 18:10:56 -05:00
parent 6682fe271f
commit 8e3e2043c3
7 changed files with 392 additions and 24 deletions


@@ -35,23 +35,24 @@
 | `spark-analytics-jobs` | Spark batch analytics |
 | `flink-analytics-jobs` | Flink streaming analytics |
-### Remaining Ray Component
+### Ray Component Repositories
-The `ray-serve` code still needs a dedicated repository for Ray Serve model inference services.
+Both Ray repositories now exist as standalone repos in the Gitea `daviestechlabs` organization:
-| Component | Current Location | Purpose |
-|-----------|------------------|---------|
-| kuberay-images | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
-| ray-serve | `llm-workflows/ray-serve/` | Ray Serve inference services |
-| llm-workflows | `llm-workflows/` | Pipelines, handlers, STT/TTS, embeddings |
+| Component | Location | Purpose |
+|-----------|----------|---------|
+| kuberay-images | `kuberay-images/` (standalone repo) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
+| ray-serve | `ray-serve/` (standalone repo) | Ray Serve inference services |
-### Problems with Current Structure
+### Problems with Monolithic Structure (Historical)
-1. **Tight Coupling**: ray-serve changes require llm-workflows repo access
-2. **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
-3. **Version Management**: Can't independently version ray-serve deployments
-4. **Team Access**: Contributors to ray-serve need access to entire llm-workflows repo
-5. **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds
+These were the problems with the original monolithic `llm-workflows` structure (now resolved):
+1. **Tight Coupling**: ray-serve changes required llm-workflows repo access
+2. **CI/CD Complexity**: Building ray-serve images triggered unrelated workflow steps
+3. **Version Management**: Couldn't independently version ray-serve deployments
+4. **Team Access**: Contributors to ray-serve needed access to entire llm-workflows repo
+5. **Build Times**: Changes to unrelated code could trigger ray-serve rebuilds
 ## Decision
@@ -160,9 +161,9 @@ ray-serve/ # PyPI package - application code
 1. ✅ `kuberay-images` already exists as standalone repo
 2. ✅ `llm-workflows` archived - all components extracted to dedicated repos
-3. [ ] Create `ray-serve` repo on Gitea
-4. [ ] Move `.gitea/workflows/publish-ray-serve.yaml` to new repo
-5. [ ] Set up pyproject.toml for PyPI publishing
+3. ✅ `ray-serve` repo created on Gitea (`git.daviestechlabs.io/daviestechlabs/ray-serve`)
+4. ✅ CI workflows moved to new repo
+5. ✅ pyproject.toml configured for PyPI publishing
 6. [ ] Update RayService manifests to `pip install ray-serve==X.Y.Z`
 7. [ ] Verify Ray cluster pulls package correctly at runtime
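Remaining step 6 could look like the following sketch of a KubeRay `RayService` manifest fragment, installing the published package via `runtime_env`. This is a minimal illustration, not the actual manifest: the application name, `import_path`, and namespace are hypothetical, and `X.Y.Z` stands for the pinned release as in the checklist above.

```yaml
# Hypothetical RayService fragment (illustrative names; X.Y.Z = pinned version).
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: ray-serve-inference
spec:
  serveConfigV2: |
    applications:
      - name: inference          # hypothetical application name
        import_path: app.main:deployment   # hypothetical module path
        runtime_env:
          pip:
            - ray-serve==X.Y.Z   # pull the published package at runtime (step 7 verifies this)
```

With the package pinned in `runtime_env`, Ray workers install it when the Serve application starts, which is what step 7's runtime verification would confirm.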