Ray Repository Structure

  • Status: accepted
  • Date: 2026-02-03
  • Deciders: Billy
  • Technical Story: Document repository layout for Ray Serve and KubeRay image components

Context

Factor         Details
Problem        Need to document the Ray-specific repository structure
Impact         Clarity on where Ray components live post-migration
Current State  kuberay-images standalone, ray-serve needs extraction
Goal           Clean separation with independent release cycles

Historical Context

llm-workflows was the original monolithic repository containing all ML/AI infrastructure code. It has been archived after being fully decomposed into focused, independent repositories:

Repository            Purpose
ai-apps               Gradio applications (STT, TTS, embeddings UIs)
ai-pipelines          Kubeflow pipeline definitions
ai-services           Core ML service implementations
chat-handler          Chat orchestration and routing
handler-base          Base handler framework
pipeline-bridge       Bridge between pipelines and services
stt-module            Speech-to-text service
tts-module            Text-to-speech service
voice-assistant       Voice assistant integration
gradio-ui             Shared Gradio UI components
kuberay-images        GPU-specific Ray worker base images
ntfy-discord          Notification bridge
spark-analytics-jobs  Spark batch analytics
flink-analytics-jobs  Flink streaming analytics

Ray Component Repositories

Both Ray repositories now exist as standalone repos in the Gitea daviestechlabs organization:

Component       Location                           Purpose
kuberay-images  kuberay-images/ (standalone repo)  Docker images for Ray workers (NVIDIA, AMD, Intel)
ray-serve       ray-serve/ (standalone repo)       Ray Serve inference services

Problems with Monolithic Structure (Historical)

These were the problems with the original monolithic llm-workflows structure (now resolved):

  1. Tight Coupling: ray-serve changes required llm-workflows repo access
  2. CI/CD Complexity: Building ray-serve images triggered unrelated workflow steps
  3. Version Management: Couldn't independently version ray-serve deployments
  4. Team Access: Contributors to ray-serve needed access to entire llm-workflows repo
  5. Build Times: Changes to unrelated code could trigger ray-serve rebuilds

Decision

Establish two dedicated Ray repositories with distinct purposes:

Repository      Type           Contents                                Release Cycle
kuberay-images  Docker images  Ray worker base images (GPU-specific)   On dependency updates
ray-serve       PyPI package   Ray Serve application code              Per model/feature update

Key Design: Dynamic Code Loading

Ray Serve applications are deployed as PyPI packages, not baked into Docker images. This enables:

  • Dynamic Decoupling: Update model serving logic without rebuilding containers
  • Runtime Flexibility: Ray cluster pulls code via pip install at runtime
  • Faster Iteration: Code changes don't require image rebuilds or pod restarts
  • Version Pinning: Kubernetes manifests specify package versions independently
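
For illustration, a minimal sketch of what a serve_apps.py entry point might look like under this design, assuming Ray Serve 2.x; the deployment name, model handle, and resource options are hypothetical:

# serve_apps.py (hypothetical sketch) - packaged and published to the internal
# PyPI registry, then pulled by the Ray cluster at runtime instead of being
# baked into the worker image.
from ray import serve


@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class EmbeddingService:
    def __init__(self) -> None:
        # Model loading would happen here; the model name is an assumption.
        self.model_name = "example-embedding-model"

    async def __call__(self, request) -> dict:
        payload = await request.json()
        # Real inference would run here; echoing the input keeps the sketch runnable.
        return {"model": self.model_name, "input": payload}


# Bound application object that deployment manifests reference by import path
# (e.g. ray_serve.serve_apps:app - the exact path is an assumption).
app = EmbeddingService.bind()

Because the package, not the image, carries this code, publishing a new version and bumping the pin in the deployment manifest is enough to roll out a change.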

Repository Structure

kuberay-images/                    # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
    └── build-push.yaml            # Builds & pushes to container registry

ray-serve/                         # PyPI package - application code
├── src/
│   └── ray_serve/
│       ├── __init__.py
│       ├── model_configs.py
│       └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
    └── publish-ray-serve.yaml     # Publishes to PyPI registry
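
As an illustration of the application side of this layout, a hypothetical sketch of what model_configs.py might hold; every model name, weights reference, and resource value below is an assumption:

# model_configs.py (hypothetical sketch) - a declarative registry of models that
# the Serve applications can load; values are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    name: str        # identifier used by serve_apps
    weights: str     # model repository or path to weights
    num_gpus: float  # GPUs requested per replica


MODEL_CONFIGS = {
    "stt-default": ModelConfig("stt-default", "openai/whisper-small", 0.5),
    "embeddings": ModelConfig("embeddings", "BAAI/bge-small-en-v1.5", 0.25),
}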

Note: Kubernetes deployment manifests live in homelab-k8s2, not in either Ray repo. This maintains separation between:

  • Infrastructure (kuberay-images) - How to run Ray workers
  • Application (ray-serve) - What code to run
  • Orchestration (homelab-k8s2) - Where and when to deploy

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        RAY INFRASTRUCTURE                            │
└─────────────────────────────────────────────────────────────────────┘
                                  │
              ┌───────────────────┴───────────────────┐
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ kuberay-images│                       │   ray-serve   │
      │               │                       │               │
      │ Base worker   │                       │ PyPI package  │
      │ Docker images │                       │ Ray Serve     │
      │               │                       │ application   │
      │ NVIDIA/AMD/   │                       │               │
      │ Intel GPUs    │                       │ Model configs │
      └───────────────┘                       └───────────────┘
              │                                       │
              ▼                                       ▼
      ┌───────────────┐                       ┌───────────────┐
      │ Container     │                       │ PyPI          │
      │ Registry      │                       │ Registry      │
      │ registry.lab/ │                       │ registry.lab/ │
      │ kuberay/*     │                       │ pypi/ray-serve│
      └───────────────┘                       └───────────────┘
              │                                       │
              └───────────────────┬───────────────────┘
                                  │
                                  ▼
                        ┌───────────────────┐
                        │   Ray Cluster     │
                        │                   │
                        │ 1. Pull container │
                        │ 2. pip install    │
                        │    ray-serve      │
                        │ 3. Run serve app  │
                        └───────────────────┘
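
The pull-then-install flow in the diagram can be sketched from the Python side; a hedged example where the head address, package version, and registry setup are assumptions rather than actual cluster values:

import ray

# Step 2 of the diagram: the runtime environment the cluster resolves before
# starting the Serve application. The version pin mirrors what the RayService
# manifest in homelab-k8s2 would specify; the internal registry
# (registry.lab/pypi) is assumed to be the workers' default pip index.
runtime_env = {"pip": ["ray-serve==1.0.0"]}

# Step 3 is normally driven by the RayService controller via an import path such
# as ray_serve.serve_apps:app; for ad-hoc testing, the same runtime_env can be
# supplied through the Ray client (the head address is an assumption).
ray.init(address="ray://raycluster-head-svc:10001", runtime_env=runtime_env)

Because the package is resolved at startup, swapping model-serving code is a matter of changing the pinned version rather than rebuilding the worker image.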

Consequences

Positive

  • Dynamic Updates: Deploy new model serving code without rebuilding images
  • Independent Releases: Containers and application code versioned separately
  • Faster Iteration: A PyPI publish takes seconds, versus minutes for a Docker image build
  • Clear Separation: Infrastructure (images) vs Application (code) vs Orchestration (k8s)
  • Runtime Flexibility: Same container can run different ray-serve versions

Negative

  • Runtime Dependencies: Pod startup requires pip install (cached in practice)
  • Version Coordination: Must track compatible versions between kuberay-images and ray-serve

Migration Steps

  1. kuberay-images already exists as a standalone repo
  2. llm-workflows archived; all components extracted to dedicated repos
  3. ray-serve repo created on Gitea (git.daviestechlabs.io/daviestechlabs/ray-serve)
  4. CI workflows moved to new repo
  5. pyproject.toml configured for PyPI publishing
  6. Update RayService manifests to pip install ray-serve==X.Y.Z
  7. Verify the Ray cluster pulls the package correctly at runtime (a smoke-test sketch follows this list)
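
A hypothetical smoke test for step 7; the Serve endpoint address, request shape, and response contents are assumptions about the deployed application:

# smoke_test.py (hypothetical) - confirm the cluster installed the pinned
# ray-serve package and the Serve application is answering HTTP requests.
import requests

SERVE_URL = "http://raycluster-head-svc:8000/"  # assumed Serve proxy address and route

resp = requests.post(SERVE_URL, json={"text": "hello"})
resp.raise_for_status()
print("Serve application responded:", resp.json())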

Version Compatibility Matrix

kuberay-images  ray-serve  Notes
1.0.0           1.0.0      Initial structure

References