cb7dad96c1
fix: PATH variable expansion in ROCm worker Dockerfiles
Build and Push Images / build-rdna2 (push) Has been cancelled
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Split ENV ROCM_HOME and ENV PATH into separate instructions to fix a variable
expansion issue. When ROCM_HOME and PATH were set in the same ENV line,
${ROCM_HOME} expanded to an empty string because it was not defined yet.
This caused 'ray: command not found' in init containers.
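
The expansion behaviour described in this commit can be illustrated with a small Python model. It is only a sketch of Docker's documented rule that every variable in a single ENV instruction is substituted using the environment as it was before that instruction. The helper `apply_env` and the `/opt/rocm` and `/usr/bin` paths are hypothetical and not taken from the actual Dockerfiles.

```python
import re

_VAR = re.compile(r"\$\{(\w+)\}")


def apply_env(env: dict[str, str], assignments: dict[str, str]) -> dict[str, str]:
    """Model one ENV instruction: expand values against the pre-instruction environment."""
    snapshot = dict(env)
    expanded = {
        key: _VAR.sub(lambda m: snapshot.get(m.group(1), ""), value)
        for key, value in assignments.items()
    }
    return {**env, **expanded}


base = {"PATH": "/usr/bin"}  # hypothetical starting PATH

# One ENV line: ROCM_HOME is not defined yet when PATH is expanded.
combined = apply_env(base, {"ROCM_HOME": "/opt/rocm", "PATH": "${ROCM_HOME}/bin:${PATH}"})

# Two ENV lines: ROCM_HOME already exists when PATH is expanded.
split = apply_env(apply_env(base, {"ROCM_HOME": "/opt/rocm"}), {"PATH": "${ROCM_HOME}/bin:${PATH}"})

print(combined["PATH"])  # /bin:/usr/bin
print(split["PATH"])     # /opt/rocm/bin:/usr/bin
```

In the combined case the ROCm directory never reaches PATH, which matches the symptom reported above.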
2026-02-03 21:07:00 -05:00
a8943c79ad
refactor: remove ray-serve (moved to dedicated repo)
Implements ADR-0024: Ray Repository Structure
ray-serve is now a standalone PyPI package repo:
- https://git.daviestechlabs.io/billy/ray-serve
kuberay-images now contains only Docker images for Ray workers
2026-02-03 07:45:48 -05:00
796997cf06
adding intel image build fixes.
Build and Push Images / build-nvidia (push) Failing after 6m29s
Build and Push Images / build-strixhalo (push) Failing after 5m27s
Build and Push Images / build-intel (push) Failing after 4m6s
Build and Push Images / build-rdna2 (push) Failing after 2h19m57s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 2s
2026-02-02 21:16:48 -05:00
81388aed2c
ci: retry build with Docker Hub auth
2026-02-02 17:44:43 -05:00
8af9d04210
fix(ci): configure Docker buildx for insecure HTTP registry
Build and Push Images / build-nvidia (push) Failing after 6m6s
Build and Push Images / build-rdna2 (push) Failing after 6m31s
Build and Push Images / build-strixhalo (push) Failing after 5m35s
Build and Push Images / build-intel (push) Failing after 5m33s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
2026-02-02 17:21:39 -05:00
456f08ec81
fix: use internal K8s service URL for container registry
Build and Push Images / build-rdna2 (push) Failing after 8m19s
Build and Push Images / build-nvidia (push) Failing after 9m26s
Build and Push Images / build-strixhalo (push) Failing after 6m50s
Build and Push Images / build-intel (push) Failing after 7m14s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
- Switch from external git.daviestechlabs.io to internal gitea-http.gitea.svc
- Avoids Cloudflare/Authentik routing since runner is in-cluster
- Add REGISTRY_HOST env var for login steps
2026-02-02 13:28:51 -05:00
3c788fe2b6
fix(strixhalo): upgrade pandas for numpy 2.x compatibility
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
Ray base image has pandas 1.5.3 compiled against numpy 1.x, but TheRock
PyTorch ROCm wheels require numpy 2.x. This causes:
ValueError: numpy.dtype size changed, may indicate binary incompatibility
Fix by installing pandas 2.x which is compatible with numpy 2.x.
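
A coarse pre-flight check for the combination described in this commit might look like the following. It is a heuristic sketch only: it compares major versions and does not inspect the compiled extension ABI. The function names are hypothetical, and the version strings other than 1.5.3 are illustrative.

```python
def major(version: str) -> int:
    """Return the leading integer of a version string such as '1.5.3'."""
    return int(version.split(".")[0])


def abi_mismatch_likely(pandas_version: str, numpy_version: str) -> bool:
    """Flag the case from the commit message: pandas 1.x installed next to numpy 2.x."""
    return major(pandas_version) < 2 and major(numpy_version) >= 2


print(abi_mismatch_likely("1.5.3", "2.0.0"))  # True
print(abi_mismatch_likely("2.2.2", "2.0.0"))  # False
```

In an image build, the two version strings could come from `importlib.metadata.version("pandas")` and `importlib.metadata.version("numpy")`.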
2026-02-02 13:25:28 -05:00
4e813cea64
fix: use twine for PyPI upload with internal URL
Build and Publish ray-serve-apps / lint (push) Successful in 1m32s
Build and Publish ray-serve-apps / publish (push) Successful in 2m4s
Replaces curl-based upload with twine which handles the
PyPI upload protocol correctly. Uses TWINE_REPOSITORY_URL
env var to point to internal Gitea service.
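
The twine-based upload could be driven roughly as follows. This is a sketch under assumptions, not the workflow's actual step: the repository URL, user name, token, and wheel file name are placeholders, and the helper only builds the command and environment without running twine. `TWINE_REPOSITORY_URL`, `TWINE_USERNAME`, `TWINE_PASSWORD`, and `TWINE_NON_INTERACTIVE` are environment variables that twine reads.

```python
import os


def twine_upload_plan(repository_url, username, token, dist_files):
    """Build the twine command and environment without executing anything."""
    env = dict(os.environ)
    env.update(
        {
            "TWINE_REPOSITORY_URL": repository_url,
            "TWINE_USERNAME": username,
            "TWINE_PASSWORD": token,
            "TWINE_NON_INTERACTIVE": "1",  # fail instead of prompting in CI
        }
    )
    return ["twine", "upload", *dist_files], env


cmd, env = twine_upload_plan(
    "http://gitea.internal.example/api/packages/OWNER/pypi",  # placeholder URL
    "ci-user",                                                # placeholder user
    "not-a-real-token",                                       # placeholder token
    ["dist/example-0.0.0-py3-none-any.whl"],                  # placeholder file
)
print(cmd)
```

A CI step would then pass the pair to `subprocess.run(cmd, env=env, check=True)`.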
2026-02-02 12:40:33 -05:00
18302cf640
chore: trigger ray-serve publish
Build and Publish ray-serve-apps / lint (push) Successful in 1m31s
Build and Publish ray-serve-apps / publish (push) Failing after 1m28s
2026-02-02 12:35:03 -05:00
45a89ffb2c
chore: trigger workflow to test secrets
2026-02-02 12:33:44 -05:00
7b4871f554
debug: check if secrets are being passed
Build and Publish ray-serve-apps / lint (push) Successful in 1m33s
Build and Publish ray-serve-apps / publish (push) Failing after 1m32s
2026-02-02 12:20:39 -05:00
e497fe110d
ci: use internal cluster service URL for PyPI upload
Build and Publish ray-serve-apps / lint (push) Successful in 1m32s
Build and Publish ray-serve-apps / publish (push) Failing after 2m15s
2026-02-02 12:14:01 -05:00
a4ee672c19
feat: correct ntfy topic.
Build and Publish ray-serve-apps / lint (push) Successful in 3m9s
Build and Publish ray-serve-apps / publish (push) Successful in 1m43s
2026-02-02 12:01:37 -05:00
280c08722f
ci: use curl for PyPI upload with SSL skip
Build and Publish ray-serve-apps / lint (push) Successful in 1m38s
Build and Publish ray-serve-apps / publish (push) Successful in 1m39s
[ray-serve only]
Twine lacks an SSL-skip option, so use curl -k for the self-signed internal cert
2026-02-02 11:31:22 -05:00
072cb233c7
ci: disable SSL verification for internal registry
Build and Publish ray-serve-apps / lint (push) Successful in 2m0s
Build and Publish ray-serve-apps / publish (push) Failing after 2m6s
[ray-serve only]
Self-signed cert on internal network requires --disable-certificate-verification
2026-02-02 11:25:17 -05:00
1943a77992
ci: use internal registry URL for PyPI uploads (ADR-0020)
Build and Publish ray-serve-apps / lint (push) Successful in 1m38s
Build and Publish ray-serve-apps / publish (push) Failing after 1m35s
[ray-serve only]
Bypass Cloudflare 100MB limit by using registry.lab.daviestechlabs.io
2026-02-02 11:19:33 -05:00
12987c6adc
fix: apply ruff fixes to ray_serve package
Build and Publish ray-serve-apps / lint (push) Successful in 1m30s
Build and Publish ray-serve-apps / publish (push) Failing after 2m44s
[ray-serve only]
- Fix whitespace in docstrings
- Add strict=True to zip() calls
- Use ternary operators where appropriate
- Rename unused loop variables
2026-02-02 11:09:35 -05:00
16f6199534
ci: add [skip images] support and trigger ray-serve publish
Build and Push Images / build-nvidia (push) Has been skipped
Build and Push Images / build-intel (push) Has been skipped
Build and Push Images / build-rdna2 (push) Has been skipped
Build and Push Images / build-strixhalo (push) Has been skipped
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
Build and Publish ray-serve-apps / lint (push) Failing after 3m38s
Build and Publish ray-serve-apps / publish (push) Has been skipped
[ray-serve only]
- Add skip conditions to all image build jobs
- Commit message [skip images] or [ray-serve only] skips image builds
- Touch ray_serve/__init__.py to trigger publish workflow
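
The skip condition listed above amounts to a substring check on the commit message. The sketch below expresses that logic in Python for illustration. The real condition lives in the workflow file, and the function name is hypothetical.

```python
SKIP_MARKERS = ("[skip images]", "[ray-serve only]")


def should_skip_image_builds(commit_message: str) -> bool:
    """Return True when the commit message carries one of the skip markers."""
    return any(marker in commit_message for marker in SKIP_MARKERS)


print(should_skip_image_builds("ci: use curl for PyPI upload\n\n[ray-serve only]"))  # True
print(should_skip_image_builds("build: optimize Dockerfiles for production"))        # False
```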
2026-02-02 11:02:12 -05:00
bf93c5d7f4
ci: add path filters to avoid building images on ray-serve changes
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Only trigger image builds when dockerfiles/ changes.
ray-serve package changes now only trigger publish-ray-serve.yaml.
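
The effect of the path filters can be modelled as a mapping from changed paths to triggered workflows. This is an illustrative sketch: the `dockerfiles/` prefix and the `publish-ray-serve.yaml` name come from this commit, while the `ray_serve/` prefix and the function name are assumptions about how the filters are written.

```python
def workflows_for(changed_paths):
    """Return the workflows a push would trigger under the assumed path filters."""
    triggered = set()
    for path in changed_paths:
        if path.startswith("dockerfiles/"):
            triggered.add("Build and Push Images")
        if path.startswith("ray_serve/"):  # assumed filter for the package
            triggered.add("publish-ray-serve.yaml")
    return sorted(triggered)


print(workflows_for(["ray_serve/__init__.py"]))  # ['publish-ray-serve.yaml']
print(workflows_for(["dockerfiles/Dockerfile.example", "README.md"]))  # ['Build and Push Images']
```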
2026-02-02 10:59:17 -05:00
9e250e149e
chore: re-trigger CI after adding secrets
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has started running
Build and Push Images / build-rdna2 (push) Has been cancelled
2026-02-02 10:54:17 -05:00
7efdcb059e
feat: add pyproject.toml and CI for ray-serve-apps package
Build and Push Images / build-nvidia (push) Failing after 7m25s
Build and Push Images / build-rdna2 (push) Failing after 7m29s
Build and Push Images / build-strixhalo (push) Failing after 6m45s
Build and Push Images / build-intel (push) Failing after 6m22s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
Build and Publish ray-serve-apps / lint (push) Failing after 3m9s
Build and Publish ray-serve-apps / publish (push) Has been skipped
- Restructure ray-serve as proper Python package (ray_serve/)
- Add pyproject.toml with hatch build system
- Add CI workflow to publish to Gitea PyPI
- Add py.typed for PEP 561 compliance
- Aligns with ADR-0019 handler deployment strategy
2026-02-02 09:22:03 -05:00
876188a150
feat: add ntfy notifications and semantic versioning (ADR-0015)
Build and Push Images / build-nvidia (push) Failing after 26s
Build and Push Images / build-strixhalo (push) Failing after 34s
Build and Push Images / build-rdna2 (push) Failing after 47s
Build and Push Images / build-intel (push) Failing after 23s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
2026-02-02 08:00:33 -05:00
c0ca6bfc6a
ci: re-trigger pipeline
Build and Push Images / build-nvidia (push) Failing after 1s
Build and Push Images / build-rdna2 (push) Failing after 1s
Build and Push Images / build-strixhalo (push) Failing after 1s
Build and Push Images / build-intel (push) Failing after 1s
2026-02-02 07:51:10 -05:00
e1529ad923
ci: fix registry login - skip on PRs, add Docker Hub auth
Build and Push Images / build-nvidia (push) Failing after 31s
Build and Push Images / build-rdna2 (push) Failing after 33s
Build and Push Images / build-strixhalo (push) Failing after 20s
Build and Push Images / build-intel (push) Failing after 25s
- Only login to Gitea registry on push (not PRs)
- Add optional Docker Hub login to avoid pull rate limits
- Requires REGISTRY_USER, REGISTRY_TOKEN secrets in Gitea
- Optional: DOCKERHUB_USERNAME (var) + DOCKERHUB_TOKEN (secret)
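
The login rules listed above can be summarised as a small decision function. This is a sketch of the described behaviour rather than the workflow itself. The function name and the returned step labels are hypothetical, and `push` and `pull_request` are the usual Actions event names.

```python
def login_steps(event_name, dockerhub_username="", dockerhub_token=""):
    """Decide which registry logins a job should perform."""
    steps = []
    if event_name == "push":
        # Gitea registry login needs REGISTRY_USER / REGISTRY_TOKEN; skipped on PRs.
        steps.append("gitea-registry")
    if dockerhub_username and dockerhub_token:
        # Docker Hub login is optional and only helps avoid pull rate limits.
        steps.append("docker-hub")
    return steps


print(login_steps("push", "someuser", "sometoken"))  # ['gitea-registry', 'docker-hub']
print(login_steps("pull_request"))                   # []
```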
2026-02-02 07:35:20 -05:00
cb80709d3d
build: optimize Dockerfiles for production
Build and Push Images / build-rdna2 (push) Failing after 4m3s
Build and Push Images / build-nvidia (push) Failing after 4m6s
Build and Push Images / build-strixhalo (push) Failing after 18s
Build and Push Images / build-intel (push) Failing after 21s
- Use BuildKit syntax 1.7 with cache mounts for apt/uv
- Switch from pip to uv for 10-100x faster installs (ADR-0014)
- Add OCI Image Spec labels for container metadata
- Add HEALTHCHECK directives for orchestration
- Add .dockerignore to reduce context size
- Update Makefile with buildx and lint target
- Add retry logic to ray-entrypoint.sh
Refs: ADR-0012 (uv), ADR-0014 (Docker best practices)
2026-02-02 07:26:27 -05:00
a16ffff73f
feat: Add GPU-specific Ray worker images with CI/CD
Build and Push Images / build-nvidia (push) Failing after 1s
Build and Push Images / build-rdna2 (push) Failing after 1s
Build and Push Images / build-strixhalo (push) Failing after 1s
Build and Push Images / build-intel (push) Failing after 1s
- Add Dockerfiles for nvidia, rdna2, strixhalo, and intel GPU targets
- Add ray-serve modules (embeddings, whisper, tts, llm, reranker)
- Add Gitea Actions workflow for automated builds
- Add Makefile for local development
- Update README with comprehensive documentation
2026-02-01 15:04:31 -05:00
e68d5c1f0e
Initial commit
2026-02-01 19:59:37 +00:00