Commit Graph

34 Commits

SHA1 Message Date
5cb79a0fe7 fix: Use docker/login-action for buildx registry authentication
Some checks failed
Build and Push Images / determine-version (push) Successful in 57s
Build and Push Images / build-nvidia (push) Failing after 6m47s
Build and Push Images / build-rdna2 (push) Failing after 7m10s
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / build-strixhalo (push) Has been cancelled
docker login doesn't properly propagate credentials to buildx builders.
docker/login-action handles this correctly and writes a proper ~/.docker/config.json.
2026-02-04 08:00:12 -05:00
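For reference, the credential entry a registry login leaves in `~/.docker/config.json` looks like the fragment built below. This is a minimal sketch that assumes no credential helper (`credsStore`/`credHelpers`) is configured, so the credential is stored inline as base64 `user:password` under `auths`. `docker_config_auth` and the example host are hypothetical and are not part of docker/login-action.

```python
import base64
import json


def docker_config_auth(registry: str, username: str, password: str) -> dict:
    """Build a config.json fragment with an inline base64 "user:password" auth.

    Assumes no credential helper is configured, so the Docker CLI layout
    stores the credential directly under "auths".
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"auths": {registry: {"auth": token}}}


if __name__ == "__main__":
    cfg = docker_config_auth("registry.example.internal", "ci-bot", "s3cret")
    print(json.dumps(cfg, indent=2))
```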
338b668388 feat: Add semantic versioning based on commit message prefixes
Some checks failed
Build and Push Images / determine-version (push) Successful in 55s
Build and Push Images / build-nvidia (push) Failing after 1h52m48s
Build and Push Images / build-rdna2 (push) Failing after 3h14m40s
Build and Push Images / build-strixhalo (push) Failing after 1h52m42s
Build and Push Images / build-intel (push) Failing after 3h14m39s
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
- Added determine-version job that runs BEFORE builds
- Version bump based on commit message:
  - major: or BREAKING CHANGE → major bump
  - minor:, feat:, or feature: → minor bump
  - everything else → patch bump
- All build jobs now depend on determine-version
- Images tagged with calculated version (e.g. v1.2.3) + latest
- Release job creates git tag after successful builds
- Notify job includes version info in notifications
- PRs get tagged with pr-<number>
- Manual tag pushes use tag directly (no version recalculation)
2026-02-03 22:30:48 -05:00
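The bump rules and tagging scheme above can be sketched as pure functions. This is an illustrative sketch, not the workflow's actual script: `bump_kind`, `next_version`, and `image_tags` are hypothetical names, and it assumes the previous version is a `vMAJOR.MINOR.PATCH` tag and that only the subject line is checked for prefixes. Manual tag pushes, which skip recalculation, are not modeled.

```python
from typing import List, Optional


def bump_kind(message: str) -> str:
    """Classify a commit message as a major, minor, or patch bump."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if subject.startswith("major:") or "BREAKING CHANGE" in message:
        return "major"
    if subject.startswith(("minor:", "feat:", "feature:")):
        return "minor"
    return "patch"  # everything else


def next_version(current: str, message: str) -> str:
    """Compute the next vX.Y.Z tag from the current one (leading 'v' optional)."""
    major, minor, patch = (int(p) for p in current.lstrip("v").split("."))
    kind = bump_kind(message)
    if kind == "major":
        return f"v{major + 1}.0.0"
    if kind == "minor":
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"


def image_tags(version: str, pr_number: Optional[int] = None) -> List[str]:
    """PR builds get pr-<number>; pushes get the calculated version plus latest."""
    if pr_number is not None:
        return [f"pr-{pr_number}"]
    return [version, "latest"]
```

Keeping the calculation in a job that runs before the builds means every build job can consume the same version string instead of each deriving its own.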
0bb3d25df7 trigger: rebuild after clearing runner cache 2026-02-03 22:25:35 -05:00
40c544ba0a fix: remove COPY ray-serve/ - now installed from PyPI
Some checks failed
Build and Push Images / build-nvidia (push) Failing after 13s
Build and Push Images / build-strixhalo (push) Failing after 1m56s
Build and Push Images / build-rdna2 (push) Failing after 2m8s
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
The ray-serve-apps package is now installed from the Gitea PyPI registry
at runtime by the RayService configuration rather than bundled in the image.
2026-02-03 22:23:05 -05:00
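The commit above does not show the RayService side. As a hedged sketch, a runtime install from a Gitea PyPI registry comes down to a pip call against Gitea's `/api/packages/{owner}/pypi/simple` index. The host, the owner `billy` (borrowed from the repository URL elsewhere in this log), and the helper `pip_install_argv` are assumptions.

```python
import shlex
from typing import List


def pip_install_argv(gitea_base: str, owner: str, package: str) -> List[str]:
    """Build a pip command that pulls `package` from a Gitea PyPI registry.

    --extra-index-url keeps PyPI available for the package's dependencies.
    """
    index = f"{gitea_base.rstrip('/')}/api/packages/{owner}/pypi/simple"
    return ["pip", "install", "--extra-index-url", index, package]


if __name__ == "__main__":
    argv = pip_install_argv("https://git.example.com", "billy", "ray-serve-apps")
    print(shlex.join(argv))
```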
96921fe799 fix: workflow conditions for push events
Some checks failed
Build and Push Images / build-nvidia (push) Failing after 15s
Build and Push Images / build-rdna2 (push) Failing after 17s
Build and Push Images / build-strixhalo (push) Failing after 15s
Build and Push Images / build-intel (push) Failing after 16s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
The if conditions were checking github.event.inputs.image == '', which
fails for push events, where inputs is undefined. Changed the logic to run
all builds unless this is a workflow_dispatch with a specific image
selected.
2026-02-03 21:39:17 -05:00
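The corrected condition can be expressed as a small predicate. In the real workflow this lives in a YAML `if:` expression; `should_build` and the shape of `inputs` here are hypothetical, and only the decision logic described in the commit is modeled.

```python
from typing import Mapping, Optional


def should_build(event_name: str, inputs: Optional[Mapping[str, str]], job_image: str) -> bool:
    """Decide whether a per-image build job runs.

    Push events carry no inputs, so the decision must not depend on
    inputs.image existing. Every build runs unless a workflow_dispatch
    selected a specific image other than this job's.
    """
    if event_name != "workflow_dispatch":
        return True
    selected = (inputs or {}).get("image", "")
    return selected in ("", job_image)
```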
7e7822f995 trigger: rebuild rdna2 image
Some checks failed
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
2026-02-03 21:34:53 -05:00
aac9508c28 trigger: rebuild worker images after fix 2026-02-03 21:32:13 -05:00
cb7dad96c1 fix: PATH variable expansion in ROCm worker Dockerfiles
Some checks failed
Build and Push Images / build-rdna2 (push) Has been cancelled
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Split ENV ROCM_HOME and ENV PATH into separate instructions to fix a variable
expansion issue. When ROCM_HOME and PATH were set in the same ENV line,
${ROCM_HOME} expanded to an empty string, because substitution within a single
ENV instruction only sees values defined before that instruction.

This was causing 'ray: command not found' in init containers.
2026-02-03 21:07:00 -05:00
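The Dockerfile reference documents this rule: within a single `ENV` instruction, variable substitution uses the environment as it stood before that instruction. The toy model below reproduces the failure. `expand` and `apply_env` are hypothetical helpers that handle only `$VAR` and `${VAR}` (no `${VAR:-default}` forms), and `/opt/rocm` is an assumed install path.

```python
import re
from typing import Dict, List, Tuple

_VAR = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand(value: str, env: Dict[str, str]) -> str:
    """Replace $VAR / ${VAR} with values from env; unknown names become ''."""
    return _VAR.sub(lambda m: env.get(m.group(1) or m.group(2), ""), value)


def apply_env(env: Dict[str, str], pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Apply one ENV instruction: every pair is expanded against a snapshot
    of the environment taken *before* the instruction."""
    snapshot = dict(env)
    updated = dict(env)
    for key, value in pairs:
        updated[key] = expand(value, snapshot)
    return updated


base = {"PATH": "/usr/bin"}
# One instruction: ENV ROCM_HOME=/opt/rocm PATH=${ROCM_HOME}/bin:${PATH}
one_line = apply_env(base, [("ROCM_HOME", "/opt/rocm"), ("PATH", "${ROCM_HOME}/bin:${PATH}")])
# Two instructions: ENV ROCM_HOME=... then ENV PATH=...
split = apply_env(apply_env(base, [("ROCM_HOME", "/opt/rocm")]),
                  [("PATH", "${ROCM_HOME}/bin:${PATH}")])
print(one_line["PATH"])  # ROCm bin dir missing, so `ray` is not found
print(split["PATH"])
```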
a8943c79ad refactor: remove ray-serve (moved to dedicated repo)
Implements ADR-0024: Ray Repository Structure

ray-serve is now a standalone PyPI package repo:
- https://git.daviestechlabs.io/billy/ray-serve

kuberay-images now contains only Docker images for Ray workers
2026-02-03 07:45:48 -05:00
796997cf06 adding intel image build fixes.
Some checks failed
Build and Push Images / build-nvidia (push) Failing after 6m29s
Build and Push Images / build-strixhalo (push) Failing after 5m27s
Build and Push Images / build-intel (push) Failing after 4m6s
Build and Push Images / build-rdna2 (push) Failing after 2h19m57s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 2s
2026-02-02 21:16:48 -05:00
81388aed2c ci: retry build with Docker Hub auth 2026-02-02 17:44:43 -05:00
8af9d04210 fix(ci): configure Docker buildx for insecure HTTP registry
Some checks failed
Build and Push Images / build-nvidia (push) Failing after 6m6s
Build and Push Images / build-rdna2 (push) Failing after 6m31s
Build and Push Images / build-strixhalo (push) Failing after 5m35s
Build and Push Images / build-intel (push) Failing after 5m33s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
2026-02-02 17:21:39 -05:00
456f08ec81 fix: use internal K8s service URL for container registry
Some checks failed
Build and Push Images / build-rdna2 (push) Failing after 8m19s
Build and Push Images / build-nvidia (push) Failing after 9m26s
Build and Push Images / build-strixhalo (push) Failing after 6m50s
Build and Push Images / build-intel (push) Failing after 7m14s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
- Switch from external git.daviestechlabs.io to internal gitea-http.gitea.svc
- Avoids Cloudflare/Authentik routing since the runner is in-cluster
- Add REGISTRY_HOST env var for login steps
2026-02-02 13:28:51 -05:00
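Kubernetes service DNS names follow `<service>.<namespace>.svc[.<cluster-domain>]`, which is where `gitea-http.gitea.svc` comes from. A hedged sketch of how a `REGISTRY_HOST` override might feed image references: the fallback to the in-cluster name, the owner `billy`, and both helper names are assumptions, and the commit itself only says the variable is used for login steps.

```python
import os
from typing import Optional


def cluster_service_host(service: str, namespace: str, port: Optional[int] = None,
                         cluster_domain: Optional[str] = None) -> str:
    """Build an in-cluster Service hostname, optionally with domain and port."""
    host = f"{service}.{namespace}.svc"
    if cluster_domain:
        host += f".{cluster_domain}"
    return f"{host}:{port}" if port else host


def image_ref(name: str, tag: str, owner: str = "billy") -> str:
    """Registry/owner/image:tag, preferring REGISTRY_HOST when it is set."""
    registry = os.environ.get("REGISTRY_HOST", cluster_service_host("gitea-http", "gitea"))
    return f"{registry}/{owner}/{name}:{tag}"
```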
3c788fe2b6 fix(strixhalo): upgrade pandas for numpy 2.x compatibility
Some checks failed
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
Ray base image has pandas 1.5.3 compiled against numpy 1.x, but TheRock
PyTorch ROCm wheels require numpy 2.x. This causes:
  ValueError: numpy.dtype size changed, may indicate binary incompatibility

Fix by installing pandas 2.x which is compatible with numpy 2.x.
2026-02-02 13:25:28 -05:00
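A quick guard against this mismatch can compare installed versions before importing pandas. This sketch assumes, per our reading of the pandas release notes, that pandas 2.2.2 is the first release built to work with numpy 2.x, so "pandas 2.x" should in practice mean 2.2.2 or later. `parse_version` and `pandas_ok_for_numpy` are hypothetical helpers; pass them `pandas.__version__` and `numpy.__version__`.

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '2.1.0rc1' -> (2, 1, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pandas_ok_for_numpy(pandas_version: str, numpy_version: str) -> bool:
    """False when numpy is 2.x but pandas predates numpy-2 support (assumed 2.2.2)."""
    if parse_version(numpy_version)[:1] < (2,):
        return True
    return parse_version(pandas_version) >= (2, 2, 2)
```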
4e813cea64 fix: use twine for PyPI upload with internal URL
All checks were successful
Build and Publish ray-serve-apps / lint (push) Successful in 1m32s
Build and Publish ray-serve-apps / publish (push) Successful in 2m4s
Replaces curl-based upload with twine which handles the
PyPI upload protocol correctly. Uses TWINE_REPOSITORY_URL
env var to point to internal Gitea service.
2026-02-02 12:40:33 -05:00
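twine reads `TWINE_REPOSITORY_URL`, `TWINE_USERNAME`, and `TWINE_PASSWORD` from the environment, so the runner needs no `~/.pypirc`. A minimal sketch of assembling that invocation; `twine_upload_command` is hypothetical, and the internal URL used in the example (Gitea's `/api/packages/{owner}/pypi` endpoint on the default port 3000) is an assumption.

```python
import glob
import os
from typing import Dict, List, Tuple


def twine_upload_command(repo_url: str, username: str, password: str,
                         dist_dir: str = "dist") -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for `twine upload` against a custom repository URL.

    Files are globbed here because subprocess does not expand `dist/*`.
    """
    files = sorted(glob.glob(os.path.join(dist_dir, "*")))
    argv = ["twine", "upload", "--non-interactive", *files]
    env = {
        **os.environ,
        "TWINE_REPOSITORY_URL": repo_url,
        "TWINE_USERNAME": username,
        "TWINE_PASSWORD": password,
    }
    return argv, env
```

Usage would be `subprocess.run(argv, env=env, check=True)`, for example with `repo_url="http://gitea-http.gitea.svc:3000/api/packages/billy/pypi"`.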
18302cf640 chore: trigger ray-serve publish
Some checks failed
Build and Publish ray-serve-apps / lint (push) Successful in 1m31s
Build and Publish ray-serve-apps / publish (push) Failing after 1m28s
2026-02-02 12:35:03 -05:00
45a89ffb2c chore: trigger workflow to test secrets 2026-02-02 12:33:44 -05:00
7b4871f554 debug: check if secrets are being passed
Some checks failed
Build and Publish ray-serve-apps / lint (push) Successful in 1m33s
Build and Publish ray-serve-apps / publish (push) Failing after 1m32s
2026-02-02 12:20:39 -05:00
e497fe110d ci: use internal cluster service URL for PyPI upload
Some checks failed
Build and Publish ray-serve-apps / lint (push) Successful in 1m32s
Build and Publish ray-serve-apps / publish (push) Failing after 2m15s
2026-02-02 12:14:01 -05:00
a4ee672c19 feat: correct ntfy topic.
All checks were successful
Build and Publish ray-serve-apps / lint (push) Successful in 3m9s
Build and Publish ray-serve-apps / publish (push) Successful in 1m43s
2026-02-02 12:01:37 -05:00
280c08722f ci: use curl for PyPI upload with SSL skip
All checks were successful
Build and Publish ray-serve-apps / lint (push) Successful in 1m38s
Build and Publish ray-serve-apps / publish (push) Successful in 1m39s
[ray-serve only]

Twine lacks an option to skip SSL verification, so use curl -k for the self-signed internal cert
2026-02-02 11:31:22 -05:00
072cb233c7 ci: disable SSL verification for internal registry
Some checks failed
Build and Publish ray-serve-apps / lint (push) Successful in 2m0s
Build and Publish ray-serve-apps / publish (push) Failing after 2m6s
[ray-serve only]

Self-signed cert on internal network requires --disable-certificate-verification
2026-02-02 11:25:17 -05:00
1943a77992 ci: use internal registry URL for PyPI uploads (ADR-0020)
Some checks failed
Build and Publish ray-serve-apps / lint (push) Successful in 1m38s
Build and Publish ray-serve-apps / publish (push) Failing after 1m35s
[ray-serve only]

Bypass Cloudflare's 100 MB upload limit by using registry.lab.daviestechlabs.io
2026-02-02 11:19:33 -05:00
12987c6adc fix: apply ruff fixes to ray_serve package
Some checks failed
Build and Publish ray-serve-apps / lint (push) Successful in 1m30s
Build and Publish ray-serve-apps / publish (push) Failing after 2m44s
[ray-serve only]

- Fix whitespace in docstrings
- Add strict=True to zip() calls
- Use ternary operators where appropriate
- Rename unused loop variables
2026-02-02 11:09:35 -05:00
16f6199534 ci: add [skip images] support and trigger ray-serve publish
Some checks failed
Build and Push Images / build-nvidia (push) Has been skipped
Build and Push Images / build-intel (push) Has been skipped
Build and Push Images / build-rdna2 (push) Has been skipped
Build and Push Images / build-strixhalo (push) Has been skipped
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
Build and Publish ray-serve-apps / lint (push) Failing after 3m38s
Build and Publish ray-serve-apps / publish (push) Has been skipped
[ray-serve only]

- Add skip conditions to all image build jobs
- Commit message [skip images] or [ray-serve only] skips image builds
- Touch ray_serve/__init__.py to trigger publish workflow
2026-02-02 11:02:12 -05:00
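The skip check is a substring test on the commit message. In the workflow it is presumably a `contains(...)` expression on the head commit message; this Python equivalent and its names are hypothetical, and matching is case-sensitive as written.

```python
SKIP_MARKERS = ("[skip images]", "[ray-serve only]")


def skip_image_builds(commit_message: str) -> bool:
    """True if the commit message contains any marker that opts out of image builds."""
    return any(marker in commit_message for marker in SKIP_MARKERS)
```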
bf93c5d7f4 ci: add path filters to avoid building images on ray-serve changes
Some checks failed
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Only trigger image builds when dockerfiles/ changes.
ray-serve package changes now only trigger publish-ray-serve.yaml.
2026-02-02 10:59:17 -05:00
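Path filters map changed files to the workflows they trigger. A hedged model: `publish-ray-serve.yaml` is named in the commit, but `build-images.yaml` and the `ray_serve/**` and `pyproject.toml` globs are assumptions. Note that Python's `fnmatch` lets `*` cross `/`, unlike workflow path filters, where only `**` does; the globs below use `**` so they read the same either way.

```python
from fnmatch import fnmatch
from typing import Dict, Iterable, List

# Hypothetical mapping of workflow file -> path globs that trigger it.
PATH_FILTERS: Dict[str, List[str]] = {
    "build-images.yaml": ["dockerfiles/**"],
    "publish-ray-serve.yaml": ["ray_serve/**", "pyproject.toml"],
}


def triggered_workflows(changed_paths: Iterable[str]) -> List[str]:
    """Return the sorted workflow names whose globs match any changed path."""
    paths = list(changed_paths)
    return sorted(
        wf for wf, globs in PATH_FILTERS.items()
        if any(fnmatch(p, g) for p in paths for g in globs)
    )
```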
9e250e149e chore: re-trigger CI after adding secrets
Some checks failed
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has started running
Build and Push Images / build-rdna2 (push) Has been cancelled
2026-02-02 10:54:17 -05:00
7efdcb059e feat: add pyproject.toml and CI for ray-serve-apps package
Some checks failed
Build and Push Images / build-nvidia (push) Failing after 7m25s
Build and Push Images / build-rdna2 (push) Failing after 7m29s
Build and Push Images / build-strixhalo (push) Failing after 6m45s
Build and Push Images / build-intel (push) Failing after 6m22s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
Build and Publish ray-serve-apps / lint (push) Failing after 3m9s
Build and Publish ray-serve-apps / publish (push) Has been skipped
- Restructure ray-serve as proper Python package (ray_serve/)
- Add pyproject.toml with hatch build system
- Add CI workflow to publish to Gitea PyPI
- Add py.typed for PEP 561 compliance
- Aligns with ADR-0019 handler deployment strategy
2026-02-02 09:22:03 -05:00
876188a150 feat: add ntfy notifications and semantic versioning (ADR-0015)
Some checks failed
Build and Push Images / build-nvidia (push) Failing after 26s
Build and Push Images / build-strixhalo (push) Failing after 34s
Build and Push Images / build-rdna2 (push) Failing after 47s
Build and Push Images / build-intel (push) Failing after 23s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
2026-02-02 08:00:33 -05:00
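ntfy publishing is a plain HTTP POST of the message body to `<server>/<topic>`, with optional `Title`, `Priority`, and `Tags` headers. A minimal sketch that builds, but does not send, such a request using only the standard library. The server, the topic, and `ntfy_request` are hypothetical; the workflow's real topic is not shown in this log.

```python
import urllib.request


def ntfy_request(server: str, topic: str, message: str, title: str,
                 priority: str = "default", tags: str = "") -> urllib.request.Request:
    """Build an ntfy publish request (send it with urllib.request.urlopen)."""
    headers = {"Title": title, "Priority": priority}
    if tags:
        headers["Tags"] = tags
    return urllib.request.Request(
        f"{server.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )
```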
c0ca6bfc6a ci: re-trigger pipeline
Some checks failed
Build and Push Images / build-nvidia (push) Failing after 1s
Build and Push Images / build-rdna2 (push) Failing after 1s
Build and Push Images / build-strixhalo (push) Failing after 1s
Build and Push Images / build-intel (push) Failing after 1s
2026-02-02 07:51:10 -05:00
e1529ad923 ci: fix registry login - skip on PRs, add Docker Hub auth
Some checks failed
Build and Push Images / build-nvidia (push) Failing after 31s
Build and Push Images / build-rdna2 (push) Failing after 33s
Build and Push Images / build-strixhalo (push) Failing after 20s
Build and Push Images / build-intel (push) Failing after 25s
- Only login to Gitea registry on push (not PRs)
- Add optional Docker Hub login to avoid pull rate limits
- Requires REGISTRY_USER, REGISTRY_TOKEN secrets in Gitea
- Optional: DOCKERHUB_USERNAME (var) + DOCKERHUB_TOKEN (secret)
2026-02-02 07:35:20 -05:00
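The login policy above reduces to two independent decisions. A hypothetical sketch, with `planned_logins` and the `env` mapping standing in for the workflow's secrets and variables: the Gitea login is attempted on every push (it fails if `REGISTRY_USER`/`REGISTRY_TOKEN` are missing), while Docker Hub is opt-in.

```python
from typing import List, Mapping


def planned_logins(event_name: str, env: Mapping[str, str]) -> List[str]:
    """Which registries a build job logs in to.

    - Gitea registry: push events only, since PR builds do not push images.
    - Docker Hub: only when both DOCKERHUB_USERNAME and DOCKERHUB_TOKEN are
      set, to avoid anonymous pull rate limits.
    """
    logins = []
    if event_name == "push":
        logins.append("gitea")
    if env.get("DOCKERHUB_USERNAME") and env.get("DOCKERHUB_TOKEN"):
        logins.append("dockerhub")
    return logins
```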
cb80709d3d build: optimize Dockerfiles for production
Some checks failed
Build and Push Images / build-rdna2 (push) Failing after 4m3s
Build and Push Images / build-nvidia (push) Failing after 4m6s
Build and Push Images / build-strixhalo (push) Failing after 18s
Build and Push Images / build-intel (push) Failing after 21s
- Use BuildKit syntax 1.7 with cache mounts for apt/uv
- Switch from pip to uv for 10-100x faster installs (ADR-0012)
- Add OCI Image Spec labels for container metadata
- Add HEALTHCHECK directives for orchestration
- Add .dockerignore to reduce context size
- Update Makefile with buildx and lint target
- Add retry logic to ray-entrypoint.sh

Refs: ADR-0012 (uv), ADR-0014 (Docker best practices)
2026-02-02 07:26:27 -05:00
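`ray-entrypoint.sh` is a shell script and its retry parameters are not shown here. As a sketch of the same pattern, this Python version retries with exponential backoff. `retry`, the default of five attempts, and the 2-second base delay are hypothetical; `sleep` is injectable so the backoff schedule can be tested.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 5, base_delay: float = 2.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, doubling the delay after each failure.

    Re-raises the last exception once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

For example, wrapping a connection to the Ray head lets a worker ride out the head pod starting a few seconds later than the worker.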
a16ffff73f feat: Add GPU-specific Ray worker images with CI/CD
Some checks failed
Build and Push Images / build-nvidia (push) Failing after 1s
Build and Push Images / build-rdna2 (push) Failing after 1s
Build and Push Images / build-strixhalo (push) Failing after 1s
Build and Push Images / build-intel (push) Failing after 1s
- Add Dockerfiles for nvidia, rdna2, strixhalo, and intel GPU targets
- Add ray-serve modules (embeddings, whisper, tts, llm, reranker)
- Add Gitea Actions workflow for automated builds
- Add Makefile for local development
- Update README with comprehensive documentation
2026-02-01 15:04:31 -05:00
e68d5c1f0e Initial commit 2026-02-01 19:59:37 +00:00