e299f6476e
fix: Use external registry URL for proper Bearer token auth
Build and Push Images / determine-version (push) Successful in 1m32s
Build and Push Images / build-nvidia (push) Failing after 6m47s
Build and Push Images / build-rdna2 (push) Failing after 7m8s
Build and Push Images / build-strixhalo (push) Failing after 6m35s
Build and Push Images / build-intel (push) Failing after 6m35s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 2s
Gitea's container registry uses Bearer token auth with a realm that points
to the external URL. Changed the registry host from the internal K8s service
URL to registry.lab.daviestechlabs.io so the auth flow can complete.
Also removed the insecure-registry buildx config, since the registry is now reached over HTTPS.
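The auth flow described above can be sketched as follows: on an unauthenticated request the registry answers with a WWW-Authenticate challenge, and the client then fetches a token from the URL named in realm, so that URL must be reachable from wherever the build runs. This is a minimal illustration only; the realm path, service name, and scope in the sample header are assumptions, not values taken from this repository.

```python
import re


def parse_bearer_challenge(header: str) -> dict[str, str]:
    """Parse a `WWW-Authenticate: Bearer ...` value into its parameters."""
    scheme, _, params = header.strip().partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"not a Bearer challenge: {scheme!r}")
    # Parameters are comma-separated key="value" pairs.
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


# Hypothetical challenge: the realm path, service, and scope are assumed.
challenge = parse_bearer_challenge(
    'Bearer realm="https://registry.lab.daviestechlabs.io/v2/token",'
    'service="container_registry",scope="*"'
)
token_url = challenge["realm"]  # the client requests its token from here
```

If the realm host differs from the host the client pushed to, the client still follows the realm, which is one plausible reason the internal service URL did not authenticate cleanly.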
2026-02-04 08:13:35 -05:00
5cb79a0fe7
fix: Use docker/login-action for buildx registry authentication
Build and Push Images / determine-version (push) Successful in 57s
Build and Push Images / build-nvidia (push) Failing after 6m47s
Build and Push Images / build-rdna2 (push) Failing after 7m10s
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / build-strixhalo (push) Has been cancelled
docker login doesn't properly propagate credentials to buildx builders.
docker/login-action handles this correctly and creates a proper ~/.docker/config.json.
2026-02-04 08:00:12 -05:00
338b668388
feat: Add semantic versioning based on commit message prefixes
Build and Push Images / determine-version (push) Successful in 55s
Build and Push Images / build-nvidia (push) Failing after 1h52m48s
Build and Push Images / build-rdna2 (push) Failing after 3h14m40s
Build and Push Images / build-strixhalo (push) Failing after 1h52m42s
Build and Push Images / build-intel (push) Failing after 3h14m39s
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
- Added determine-version job that runs BEFORE builds
- Version bump based on commit message:
  - major: or BREAKING CHANGE → major bump
  - minor:, feat:, or feature: → minor bump
  - everything else → patch bump
- All build jobs now depend on determine-version
- Images tagged with calculated version (e.g. v1.2.3) + latest
- Release job creates git tag after successful builds
- Notify job includes version info in notifications
- PRs get tagged with pr-<number>
- Manual tag pushes use tag directly (no version recalculation)
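The bump rules listed above can be sketched as a small function. This is an illustration, not the workflow's actual script: the function names are hypothetical, prefix matching is assumed to be case-sensitive and applied to the start of the message, and resetting lower components to zero on a bump is the usual semantic-versioning convention rather than something the commit states.

```python
def bump_kind(message: str) -> str:
    """Classify a commit message as a major, minor, or patch bump."""
    subject = message.lstrip()
    if subject.startswith("major:") or "BREAKING CHANGE" in message:
        return "major"
    if subject.startswith(("minor:", "feat:", "feature:")):
        return "minor"
    return "patch"  # everything else


def next_version(current: str, message: str) -> str:
    """Return the next tag (e.g. v1.2.3 -> v1.3.0) for a commit message."""
    major, minor, patch = (int(p) for p in current.lstrip("v").split("."))
    kind = bump_kind(message)
    if kind == "major":
        return f"v{major + 1}.0.0"
    if kind == "minor":
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```

For example, next_version("v1.2.3", "feat: add x") yields v1.3.0, while a message starting with fix: yields v1.2.4.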
2026-02-03 22:30:48 -05:00
0bb3d25df7
trigger: rebuild after clearing runner cache
2026-02-03 22:25:35 -05:00
40c544ba0a
fix: remove COPY ray-serve/ - now installed from PyPI
Build and Push Images / build-nvidia (push) Failing after 13s
Build and Push Images / build-strixhalo (push) Failing after 1m56s
Build and Push Images / build-rdna2 (push) Failing after 2m8s
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
ray-serve-apps package is now installed from Gitea PyPI registry
at runtime by the RayService configuration, not bundled in image.
2026-02-03 22:23:05 -05:00
96921fe799
fix: workflow conditions for push events
Build and Push Images / build-nvidia (push) Failing after 15s
Build and Push Images / build-rdna2 (push) Failing after 17s
Build and Push Images / build-strixhalo (push) Failing after 15s
Build and Push Images / build-intel (push) Failing after 16s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
The if conditions were checking github.event.inputs.image == '' which
fails for push events where inputs is undefined. Changed logic to run
all builds unless this is a workflow_dispatch with a specific image
selected.
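The corrected condition can be expressed as a predicate. This is a sketch of the logic only, written in Python rather than workflow expression syntax; the function and parameter names are hypothetical.

```python
from typing import Optional


def should_build(event_name: str, selected_image: Optional[str], job_image: str) -> bool:
    """Run every build unless a manual dispatch picked one specific image."""
    if event_name == "workflow_dispatch" and selected_image:
        return selected_image == job_image
    # Push events have no inputs at all, so they always build everything.
    return True
```

The earlier condition effectively required the input to equal an empty string, which an undefined inputs object on push events does not satisfy; testing for the dispatch case first avoids relying on that comparison.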
2026-02-03 21:39:17 -05:00
7e7822f995
trigger: rebuild rdna2 image
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
2026-02-03 21:34:53 -05:00
aac9508c28
trigger: rebuild worker images after fix
2026-02-03 21:32:13 -05:00
cb7dad96c1
fix: PATH variable expansion in ROCm worker Dockerfiles
Build and Push Images / build-rdna2 (push) Has been cancelled
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Split ENV ROCM_HOME and ENV PATH into separate instructions to fix a variable
expansion issue. When ROCM_HOME and PATH were set in the same ENV instruction,
${ROCM_HOME} expanded to an empty string because it wasn't defined yet.
This was causing 'ray: command not found' in init containers.
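The expansion behaviour can be imitated with a toy model: within a single ENV instruction, every substitution sees the values set by earlier instructions, not values assigned on the same line. The simulator below is a simplified sketch that handles only the ${NAME} form, and the /opt/rocm and /usr/bin paths are assumed for illustration.

```python
import re


def apply_env(env: dict[str, str], assignments: list[tuple[str, str]]) -> dict[str, str]:
    """Apply one ENV instruction that holds one or more KEY=value pairs."""
    snapshot = dict(env)  # substitution only sees earlier instructions

    def expand(value: str) -> str:
        return re.sub(r"\$\{(\w+)\}", lambda m: snapshot.get(m.group(1), ""), value)

    result = dict(env)
    for key, value in assignments:
        result[key] = expand(value)
    return result


base = {"PATH": "/usr/bin"}  # hypothetical starting PATH

# One instruction: ROCM_HOME is not visible yet when PATH is expanded.
combined = apply_env(base, [("ROCM_HOME", "/opt/rocm"),
                            ("PATH", "${ROCM_HOME}/bin:${PATH}")])

# Two instructions: ROCM_HOME is set first, then PATH expands correctly.
split = apply_env(apply_env(base, [("ROCM_HOME", "/opt/rocm")]),
                  [("PATH", "${ROCM_HOME}/bin:${PATH}")])
```

In the combined case PATH ends up starting with /bin, which matches the symptom of binaries under the ROCm prefix not being found.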
2026-02-03 21:07:00 -05:00
a8943c79ad
refactor: remove ray-serve (moved to dedicated repo)
Implements ADR-0024: Ray Repository Structure
ray-serve is now a standalone PyPI package repo:
- https://git.daviestechlabs.io/billy/ray-serve
kuberay-images now contains only Docker images for Ray workers
2026-02-03 07:45:48 -05:00
796997cf06
Add Intel image build fixes.
Build and Push Images / build-nvidia (push) Failing after 6m29s
Build and Push Images / build-strixhalo (push) Failing after 5m27s
Build and Push Images / build-intel (push) Failing after 4m6s
Build and Push Images / build-rdna2 (push) Failing after 2h19m57s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 2s
2026-02-02 21:16:48 -05:00
81388aed2c
ci: retry build with Docker Hub auth
2026-02-02 17:44:43 -05:00
8af9d04210
fix(ci): configure Docker buildx for insecure HTTP registry
Build and Push Images / build-nvidia (push) Failing after 6m6s
Build and Push Images / build-rdna2 (push) Failing after 6m31s
Build and Push Images / build-strixhalo (push) Failing after 5m35s
Build and Push Images / build-intel (push) Failing after 5m33s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
2026-02-02 17:21:39 -05:00
456f08ec81
fix: use internal K8s service URL for container registry
Build and Push Images / build-rdna2 (push) Failing after 8m19s
Build and Push Images / build-nvidia (push) Failing after 9m26s
Build and Push Images / build-strixhalo (push) Failing after 6m50s
Build and Push Images / build-intel (push) Failing after 7m14s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
- Switch from external git.daviestechlabs.io to internal gitea-http.gitea.svc
- Avoids Cloudflare/Authentik routing since runner is in-cluster
- Add REGISTRY_HOST env var for login steps
2026-02-02 13:28:51 -05:00
3c788fe2b6
fix(strixhalo): upgrade pandas for numpy 2.x compatibility
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
Ray base image has pandas 1.5.3 compiled against numpy 1.x, but TheRock
PyTorch ROCm wheels require numpy 2.x. This causes:
ValueError: numpy.dtype size changed, may indicate binary incompatibility
Fix by installing pandas 2.x which is compatible with numpy 2.x.
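A pre-flight check along these lines could flag the combination before it fails at import time. It is a rough sketch that encodes only the rule stated in this commit (pandas 1.x paired with numpy 2.x); the function names are hypothetical, and real compatibility depends on the exact wheel builds.

```python
def major(version: str) -> int:
    """Return the leading integer component of a version string."""
    return int(version.split(".")[0])


def likely_abi_mismatch(pandas_version: str, numpy_version: str) -> bool:
    """Flag pandas 1.x (built against numpy 1.x) running with numpy 2.x."""
    return major(pandas_version) < 2 and major(numpy_version) >= 2
```

In an image build, the two version strings could come from importlib.metadata.version("pandas") and importlib.metadata.version("numpy").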
2026-02-02 13:25:28 -05:00
4e813cea64
fix: use twine for PyPI upload with internal URL
Build and Publish ray-serve-apps / lint (push) Successful in 1m32s
Build and Publish ray-serve-apps / publish (push) Successful in 2m4s
Replaces curl-based upload with twine which handles the
PyPI upload protocol correctly. Uses TWINE_REPOSITORY_URL
env var to point to internal Gitea service.
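A sketch of how such an upload might be assembled is shown below. TWINE_REPOSITORY_URL, TWINE_USERNAME, and TWINE_PASSWORD are environment variables twine reads; the helper name, the service port, the OWNER placeholder, and the wheel filename are assumptions for illustration. The function only builds the command and environment and does not run anything.

```python
import os


def twine_upload_invocation(repository_url: str, username: str, token: str,
                            dist_files: list[str]) -> tuple[list[str], dict[str, str]]:
    """Build the argv and environment for a non-interactive twine upload."""
    env = dict(os.environ)
    env.update({
        "TWINE_REPOSITORY_URL": repository_url,
        "TWINE_USERNAME": username,
        "TWINE_PASSWORD": token,
    })
    return ["twine", "upload", "--non-interactive", *dist_files], env


# Hypothetical values: port, owner, and filename are assumed.
argv, env = twine_upload_invocation(
    "http://gitea-http.gitea.svc:3000/api/packages/OWNER/pypi",
    "ci-user",
    "example-token",
    ["dist/ray_serve_apps-0.0.0-py3-none-any.whl"],
)
```

Passing the result to subprocess.run(argv, env=env, check=True) would perform the upload; keeping credentials in the environment avoids placing them on the command line.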
2026-02-02 12:40:33 -05:00
18302cf640
chore: trigger ray-serve publish
Build and Publish ray-serve-apps / lint (push) Successful in 1m31s
Build and Publish ray-serve-apps / publish (push) Failing after 1m28s
2026-02-02 12:35:03 -05:00
45a89ffb2c
chore: trigger workflow to test secrets
2026-02-02 12:33:44 -05:00
7b4871f554
debug: check if secrets are being passed
Build and Publish ray-serve-apps / lint (push) Successful in 1m33s
Build and Publish ray-serve-apps / publish (push) Failing after 1m32s
2026-02-02 12:20:39 -05:00
e497fe110d
ci: use internal cluster service URL for PyPI upload
Build and Publish ray-serve-apps / lint (push) Successful in 1m32s
Build and Publish ray-serve-apps / publish (push) Failing after 2m15s
2026-02-02 12:14:01 -05:00
a4ee672c19
feat: correct ntfy topic.
Build and Publish ray-serve-apps / lint (push) Successful in 3m9s
Build and Publish ray-serve-apps / publish (push) Successful in 1m43s
2026-02-02 12:01:37 -05:00
280c08722f
ci: use curl for PyPI upload with SSL skip
Build and Publish ray-serve-apps / lint (push) Successful in 1m38s
Build and Publish ray-serve-apps / publish (push) Successful in 1m39s
[ray-serve only]
Twine lacks an SSL-skip option; use curl -k for the self-signed internal cert.
2026-02-02 11:31:22 -05:00
072cb233c7
ci: disable SSL verification for internal registry
Build and Publish ray-serve-apps / lint (push) Successful in 2m0s
Build and Publish ray-serve-apps / publish (push) Failing after 2m6s
[ray-serve only]
Self-signed cert on internal network requires --disable-certificate-verification
2026-02-02 11:25:17 -05:00
1943a77992
ci: use internal registry URL for PyPI uploads (ADR-0020)
Build and Publish ray-serve-apps / lint (push) Successful in 1m38s
Build and Publish ray-serve-apps / publish (push) Failing after 1m35s
[ray-serve only]
Bypass Cloudflare 100MB limit by using registry.lab.daviestechlabs.io
2026-02-02 11:19:33 -05:00
12987c6adc
fix: apply ruff fixes to ray_serve package
Build and Publish ray-serve-apps / lint (push) Successful in 1m30s
Build and Publish ray-serve-apps / publish (push) Failing after 2m44s
[ray-serve only]
- Fix whitespace in docstrings
- Add strict=True to zip() calls
- Use ternary operators where appropriate
- Rename unused loop variables
2026-02-02 11:09:35 -05:00
16f6199534
ci: add [skip images] support and trigger ray-serve publish
Build and Push Images / build-nvidia (push) Has been skipped
Build and Push Images / build-intel (push) Has been skipped
Build and Push Images / build-rdna2 (push) Has been skipped
Build and Push Images / build-strixhalo (push) Has been skipped
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
Build and Publish ray-serve-apps / lint (push) Failing after 3m38s
Build and Publish ray-serve-apps / publish (push) Has been skipped
[ray-serve only]
- Add skip conditions to all image build jobs
- Commit message [skip images] or [ray-serve only] skips image builds
- Touch ray_serve/__init__.py to trigger publish workflow
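The skip rule above amounts to a substring test on the commit message. The sketch below shows that test in Python; the constant and function names are hypothetical, and matching is assumed to be exact and case-sensitive.

```python
SKIP_MARKERS = ("[skip images]", "[ray-serve only]")


def images_skipped(commit_message: str) -> bool:
    """Return True when the message carries one of the skip markers."""
    return any(marker in commit_message for marker in SKIP_MARKERS)
```

Each image build job would then run only when images_skipped(...) is False for the head commit.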
2026-02-02 11:02:12 -05:00
bf93c5d7f4
ci: add path filters to avoid building images on ray-serve changes
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Only trigger image builds when dockerfiles/ changes.
ray-serve package changes now only trigger publish-ray-serve.yaml.
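The effect of the path filter can be sketched as a check over the files a push changed. This is a simplified model that assumes a plain dockerfiles/ prefix match; the function name and the example file names are hypothetical.

```python
def triggers_image_build(changed_paths: list[str]) -> bool:
    """Image builds run only if something under dockerfiles/ changed."""
    return any(path.startswith("dockerfiles/") for path in changed_paths)
```

A push touching only ray_serve/__init__.py would return False here and leave the image workflow idle, while any change under dockerfiles/ would return True.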
2026-02-02 10:59:17 -05:00
9e250e149e
chore: re-trigger CI after adding secrets
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has started running
Build and Push Images / build-rdna2 (push) Has been cancelled
2026-02-02 10:54:17 -05:00
7efdcb059e
feat: add pyproject.toml and CI for ray-serve-apps package
Build and Push Images / build-nvidia (push) Failing after 7m25s
Build and Push Images / build-rdna2 (push) Failing after 7m29s
Build and Push Images / build-strixhalo (push) Failing after 6m45s
Build and Push Images / build-intel (push) Failing after 6m22s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
Build and Publish ray-serve-apps / lint (push) Failing after 3m9s
Build and Publish ray-serve-apps / publish (push) Has been skipped
- Restructure ray-serve as proper Python package (ray_serve/)
- Add pyproject.toml with hatch build system
- Add CI workflow to publish to Gitea PyPI
- Add py.typed for PEP 561 compliance
- Aligns with ADR-0019 handler deployment strategy
2026-02-02 09:22:03 -05:00
876188a150
feat: add ntfy notifications and semantic versioning (ADR-0015)
Build and Push Images / build-nvidia (push) Failing after 26s
Build and Push Images / build-strixhalo (push) Failing after 34s
Build and Push Images / build-rdna2 (push) Failing after 47s
Build and Push Images / build-intel (push) Failing after 23s
Build and Push Images / Release (push) Has been skipped
Build and Push Images / Notify (push) Successful in 1s
2026-02-02 08:00:33 -05:00
c0ca6bfc6a
ci: re-trigger pipeline
Build and Push Images / build-nvidia (push) Failing after 1s
Build and Push Images / build-rdna2 (push) Failing after 1s
Build and Push Images / build-strixhalo (push) Failing after 1s
Build and Push Images / build-intel (push) Failing after 1s
2026-02-02 07:51:10 -05:00
e1529ad923
ci: fix registry login - skip on PRs, add Docker Hub auth
Build and Push Images / build-nvidia (push) Failing after 31s
Build and Push Images / build-rdna2 (push) Failing after 33s
Build and Push Images / build-strixhalo (push) Failing after 20s
Build and Push Images / build-intel (push) Failing after 25s
- Only login to Gitea registry on push (not PRs)
- Add optional Docker Hub login to avoid pull rate limits
- Requires REGISTRY_USER, REGISTRY_TOKEN secrets in Gitea
- Optional: DOCKERHUB_USERNAME (var) + DOCKERHUB_TOKEN (secret)
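The login rules above can be summarised as a small decision function. It is a sketch of the intended behaviour, not the workflow itself; the function name and the returned labels are hypothetical, and the order of the two logins is an assumption.

```python
def login_plan(event_name: str, dockerhub_username: str, dockerhub_token: str) -> list[str]:
    """List which registries a job should log in to."""
    plan = []
    # Docker Hub login is optional and only used when both values are set.
    if dockerhub_username and dockerhub_token:
        plan.append("dockerhub")
    # The Gitea registry login is needed only when images will be pushed.
    if event_name == "push":
        plan.append("gitea")
    return plan
```

A pull request without Docker Hub credentials therefore performs no login at all and pulls base images anonymously.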
2026-02-02 07:35:20 -05:00
cb80709d3d
build: optimize Dockerfiles for production
Build and Push Images / build-rdna2 (push) Failing after 4m3s
Build and Push Images / build-nvidia (push) Failing after 4m6s
Build and Push Images / build-strixhalo (push) Failing after 18s
Build and Push Images / build-intel (push) Failing after 21s
- Use BuildKit syntax 1.7 with cache mounts for apt/uv
- Switch from pip to uv for 10-100x faster installs (ADR-0014)
- Add OCI Image Spec labels for container metadata
- Add HEALTHCHECK directives for orchestration
- Add .dockerignore to reduce context size
- Update Makefile with buildx and lint target
- Add retry logic to ray-entrypoint.sh
Refs: ADR-0012 (uv), ADR-0014 (Docker best practices)
2026-02-02 07:26:27 -05:00
a16ffff73f
feat: Add GPU-specific Ray worker images with CI/CD
Build and Push Images / build-nvidia (push) Failing after 1s
Build and Push Images / build-rdna2 (push) Failing after 1s
Build and Push Images / build-strixhalo (push) Failing after 1s
Build and Push Images / build-intel (push) Failing after 1s
- Add Dockerfiles for nvidia, rdna2, strixhalo, and intel GPU targets
- Add ray-serve modules (embeddings, whisper, tts, llm, reranker)
- Add Gitea Actions workflow for automated builds
- Add Makefile for local development
- Update README with comprehensive documentation
2026-02-01 15:04:31 -05:00
e68d5c1f0e
Initial commit
2026-02-01 19:59:37 +00:00