kuberay-images/amdsmi-shim/strixhalo_vram_fix.py at d1b6d78c66f2013707d4635e1457de871ce4ef96

daviestechlabs/kuberay-images

Billy D. d1b6d78c66

Build and Push Images / determine-version (push) Successful in 5s

Build and Push Images / build (Dockerfile.ray-worker-nvidia, nvidia) (push) Failing after 24s

Build and Push Images / build (Dockerfile.ray-worker-intel, intel) (push) Failing after 27s

Build and Push Images / build (Dockerfile.ray-worker-strixhalo, strixhalo) (push) Failing after 22s

Build and Push Images / build (Dockerfile.ray-worker-rdna2, rdna2) (push) Failing after 24s

Build and Push Images / Release (push) Has been skipped

Build and Push Images / Notify (push) Successful in 1s

fix(strixhalo): skip VRAM patch in low-memory init containers

KubeRay's auto-injected wait-gcs-ready init container has a memory limit
of only 256Mi. The .pth hook unconditionally imported torch+ROCm, which
needs more than 256Mi, so the container was OOMKilled.

The hook now checks the cgroup memory limit first and, if it is under
512Mi, skips the expensive torch import entirely. The VRAM patch is only
needed by the main Ray worker process, not by health-check init containers.
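
The guard described above could look roughly like the sketch below. It is not the code from strixhalo_vram_fix.py. The cgroup file paths, the function names, and the choice to treat an unreadable or unlimited cgroup as "enough memory" are all assumptions made for illustration. Only the 512Mi threshold comes from the commit message.

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed locations of the cgroup memory limit: cgroup v2 first, then v1.
CGROUP_LIMIT_PATHS = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)

# 512Mi threshold taken from the commit message.
MIN_MEMORY_BYTES = 512 * 1024 * 1024


def read_cgroup_memory_limit(
    paths: Iterable[str] = CGROUP_LIMIT_PATHS,
) -> Optional[int]:
    """Return the first readable memory limit in bytes, or None if unlimited or unknown."""
    for path in paths:
        try:
            raw = Path(path).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next candidate
        if raw == "max":  # cgroup v2 spelling of "no limit"
            return None
        try:
            return int(raw)
        except ValueError:
            continue  # unexpected content, try the next candidate
    return None


def should_apply_vram_patch(
    limit: Optional[int], threshold: int = MIN_MEMORY_BYTES
) -> bool:
    """Skip the patch only when a limit is known and below the threshold."""
    return limit is None or limit >= threshold


if __name__ == "__main__":
    if should_apply_vram_patch(read_cgroup_memory_limit()):
        # The expensive torch import and the VRAM patch would go here.
        print("memory limit OK, patch would be applied")
    else:
        print("low-memory container, skipping torch import")
```

Keeping the limit lookup separate from the decision makes the check testable without a real cgroup filesystem, and it keeps the cheap file read ahead of any heavy import.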

2026-02-06 19:15:49 -05:00

4.1 KiB
