kuberay-images/amdsmi-shim/strixhalo_vram_fix.py at e7642b86ddf100ee3edbe1b6ec2611e2d8b04c30

daviestechlabs/kuberay-images

Billy D. e7642b86dd

Build and Push Images / determine-version (push) Successful in 4s

Build and Push Images / build (Dockerfile.ray-worker-nvidia, nvidia) (push) Failing after 25s

Build and Push Images / build (Dockerfile.ray-worker-intel, intel) (push) Failing after 28s

Build and Push Images / build (Dockerfile.ray-worker-strixhalo, strixhalo) (push) Failing after 23s

Build and Push Images / build (Dockerfile.ray-worker-rdna2, rdna2) (push) Failing after 26s

Build and Push Images / Release (push) Has been skipped

Build and Push Images / Notify (push) Successful in 1s

feat(strixhalo): patch torch.cuda.mem_get_info for unified memory APU

On Strix Halo, PyTorch reports the GTT pool (128 GiB) as device memory
instead of the real VRAM (96 GiB, set in the BIOS). vLLM calls
mem_get_info() to size its pre-allocation and refuses to start when
free GTT (29 GiB) is less than the requested amount.

The strixhalo_vram_fix.pth hook patches torch.cuda.mem_get_info at
Python startup so that it reads the real VRAM total and used values
from sysfs under /sys/class/drm. The patch only activates when the
total reported by PyTorch differs by more than 10% from the sysfs
VRAM total.
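
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the contents of strixhalo_vram_fix.py: the helper names (read_sysfs_vram, should_patch, make_patched_mem_get_info) are hypothetical, the sysfs attribute names mem_info_vram_total and mem_info_vram_used are the ones amdgpu exposes under a card's device directory, and measuring the 10% difference relative to the sysfs total is an assumption. The wrapper takes the original function as an argument, so the sketch runs without PyTorch installed.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# (free_bytes, total_bytes), the same order torch.cuda.mem_get_info returns.
MemInfo = Tuple[int, int]


def read_sysfs_vram(device_dir: Path) -> Optional[MemInfo]:
    """Read (free, total) VRAM in bytes from amdgpu sysfs attributes.

    Returns None when the files are missing or unparsable, so callers
    can fall back to the unpatched values.
    """
    try:
        total = int((device_dir / "mem_info_vram_total").read_text().strip())
        used = int((device_dir / "mem_info_vram_used").read_text().strip())
    except (OSError, ValueError):
        return None
    if total <= 0:
        return None
    return max(total - used, 0), total


def should_patch(torch_total: int, sysfs_total: int, threshold: float = 0.10) -> bool:
    """True when the two totals differ by more than `threshold`.

    Assumption: the difference is measured relative to the sysfs total.
    """
    return abs(torch_total - sysfs_total) > threshold * sysfs_total


def make_patched_mem_get_info(
    original: Callable[..., MemInfo],
    device_dir: Path,
    threshold: float = 0.10,
) -> Callable[..., MemInfo]:
    """Wrap `original` so it reports sysfs VRAM when the totals disagree."""

    def patched(device=None) -> MemInfo:
        free, total = original(device)
        vram = read_sysfs_vram(device_dir)
        if vram is None or not should_patch(total, vram[1], threshold):
            # Totals agree closely enough, or sysfs is unavailable:
            # keep what PyTorch reported.
            return free, total
        return vram

    return patched
```

With PyTorch available, the wrapper would be installed along the lines of `torch.cuda.mem_get_info = make_patched_mem_get_info(torch.cuda.mem_get_info, Path("/sys/class/drm/card0/device"))`, where the card0 index is an assumption. A .pth file in site-packages can trigger this at interpreter startup because the site module executes any line in it that begins with `import`.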

2026-02-06 16:29:46 -05:00

3.3 KiB
