Some checks failed
Build and Push Images / determine-version (push) Successful in 58s
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
The native amdsmi from ROCm 7.1 requires libamd_smi.so linked against glibc 2.38 (Ubuntu 24.04), but the Ray base image is Ubuntu 22.04 (glibc 2.35). This caused vLLM to fail ROCm platform detection with "No module named amdsmi" / "GLIBC_2.38 not found" errors.

Solution: a pure-Python amdsmi shim that reads GPU info from sysfs (/sys/class/drm/*) instead of the native library. It provides the full API surface used by both vLLM (platform detection, device info) and PyTorch (device counting, memory/power/temperature monitoring).

Tested in-container: vLLM detects RocmPlatform, PyTorch sees the GPU (Radeon 8060S, 128GB, HIP 7.3), and DeviceConfig resolves to 'cuda'.

Changes:
- Add amdsmi-shim/ package with a sysfs-backed implementation
- Update Dockerfile to install the shim after vLLM/torch
- Add amdsmi-shim/ to the .dockerignore explicit includes
14 lines · 329 B · TOML
[build-system]
requires = ["setuptools>=64"]
build-backend = "setuptools.build_meta"

[project]
name = "amdsmi"
version = "0.1.0"
description = "Lightweight sysfs-based amdsmi shim for ROCm platform detection without libamd_smi.so"
requires-python = ">=3.8"
# Note: the SPDX string form of `license` (PEP 639) needs setuptools >= 77 at
# build time; setuptools in the 64-76 range expects license = {text = "MIT"}.
license = "MIT"

[tool.setuptools.packages.find]
include = ["amdsmi*"]