Some checks failed
Build and Push Images / determine-version (push) Successful in 58s
Build and Push Images / Release (push) Has been cancelled
Build and Push Images / Notify (push) Has been cancelled
Build and Push Images / build-strixhalo (push) Has been cancelled
Build and Push Images / build-intel (push) Has been cancelled
Build and Push Images / build-nvidia (push) Has been cancelled
Build and Push Images / build-rdna2 (push) Has been cancelled
The native amdsmi from ROCm 7.1 requires libamd_smi.so linked against glibc 2.38 (Ubuntu 24.04), but the Ray base image is Ubuntu 22.04 (glibc 2.35). This caused vLLM to fail ROCm platform detection with "No module named amdsmi" / GLIBC_2.38 not found errors.

Solution: a pure-Python amdsmi shim that reads GPU info from sysfs (/sys/class/drm/*) instead of the native library. It provides the full API surface used by both vLLM (platform detection, device info) and PyTorch (device counting, memory/power/temperature monitoring).

Tested in-container: vLLM detects RocmPlatform, PyTorch sees the GPU (Radeon 8060S, 128 GB, HIP 7.3), and DeviceConfig resolves to 'cuda'.

Changes:
- Add amdsmi-shim/ package with a sysfs-backed implementation
- Update Dockerfile to install the shim after vLLM/torch
- Add amdsmi-shim/ to the .dockerignore explicit includes
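The shim's approach can be sketched as follows. This is a minimal illustration of reading AMD GPU info from sysfs rather than the actual shim: the function names mirror the real amdsmi Python API, but the helper `_amd_gpu_devices` and the exact files read are assumptions, and the real package covers far more of the API surface.

```python
# Illustrative sketch of a sysfs-backed amdsmi shim (not the actual package).
from pathlib import Path

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID for AMD


def _amd_gpu_devices():
    """Yield sysfs device dirs for AMD GPUs (/sys/class/drm/card*/device)."""
    for card in sorted(Path("/sys/class/drm").glob("card[0-9]*")):
        dev = card / "device"
        vendor = dev / "vendor"
        if vendor.is_file() and vendor.read_text().strip() == AMD_VENDOR_ID:
            yield dev


def amdsmi_init(*_args):
    """No-op: reading sysfs needs no native library initialisation."""


def amdsmi_get_processor_handles():
    """Return one opaque handle (here, the sysfs path) per AMD GPU."""
    return list(_amd_gpu_devices())


def amdsmi_get_gpu_vram_usage(handle):
    """Read VRAM counters (bytes) exposed by the amdgpu kernel driver."""
    total = int((handle / "mem_info_vram_total").read_text())
    used = int((handle / "mem_info_vram_used").read_text())
    return {"vram_total": total, "vram_used": used}
```

Because every call bottoms out in plain file reads, the shim has no glibc version requirement at all, which is what sidesteps the Ubuntu 22.04 vs 24.04 mismatch.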
46 lines
414 B
Plaintext
# Git
.git
.gitignore
.gitea

# Documentation
*.md
LICENSE
docs/

# IDE and editors
.vscode/
.idea/
*.swp
*.swo
*~

# Python artifacts
__pycache__/
*.py[cod]
*$py.class
.pytest_cache/
.venv/
venv/
.env
*.egg-info/
dist/
build/

# OS files
.DS_Store
Thumbs.db

# Build logs
*.log
*.tmp

# Local development
Makefile
.goreleaser.yml

# Don't ignore these (explicitly include)
!ray-serve/
!dockerfiles/
!amdsmi-shim/
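The explicit includes at the end work because .dockerignore matching is last-match-wins: a later `!pattern` re-includes a path that an earlier pattern excluded, which is why `!amdsmi-shim/` must come after any rule that could cover it. A toy matcher illustrating just that rule (a simplification for illustration, not Docker's actual matching, which also handles `**` and more precise path semantics):

```python
# Toy .dockerignore-style matcher: last matching pattern wins,
# and a leading '!' re-includes instead of excluding.
from fnmatch import fnmatch


def is_ignored(path: str, patterns: list[str]) -> bool:
    ignored = False
    for pat in patterns:
        negated = pat.startswith("!")
        core = (pat[1:] if negated else pat).rstrip("/")
        # Match the path itself, or anything under a matched directory.
        if fnmatch(path, core) or path.startswith(core + "/"):
            ignored = not negated
    return ignored


rules = ["*.log", "build/", "!amdsmi-shim/"]
is_ignored("debug.log", rules)               # True: excluded by *.log
is_ignored("amdsmi-shim/setup.py",
            ["amdsmi-*", "!amdsmi-shim/"])   # False: re-included by the ! rule
```

Without the `!amdsmi-shim/` line, a broad earlier exclusion would silently drop the shim from the build context and the Dockerfile's install step would fail.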