Billy D. 300582a520
feat(strixhalo): add amdsmi sysfs shim to bypass glibc 2.38 requirement
The native amdsmi from ROCm 7.1 requires libamd_smi.so linked against
glibc 2.38 (Ubuntu 24.04), but the Ray base image is Ubuntu 22.04
(glibc 2.35). This caused vLLM to fail ROCm platform detection with
'No module named amdsmi' / GLIBC_2.38 not found errors.

Solution: Pure-Python amdsmi shim that reads GPU info from sysfs
(/sys/class/drm/*) instead of the native library. Provides the full
API surface used by both vLLM (platform detection, device info) and
PyTorch (device counting, memory/power/temp monitoring).

Tested in-container: vLLM detects RocmPlatform, PyTorch sees GPU
(Radeon 8060S, 128GB, HIP 7.3), DeviceConfig resolves to 'cuda'.

Changes:
- Add amdsmi-shim/ package with sysfs-backed implementation
- Update Dockerfile to install shim after vLLM/torch
- Add amdsmi-shim/ to .dockerignore explicit includes
2026-02-06 08:28:07 -05:00

KubeRay Worker Images

GPU-specific Ray worker images for the DaviesTechLabs AI/ML platform.

Features

  • BuildKit optimized: Cache mounts for apt and pip speed up rebuilds
  • OCI compliant: Standard image labels (org.opencontainers.image.*)
  • Health checks: Built-in HEALTHCHECK for container orchestration
  • Non-root execution: Ray runs as unprivileged ray user
  • Retry logic: Entrypoint waits for Ray head with exponential backoff
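The retry behavior in the last bullet can be sketched as below. The actual logic lives in ray-entrypoint.sh; the function name, attempt limit, and delays here are illustrative assumptions, not the script's real values.

```python
# Sketch of exponential-backoff waiting for the Ray head service.
# Assumed parameters (max_attempts, base_delay, port 6379) are
# illustrative; the real entrypoint is a shell script.
import socket
import time


def wait_for_ray_head(host, port=6379, max_attempts=8, base_delay=1.0):
    """Return True once a TCP connection to the head succeeds,
    doubling the delay between attempts up to max_attempts."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            print(f"attempt {attempt}: head not ready, retrying in {delay:g}s")
            time.sleep(delay)
            delay *= 2  # exponential backoff
    return False
```

Backing off this way keeps a freshly scheduled worker pod from crash-looping while the Ray head service is still coming up.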

Images

Image                 GPU Target                   Workloads              Base
ray-worker-nvidia     NVIDIA CUDA 12.1 (RTX 2070)  Whisper STT, XTTS TTS  rayproject/ray-ml:2.53.0-py310-cu121
ray-worker-rdna2      AMD ROCm 6.4 (Radeon 680M)   BGE Embeddings         rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.6.0
ray-worker-strixhalo  AMD ROCm 7.1 (Strix Halo)    vLLM, BGE              rocm/pytorch:rocm7.1_ubuntu24.04_py3.12_pytorch_release_2.8.0
ray-worker-intel      Intel XPU (Arc)              BGE Reranker           rayproject/ray-ml:2.53.0-py310

Building Locally

# Lint Dockerfiles (requires hadolint)
make lint

# Build all images
make build-all

# Build specific image
make build-nvidia
make build-rdna2
make build-strixhalo
make build-intel

# Push to Gitea registry (requires login)
make login
make push-all

# Release with version tag
make VERSION=v1.0.0 release

CI/CD

Images are automatically built and pushed to the git.daviestechlabs.io package registry on:

  • Push to main branch
  • Git tag creation (e.g., v1.0.0)

Gitea Actions Secrets Required

Add these secrets in Gitea repo settings → Actions → Secrets:

Secret          Description
REGISTRY_USER   Gitea username
REGISTRY_TOKEN  Gitea access token with package:write scope

Directory Structure

kuberay-images/
├── dockerfiles/
│   ├── Dockerfile.ray-worker-nvidia
│   ├── Dockerfile.ray-worker-rdna2
│   ├── Dockerfile.ray-worker-strixhalo
│   ├── Dockerfile.ray-worker-intel
│   └── ray-entrypoint.sh
├── ray-serve/
│   ├── serve_embeddings.py
│   ├── serve_whisper.py
│   ├── serve_tts.py
│   ├── serve_llm.py
│   ├── serve_reranker.py
│   └── requirements.txt
├── .gitea/workflows/
│   └── build-push.yaml
├── Makefile
└── README.md

Environment Variables

Variable      Description               Default
RAY_HEAD_SVC  Ray head service name     ai-inference-raycluster-head-svc
GPU_RESOURCE  Custom Ray resource name  gpu_nvidia, gpu_amd, etc.
NUM_GPUS      Number of GPUs to expose  1
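For illustration, resolving these variables with their defaults might look like the sketch below. The function name is hypothetical, the real entrypoint is a shell script, and since the GPU_RESOURCE default varies per image, gpu_nvidia here is just a placeholder.

```python
import os


def resolve_worker_config(env=None):
    """Illustrative sketch: resolve entrypoint configuration from the
    environment, with defaults mirroring the table above. GPU_RESOURCE's
    default is per-image; "gpu_nvidia" is a placeholder assumption."""
    env = os.environ if env is None else env
    return {
        "ray_head_svc": env.get("RAY_HEAD_SVC", "ai-inference-raycluster-head-svc"),
        "gpu_resource": env.get("GPU_RESOURCE", "gpu_nvidia"),
        "num_gpus": int(env.get("NUM_GPUS", "1")),
    }
```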

Node Allocation

Node       Image                 GPU          Memory
elminster  ray-worker-nvidia     RTX 2070     8GB VRAM
khelben    ray-worker-strixhalo  Strix Halo   64GB Unified
drizzt     ray-worker-rdna2      Radeon 680M  12GB VRAM
danilo     ray-worker-intel      Intel Arc    16GB Shared