Billy D. 300582a520
feat(strixhalo): add amdsmi sysfs shim to bypass glibc 2.38 requirement
The native amdsmi from ROCm 7.1 requires libamd_smi.so linked against
glibc 2.38 (Ubuntu 24.04), but the Ray base image is Ubuntu 22.04
(glibc 2.35). This caused vLLM to fail ROCm platform detection with
'No module named amdsmi' / GLIBC_2.38 not found errors.

Solution: Pure-Python amdsmi shim that reads GPU info from sysfs
(/sys/class/drm/*) instead of the native library. Provides the full
API surface used by both vLLM (platform detection, device info) and
PyTorch (device counting, memory/power/temp monitoring).

Tested in-container: vLLM detects RocmPlatform, PyTorch sees GPU
(Radeon 8060S, 128GB, HIP 7.3), DeviceConfig resolves to 'cuda'.

Changes:
- Add amdsmi-shim/ package with sysfs-backed implementation
- Update Dockerfile to install shim after vLLM/torch
- Add amdsmi-shim/ to .dockerignore explicit includes
2026-02-06 08:28:07 -05:00

KubeRay Worker Images

GPU-specific Ray worker images for the DaviesTechLabs AI/ML platform.

Features

  • BuildKit optimized: Cache mounts for apt and pip speed up rebuilds
  • OCI compliant: Standard image labels (org.opencontainers.image.*)
  • Health checks: Built-in HEALTHCHECK for container orchestration
  • Non-root execution: Ray runs as unprivileged ray user
  • Retry logic: Entrypoint waits for Ray head with exponential backoff
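The retry behavior in the last bullet can be sketched as below. The actual logic lives in ray-entrypoint.sh; the function name, attempt limit, and delays here are illustrative assumptions, not the script's real values.

```python
# Sketch of exponential-backoff waiting for the Ray head service.
# Assumed parameters (max_attempts, base_delay, port 6379) are
# illustrative; the real entrypoint is a shell script.
import socket
import time


def wait_for_ray_head(host, port=6379, max_attempts=8, base_delay=1.0):
    """Return True once a TCP connection to the head succeeds,
    doubling the delay between attempts up to max_attempts."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            print(f"attempt {attempt}: head not ready, retrying in {delay:g}s")
            time.sleep(delay)
            delay *= 2  # exponential backoff
    return False
```

Backing off this way keeps a freshly scheduled worker pod from crash-looping while the Ray head service is still coming up.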

Images

Image                 GPU Target                   Workloads              Base
ray-worker-nvidia     NVIDIA CUDA 12.1 (RTX 2070)  Whisper STT, XTTS TTS  rayproject/ray-ml:2.53.0-py310-cu121
ray-worker-rdna2      AMD ROCm 6.4 (Radeon 680M)   BGE Embeddings         rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.6.0
ray-worker-strixhalo  AMD ROCm 7.1 (Strix Halo)    vLLM, BGE              rocm/pytorch:rocm7.1_ubuntu24.04_py3.12_pytorch_release_2.8.0
ray-worker-intel      Intel XPU (Arc)              BGE Reranker           rayproject/ray-ml:2.53.0-py310

Building Locally

# Lint Dockerfiles (requires hadolint)
make lint

# Build all images
make build-all

# Build specific image
make build-nvidia
make build-rdna2
make build-strixhalo
make build-intel

# Push to Gitea registry (requires login)
make login
make push-all

# Release with version tag
make VERSION=v1.0.0 release

CI/CD

Images are automatically built and pushed to the git.daviestechlabs.io package registry on:

  • Push to main branch
  • Git tag creation (e.g., v1.0.0)

Gitea Actions Secrets Required

Add these secrets in Gitea repo settings → Actions → Secrets:

Secret          Description
REGISTRY_USER   Gitea username
REGISTRY_TOKEN  Gitea access token with package:write scope

Directory Structure

kuberay-images/
├── dockerfiles/
│   ├── Dockerfile.ray-worker-nvidia
│   ├── Dockerfile.ray-worker-rdna2
│   ├── Dockerfile.ray-worker-strixhalo
│   ├── Dockerfile.ray-worker-intel
│   └── ray-entrypoint.sh
├── ray-serve/
│   ├── serve_embeddings.py
│   ├── serve_whisper.py
│   ├── serve_tts.py
│   ├── serve_llm.py
│   ├── serve_reranker.py
│   └── requirements.txt
├── .gitea/workflows/
│   └── build-push.yaml
├── Makefile
└── README.md

Environment Variables

Variable      Description               Default
RAY_HEAD_SVC  Ray head service name     ai-inference-raycluster-head-svc
GPU_RESOURCE  Custom Ray resource name  gpu_nvidia, gpu_amd, etc.
NUM_GPUS      Number of GPUs to expose  1
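For illustration, resolving these variables with their defaults might look like the sketch below. The function name is hypothetical, the real entrypoint is a shell script, and since the GPU_RESOURCE default varies per image, gpu_nvidia here is just a placeholder.

```python
import os


def resolve_worker_config(env=None):
    """Illustrative sketch: resolve entrypoint configuration from the
    environment, with defaults mirroring the table above. GPU_RESOURCE's
    default is per-image; "gpu_nvidia" is a placeholder assumption."""
    env = os.environ if env is None else env
    return {
        "ray_head_svc": env.get("RAY_HEAD_SVC", "ai-inference-raycluster-head-svc"),
        "gpu_resource": env.get("GPU_RESOURCE", "gpu_nvidia"),
        "num_gpus": int(env.get("NUM_GPUS", "1")),
    }
```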

Node Allocation

Node       Image                 GPU          Memory
elminster  ray-worker-nvidia     RTX 2070     8GB VRAM
khelben    ray-worker-strixhalo  Strix Halo   64GB Unified
drizzt     ray-worker-rdna2      Radeon 680M  12GB VRAM
danilo     ray-worker-intel      Intel Arc    16GB Shared