Docker Build Best Practices
- Status: accepted
- Date: 2026-02-02
- Deciders: Billy
- Technical Story: Standardize container build practices for GPU-heavy ML images
Context
Our ML/AI platform relies heavily on containerized services, particularly GPU workers for KubeRay that include large dependencies (PyTorch, vLLM, ROCm, CUDA). These images can take 30+ minutes to build and exceed 10GB in size. We need standardized practices to ensure:
- Fast rebuilds - Avoid re-downloading dependencies on every build
- Reproducibility - Consistent builds across different machines
- Security - Non-root execution, minimal attack surface
- Observability - Proper metadata for image management
- Consistency - Same patterns across all Dockerfiles
Decision
We adopt the following Docker build best practices across all repositories:
1. BuildKit Syntax and Features
# syntax=docker/dockerfile:1.7
All Dockerfiles declare BuildKit Dockerfile syntax 1.7+ as their first line, which provides the RUN --mount cache mounts and COPY --link used below.
2. Use uv for Python Package Installation
Replace pip with uv for dramatically faster installs (10-100x):
# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
# Install packages with cache mount (do not pass --no-cache, which would bypass the mounted cache)
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --system \
    'package>=1.0,<2.0'
Benefits:
- Parallel downloads and installs
- Better dependency resolution
- Consistent with ADR-0012 (uv for Python development)
3. Cache Mounts for Package Managers
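# Debian/Ubuntu base images delete downloaded .deb files after each install;
# remove that hook so the APT cache mount actually retains them
RUN rm -f /etc/apt/apt.conf.d/docker-clean
# APT cache mount
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
    package1 package2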
# uv/pip cache mount
RUN --mount=type=cache,target=/home/ray/.cache/uv,uid=1000,gid=1000 \
    uv pip install --system 'package>=1.0'
4. OCI Image Specification Labels
All images include standard metadata:
LABEL org.opencontainers.image.title="Service Name"
LABEL org.opencontainers.image.description="Service description"
LABEL org.opencontainers.image.vendor="DaviesTechLabs"
LABEL org.opencontainers.image.source="https://git.daviestechlabs.io/daviestechlabs/repo"
LABEL org.opencontainers.image.licenses="MIT"
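Labels that change on every build, such as the commit and build time, can be supplied as build arguments. The following is a minimal sketch, not a prescribed layout: the VCS_REF and BUILD_DATE argument names and the python:3.12-slim base image are assumptions for illustration, while the revision and created keys come from the OCI annotation set.

```dockerfile
# Assumed base image, for illustration only
FROM python:3.12-slim

# Hypothetical build arguments, expected to be passed by CI with --build-arg
ARG VCS_REF=unknown
ARG BUILD_DATE=unknown

# Keep per-build labels near the end of the Dockerfile: their values change
# on every build, so any layer placed after them cannot be reused from cache
LABEL org.opencontainers.image.revision="${VCS_REF}" \
      org.opencontainers.image.created="${BUILD_DATE}"
```

Placing these last follows the layer-ordering rule in this ADR: static labels can sit early, frequently changing ones late.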
5. Health Checks
All service images include HEALTHCHECK:
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
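The curl-based check assumes curl is installed in the final image. For slim Python images that do not ship it, one possible alternative is to probe the same endpoint with the Python standard library. This is a sketch that reuses the port and /health path from the example above; the 5-second request timeout is an assumed value.

```dockerfile
# Same probe without curl, using only the Python standard library.
# urlopen raises on connection errors and on HTTP error statuses, so the
# interpreter exits non-zero and the health check is reported as failed.
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)" || exit 1
```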
6. Non-Root Execution
Services run as unprivileged users:
# or appuser, 1000:1000
USER ray
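For base images that do not already provide an unprivileged user, the user has to be created before switching to it. A minimal sketch, assuming a Debian-based image: the python:3.12-slim base, the /app path, and the placeholder server command are illustrative, and the 1000:1000 IDs follow the convention noted above.

```dockerfile
# syntax=docker/dockerfile:1.7
# Assumed base image, for illustration only
FROM python:3.12-slim

# Create an unprivileged group and user with fixed IDs (1000:1000)
RUN groupadd --gid 1000 appuser \
    && useradd --uid 1000 --gid 1000 --create-home appuser

WORKDIR /app
# Assign ownership to the unprivileged user at copy time
COPY --chown=1000:1000 . /app

# Everything from here on, including the container process, runs as non-root
USER appuser
# Placeholder command; a real service would start its own server here
CMD ["python", "-m", "http.server", "8000"]
```

Steps that need root, such as apt-get or installs into the system site-packages, generally have to appear before the USER instruction.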
7. Version Pinning with Ranges
Dependencies use minimum version with upper bound:
RUN uv pip install --system \
    'transformers>=4.35.0,<5.0' \
    'torch>=2.0.0,<3.0'
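Ranges alone can resolve to different versions on different days. Since the Consequences section also mentions lockfiles, one way to combine the two is to keep the ranges in an input file and install from a compiled, fully pinned file. This is a sketch under assumptions: the requirements.in and requirements.txt file names and the base image are hypothetical, not taken from the repositories.

```dockerfile
# syntax=docker/dockerfile:1.7
# Assumed base image, for illustration only
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# requirements.txt is assumed to be generated outside the build with:
#   uv pip compile requirements.in -o requirements.txt
# where requirements.in holds the ranged specifiers shown above
COPY requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --system -r /tmp/requirements.txt
```

The ranges then express intent, while the compiled file records the exact versions a given build used.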
8. Layer Optimization
- Combine related commands into single RUN layers
- Order from least to most frequently changing
- Use multi-stage builds to reduce final image size
- Use COPY --link for multi-stage COPY --from layers to make them independent of prior layers, improving cache reuse when base images change:
# --link makes this layer reusable even if the base image changes
COPY --link --from=rocm-source /opt/rocm /opt/rocm
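The multi-stage and COPY --link points can be sketched together as follows. The stage names, the python:3.12-slim base, the /opt/venv path, and the package specifier are placeholders for illustration rather than the layout of any existing image.

```dockerfile
# syntax=docker/dockerfile:1.7

# Build stage: resolves and installs dependencies into a virtual environment
FROM python:3.12-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
RUN --mount=type=cache,target=/root/.cache/uv \
    uv venv /opt/venv \
    && uv pip install --python /opt/venv/bin/python 'package>=1.0,<2.0'

# Runtime stage: only the finished environment is carried over, so build
# tooling and caches never reach the final image
FROM python:3.12-slim AS runtime
# --link keeps this layer independent of the runtime base image layers
COPY --link --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
```

Both stages use the same base image and the same /opt/venv path so that the environment's interpreter links stay valid after the copy.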
9. Registry-Based BuildKit Cache
Use type=registry cache instead of type=gha (which only works on GitHub Actions).
This stores build cache layers directly in the container registry with zstd compression:
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    cache-from: type=registry,ref=${{ env.REGISTRY }}/image:buildcache
    cache-to: type=registry,ref=${{ env.REGISTRY }}/image:buildcache,mode=max,image-manifest=true,compression=zstd
Benefits:
- Works on any CI system (Gitea Actions, Jenkins, etc.)
- mode=max caches all layers, not just final image layers
- compression=zstd is faster than gzip with similar compression ratios
- Cache survives runner restarts (stored in registry, not ephemeral disk)
Important: type=gha is a no-op on self-hosted Gitea runners — it requires
GitHub's cache API. Always use type=registry for self-hosted CI.
10. .dockerignore
All repos include a .dockerignore:
.git
.gitea
*.md
__pycache__/
*.pyc
.venv/
.mypy_cache/
.pytest_cache/
.ruff_cache/
11. Makefile Integration
Standard targets for building and linting:
# Neither target produces a file of the same name, so mark both as phony
.PHONY: lint build
lint:
	hadolint Dockerfile
build:
	docker buildx build --platform linux/amd64 --load -t image:tag .
Consequences
Positive
- 10-100x faster pip operations with uv cache mounts
- Consistent builds via lockfiles and version pinning
- Better observability through OCI labels
- Improved security with non-root execution
- Faster CI/CD through BuildKit caching
Negative
- Requires Docker BuildKit - Must use DOCKER_BUILDKIT=1 or buildx
- Cache invalidation complexity - Cache mounts persist across builds
- Learning curve - Developers must understand BuildKit syntax