tuning up runner improvements.
@@ -110,8 +110,37 @@ RUN uv pip install --system \
|
|||||||
- Combine related commands into single RUN layers
|
- Combine related commands into single RUN layers
|
||||||
- Order from least to most frequently changing
|
- Order from least to most frequently changing
|
||||||
- Use multi-stage builds to reduce final image size
|
- Use multi-stage builds to reduce final image size
|
||||||
|
- Use `COPY --link` for multi-stage `COPY --from` layers to make them independent
  of prior layers, improving cache reuse when base images change:
|
||||||
|
|
||||||
### 9. .dockerignore
|
```dockerfile
# --link makes this layer reusable even if the base image changes
COPY --link --from=rocm-source /opt/rocm /opt/rocm
```
|
||||||
|
|
||||||
|
### 9. Registry-Based BuildKit Cache

Use `type=registry` cache instead of `type=gha` (which only works on GitHub Actions).
This stores build cache layers directly in the container registry with zstd compression:

```yaml
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    cache-from: type=registry,ref=${{ env.REGISTRY }}/image:buildcache
    cache-to: type=registry,ref=${{ env.REGISTRY }}/image:buildcache,mode=max,image-manifest=true,compression=zstd
```
|
||||||
|
|
||||||
|
Benefits:
- Works on any CI system (Gitea Actions, Jenkins, etc.)
- `mode=max` caches all layers, not just final image layers
- `compression=zstd` is faster than gzip with similar compression ratios
- Cache survives runner restarts (stored in registry, not ephemeral disk)

**Important:** `type=gha` is a no-op on self-hosted Gitea runners because it requires
GitHub's cache API. Always use `type=registry` for self-hosted CI.
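
On CI systems that have no equivalent of `docker/build-push-action` (for example Jenkins), the same `cache-from`/`cache-to` settings can be passed straight to `docker buildx build`. The following is a minimal sketch rather than part of any repository: the helper name `buildx_cache_args` and the host `registry.example.com` are hypothetical, and it assumes the builder in use supports registry cache export.

```python
def buildx_cache_args(registry: str, image: str = "image", tag: str = "buildcache") -> list[str]:
    """Return docker buildx flags mirroring the cache-from/cache-to settings above."""
    ref = f"{registry}/{image}:{tag}"
    return [
        "--cache-from", f"type=registry,ref={ref}",
        "--cache-to", f"type=registry,ref={ref},mode=max,image-manifest=true,compression=zstd",
    ]


if __name__ == "__main__":
    # registry.example.com is a placeholder, not a real registry from this setup.
    cmd = [
        "docker", "buildx", "build", "--push",
        "-t", "registry.example.com/image:latest",
        *buildx_cache_args("registry.example.com"),
        ".",
    ]
    print(" ".join(cmd))
```

Keeping the cache reference in one helper means the read and write sides cannot drift apart, which would silently disable cache hits.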
|
||||||
|
|
||||||
|
### 10. .dockerignore
|
||||||
|
|
||||||
All repos include a `.dockerignore`:
|
All repos include a `.dockerignore`:
|
||||||
|
|
||||||
@@ -127,7 +156,7 @@ __pycache__/
|
|||||||
.ruff_cache/
|
.ruff_cache/
|
||||||
```
|
```
|
||||||
|
|
||||||
### 10. Makefile Integration
|
### 11. Makefile Integration
|
||||||
|
|
||||||
Standard targets for building and linting:
|
Standard targets for building and linting:
|
||||||
|
|
||||||
|
|||||||
@@ -286,13 +286,58 @@ on:
|
|||||||
|
|
||||||
See [kuberay-images/.gitea/workflows/build-push.yaml](https://git.daviestechlabs.io/daviestechlabs/kuberay-images/src/branch/main/.gitea/workflows/build-push.yaml) for complete example.
|
See [kuberay-images/.gitea/workflows/build-push.yaml](https://git.daviestechlabs.io/daviestechlabs/kuberay-images/src/branch/main/.gitea/workflows/build-push.yaml) for complete example.
|
||||||
|
|
||||||
|
## Build Performance Tuning

GPU worker images are 20-30GB+ due to ROCm/CUDA/PyTorch layers. Several optimizations
are in place to avoid multi-hour rebuild/push cycles on every change.

### Registry-Based BuildKit Cache

Use `type=registry` cache (not `type=gha`, which is a no-op on Gitea runners):

```yaml
cache-from: type=registry,ref=${{ env.REGISTRY }}/image:buildcache
cache-to: type=registry,ref=${{ env.REGISTRY }}/image:buildcache,mode=max,image-manifest=true,compression=zstd
```

- `mode=max` caches all intermediate layers, not just the final image
- `compression=zstd` is faster than gzip with comparable ratios
- Cache is stored in the Gitea container registry alongside images
- Only changed layers are rebuilt and pushed on subsequent builds
|
||||||
|
|
||||||
|
### Docker Daemon Tuning

The runner's DinD daemon.json is configured for parallel transfers:

```json
{
  "max-concurrent-uploads": 10,
  "max-concurrent-downloads": 10,
  "features": {
    "containerd-snapshotter": true
  }
}
```
|
||||||
|
|
||||||
|
Docker's defaults are 3 concurrent downloads and 5 concurrent uploads, which is insufficient for images with many large layers.
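
If a runner already has a `daemon.json` with other settings, the tuned values need to be merged in rather than written over the file. A minimal sketch of such a merge follows; the function name `merge_daemon_config` and the sample existing configuration are hypothetical and only illustrate the idea.

```python
import copy
import json

# Tuned values taken from the daemon.json shown above.
TUNED = {
    "max-concurrent-uploads": 10,
    "max-concurrent-downloads": 10,
    "features": {"containerd-snapshotter": True},
}


def merge_daemon_config(existing: dict, tuned: dict = TUNED) -> dict:
    """Return a copy of `existing` with `tuned` applied; nested dicts are merged key by key."""
    merged = copy.deepcopy(existing)
    for key, value in tuned.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_daemon_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical pre-existing configuration, for illustration only.
    current = {"log-driver": "json-file", "features": {"buildkit": True}}
    print(json.dumps(merge_daemon_config(current), indent=2))
```

Merging `features` key by key preserves any feature flags that were already set, while the tuned values win on conflict.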
|
||||||
|
|
||||||
|
### Persistent DinD Layer Cache

The runner mounts a 100Gi Longhorn PVC at `/home/rootless/.local/share/docker` to
persist Docker's layer cache across pod restarts. Without this, every runner restart
forces re-download of 10-20GB base images (ROCm, Ray, PyTorch).

| Volume | Storage Class | Size | Purpose |
|--------|---------------|------|---------|
| `gitea-runner-data` | nfs-slow | 5Gi | Runner state, workspace |
| `gitea-runner-docker-cache` | longhorn | 100Gi | Docker layer cache |
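
The two volumes in the table could be expressed as Kubernetes `PersistentVolumeClaim` manifests along the following lines. This is a sketch, not the deployed manifests: the helper `pvc_manifest` is hypothetical, the `ReadWriteOnce` access mode is an assumption, and the namespace is omitted because the source does not state it. The output is JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json

# (name, storage class, size) rows from the table above.
VOLUMES = [
    ("gitea-runner-data", "nfs-slow", "5Gi"),
    ("gitea-runner-docker-cache", "longhorn", "100Gi"),
]


def pvc_manifest(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],  # assumption: not stated in the source
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    for row in VOLUMES:
        print(json.dumps(pvc_manifest(*row), indent=2))
```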
|
||||||
|
|
||||||
## Future Enhancements
|
## Future Enhancements
|
||||||
|
|
||||||
1. **Caching improvements** - Persistent layer cache across builds
|
1. **Multi-arch builds** - ARM64 support for Raspberry Pi
|
||||||
2. **Multi-arch builds** - ARM64 support for Raspberry Pi
|
2. **Security scanning** - Trivy integration in CI
|
||||||
3. **Security scanning** - Trivy integration in CI
|
3. **Signed images** - Cosign for image signatures
|
||||||
4. **Signed images** - Cosign for image signatures
|
4. **SLSA provenance** - Supply chain attestations
|
||||||
5. **SLSA provenance** - Supply chain attestations
|
|
||||||
|
|
||||||
## References
|
## References
|
||||||
|
|
||||||
|
|||||||