# Cluster Utilities and Optimization

* Status: accepted
* Date: 2026-02-09
* Deciders: Billy
* Technical Story: Deploy supporting utilities that improve cluster efficiency and reliability and reduce operational overhead

## Context and Problem Statement

A Kubernetes cluster running diverse workloads benefits from several operational utilities: image caching to reduce pull times, workload rebalancing for efficiency, automatic secret/configmap reloading, and shared storage provisioning. Each is small individually, but collectively they significantly improve cluster operations.

How do we manage these cross-cutting cluster utilities consistently?

## Decision Drivers

* Reduce container image pull latency across nodes
* Automatically rebalance workloads for even resource utilization
* Eliminate manual pod restarts when secrets/configmaps change
* Provide shared NFS storage class for ReadWriteMany workloads
* Minimal resource overhead per utility

## Decision Outcome

Deploy four cluster utilities: Spegel (image cache), Descheduler (pod rebalancing), Reloader (config reload), and CSI-NFS (NFS StorageClass). Each solves a distinct operational concern with a minimal footprint.

## Components

### Spegel — Peer-to-Peer Image Registry Mirror

Spegel distributes container images between nodes, so pulling an image already present on _any_ node avoids hitting the external registry.

| | |
|---|---|
| **Chart** | `spegel` OCI chart v0.3.0 |
| **Namespace** | `spegel` |
| **Port** | 29999 |
| **Mode** | P2P mirror (DaemonSet, one pod per node) |

**Mirrored Registries:**

- `docker.io`, `ghcr.io`, `quay.io`, `gcr.io`
- `registry.k8s.io`, `mcr.microsoft.com`
- `git.daviestechlabs.io` (Gitea), `public.ecr.aws`

Spegel registers as a containerd mirror, intercepting pulls before they reach the internet. This is especially valuable for large ML model images (5-20 GB) that would otherwise be pulled from the external registry repeatedly.
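
To make the mirror registration more concrete, the sketch below renders the kind of per-registry `hosts.toml` file that containerd reads when a registry mirror is configured. It is illustrative only: the mirror address `http://127.0.0.1:29999` and the helper name `render_hosts_toml` are assumptions made for this example, and the configuration actually applied on the nodes may differ.

```python
# Hypothetical helper: renders a containerd hosts.toml that tries a local
# mirror first; containerd falls back to the upstream "server" on a miss.
MIRRORED_REGISTRIES = [
    "docker.io", "ghcr.io", "quay.io", "gcr.io",
    "registry.k8s.io", "mcr.microsoft.com",
    "git.daviestechlabs.io", "public.ecr.aws",
]

# Assumption: the mirror is reachable on the node itself at port 29999.
MIRROR_URL = "http://127.0.0.1:29999"


def render_hosts_toml(registry: str, mirror_url: str = MIRROR_URL) -> str:
    """Return hosts.toml content for one upstream registry."""
    return "\n".join([
        f'server = "https://{registry}"',
        "",
        f'[host."{mirror_url}"]',
        '  capabilities = ["pull", "resolve"]',
        "",
    ])


if __name__ == "__main__":
    for registry in MIRRORED_REGISTRIES:
        print(f"# {registry}/hosts.toml")
        print(render_hosts_toml(registry))
```

Listing the mirror as a host with `pull` and `resolve` capabilities is what lets a node serve an image from a peer while still reaching the upstream registry when no peer has it.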

### Descheduler — Workload Rebalancing

The descheduler evicts pods so that the scheduler can redistribute them more evenly across nodes.

| | |
|---|---|
| **Chart** | `descheduler` v0.33.0 |
| **Namespace** | `descheduler` |
| **Mode** | Deployment (continuous) |
| **Strategy** | `LowNodeUtilization` |

**Excluded Namespaces:** `ai-ml`, `kuberay`, `gitea`

The AI/ML (`ai-ml`, `kuberay`) and Gitea namespaces are excluded because GPU workloads and git repositories should not be disrupted by rebalancing.
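
The `LowNodeUtilization` strategy can be pictured as a two-threshold classification: a node below the lower threshold for every resource counts as underutilized, a node above the upper threshold for any resource counts as overutilized, and pods are evicted from overutilized nodes only while an underutilized node exists to receive them. The sketch below models that rule. The threshold percentages, node names, and function names are hypothetical and are not taken from this cluster's configuration.

```python
from dataclasses import dataclass

# Hypothetical values, in percent of allocatable resources requested.
THRESHOLDS = {"cpu": 20, "memory": 20, "pods": 20}         # below all: underutilized
TARGET_THRESHOLDS = {"cpu": 50, "memory": 50, "pods": 50}  # above any: overutilized

# Namespaces the ADR excludes from eviction.
EXCLUDED_NAMESPACES = {"ai-ml", "kuberay", "gitea"}


@dataclass
class Node:
    name: str
    usage: dict  # percent of allocatable requested, keyed by resource


def classify(node: Node) -> str:
    """Label a node using the two-threshold rule."""
    if all(node.usage[r] < THRESHOLDS[r] for r in THRESHOLDS):
        return "underutilized"
    if any(node.usage[r] > TARGET_THRESHOLDS[r] for r in TARGET_THRESHOLDS):
        return "overutilized"
    return "balanced"


def eviction_sources(nodes: list) -> list:
    """Overutilized nodes are eviction sources only if some node can take the load."""
    labels = {n.name: classify(n) for n in nodes}
    if "underutilized" not in labels.values():
        return []
    return [name for name, label in labels.items() if label == "overutilized"]


def is_evictable(namespace: str) -> bool:
    """Pods in excluded namespaces are never eviction candidates."""
    return namespace not in EXCLUDED_NAMESPACES
```

With one idle node, one busy node, and one node in between, only the busy node is returned as an eviction source, and pods in `ai-ml`, `kuberay`, or `gitea` are skipped regardless of where they run.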

### Reloader — Automatic Config Reload

Reloader watches for Secret and ConfigMap changes and triggers rolling restarts on Deployments/StatefulSets that reference them.

| | |
|---|---|
| **Chart** | `reloader` v2.2.7 |
| **Namespace** | `reloader` |
| **Monitoring** | PodMonitor enabled |
| **Security** | Read-only root filesystem |

This eliminates the manual `kubectl rollout restart` otherwise needed after Vault secret rotations or config changes.
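
Conceptually, change detection of this kind comes down to comparing a fingerprint of the current Secret or ConfigMap data with the fingerprint recorded at the last rollout. The sketch below shows that idea in isolation. It is not Reloader's code: the function names and the choice of SHA-256 over canonical JSON are assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(data: dict) -> str:
    """Stable digest of a Secret's or ConfigMap's key/value data."""
    # Sorting keys makes the digest independent of key order.
    canonical = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def needs_rollout(recorded: str, current_data: dict) -> bool:
    """True when the data no longer matches the fingerprint recorded at the last rollout."""
    return fingerprint(current_data) != recorded


if __name__ == "__main__":
    recorded = fingerprint({"username": "app", "password": "old"})
    rotated = {"username": "app", "password": "new"}
    print(needs_rollout(recorded, rotated))
```

Writing the new fingerprint into the pod template is one common way to make a data change trigger an ordinary rolling update, since any pod template change starts a new rollout.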

### CSI-NFS — NFS StorageClass

Provides a Kubernetes StorageClass backed by the NAS (candlekeep) NFS export.

| | |
|---|---|
| **Chart** | `csi-driver-nfs` v4.13.0 |
| **Namespace** | `csi-nfs` |
| **StorageClass** | `nfs-slow` |
| **NFS Server** | `candlekeep` → `/kubernetes` |
| **NFS Version** | 4.1, `nconnect=16` |

`nfs-slow` provides ReadWriteMany access for workloads that need shared storage (media library, ML artifacts, photo libraries). The class is named "slow" relative to Longhorn SSDs, not in absolute terms. The `nconnect=16` mount option lets the client open up to 16 TCP connections to the NFS server for improved throughput.
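
The table above maps fairly directly onto a StorageClass object. The sketch below builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The field values come from the table; everything not listed there (for example reclaim policy or volume binding mode) is deliberately left out rather than guessed, and the function name is hypothetical.

```python
import json


def nfs_storage_class() -> dict:
    """Build a StorageClass manifest for the NFS CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": "nfs-slow"},
        # Provisioner name registered by csi-driver-nfs.
        "provisioner": "nfs.csi.k8s.io",
        "parameters": {
            "server": "candlekeep",   # NAS hostname as written in the ADR
            "share": "/kubernetes",   # exported path
        },
        # NFS 4.1 with up to 16 connections to the server.
        "mountOptions": ["nfsvers=4.1", "nconnect=16"],
    }


if __name__ == "__main__":
    print(json.dumps(nfs_storage_class(), indent=2))
```

A PersistentVolumeClaim would then request `storageClassName: nfs-slow` with the `ReadWriteMany` access mode.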

## Resource Overhead

| Utility | Pods | CPU Request | Memory Request |
|---------|------|-------------|----------------|
| Spegel | 1 per node (DaemonSet) | — | — |
| Descheduler | 1 | — | — |
| Reloader | 1 | — | — |
| CSI-NFS | 1 controller + DaemonSet | — | — |
| **Total** | ~8-12 pods | Minimal | Minimal |

All four utilities are lightweight and designed to run alongside workloads with negligible resource impact.
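
The pod total follows from the per-utility counts in the table. As a rough check, the sketch below adds them up for a given node count, assuming both DaemonSets schedule one pod on every node and each controller runs a single replica. The function name is hypothetical, and the cluster's actual node count is not stated in this ADR.

```python
def utility_pods(nodes: int) -> int:
    """Estimate total utility pods for a cluster with `nodes` schedulable nodes."""
    spegel = nodes        # DaemonSet: one pod per node
    descheduler = 1       # single Deployment replica
    reloader = 1          # single Deployment replica
    csi_nfs = 1 + nodes   # one controller plus a node DaemonSet
    return spegel + descheduler + reloader + csi_nfs


if __name__ == "__main__":
    for n in (3, 4):
        print(n, utility_pods(n))
```

Under these assumptions the total is `2 * nodes + 3`, so three nodes give 9 pods and four nodes give 11, which sits inside the ~8-12 range quoted in the table.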

## Links

* Related to [ADR-0026](0026-storage-strategy.md) (Longhorn + NFS storage strategy)
* Related to [ADR-0003](0003-bare-metal-kubernetes.md) (Talos container runtime / containerd)
* [Spegel](https://github.com/spegel-org/spegel) · [Descheduler](https://sigs.k8s.io/descheduler) · [Reloader](https://github.com/stakater/Reloader) · [CSI-NFS](https://github.com/kubernetes-csi/csi-driver-nfs)