# ADR-0051: KEDA Event-Driven Autoscaling
- Status: accepted
- Date: 2026-02-09
- Deciders: Billy
- Technical Story: Scale workloads based on external event sources rather than only CPU/memory metrics
## Context and Problem Statement
The Kubernetes Horizontal Pod Autoscaler (HPA) scales on CPU and memory, but many homelab workloads have better scaling signals in external systems, such as Envoy Gateway request queues, NATS queue depth, or GPU utilization. Scaling on the right signal reduces latency and avoids over-provisioning.
How do we autoscale workloads based on external metrics like message queues, HTTP request rates, and custom Prometheus queries?
## Decision Drivers
- Scale on NATS queue depth for inference pipelines
- Scale on Envoy Gateway metrics for HTTP workloads
- Prometheus integration for arbitrary custom metrics
- CRD-based scalers compatible with Flux GitOps
- Low resource overhead for the scaler controller itself
## Considered Options
- KEDA — Kubernetes Event-Driven Autoscaling
- Custom HPA with Prometheus Adapter — HPA + external-metrics API
- Knative Serving — Serverless autoscaler with scale-to-zero
## Decision Outcome
Chosen option: KEDA, because it provides a large catalog of built-in scalers (Prometheus, NATS, HTTP), supports scale-to-zero, and integrates cleanly with the existing HelmRelease/Kustomization GitOps workflow.
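As a sketch of what this looks like in practice, a minimal `ScaledObject` driven by a Prometheus trigger might resemble the following. All names, namespaces, and the query are hypothetical illustrations, not taken from the actual cluster configuration:

```yaml
# Hypothetical example: scale a Deployment on a Prometheus query.
# Names, namespace, and query are illustrative, not from the real repo.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ray-serve-inference
  namespace: inference
spec:
  scaleTargetRef:
    name: ray-serve-app        # Deployment to scale
  minReplicaCount: 1           # keep one warm replica
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc:9090
        query: sum(ray_serve_pending_requests)   # illustrative metric name
        threshold: "10"        # target value per replica
```

KEDA reconciles a `ScaledObject` into an HPA behind the scenes, so the manifest version-controls alongside the rest of the GitOps tree.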
### Positive Consequences
- 60+ built-in scalers covering all homelab event sources
- ScaledObject CRDs fit naturally in GitOps workflow
- Scale-to-zero for bursty workloads (saves GPU resources)
- ServiceMonitors for self-monitoring via Prometheus
- Grafana dashboard included for visibility
### Negative Consequences
- Additional CRDs and controller pods
- ScaledObject/TriggerAuthentication learning curve
- Potential conflict with manually defined HPAs (KEDA creates and manages its own HPA for each ScaledObject, so a workload must not be targeted by both)
## Deployment Configuration
| Setting | Value |
|---|---|
| Chart | `keda` OCI chart v2.19.0 |
| Namespace | `keda` |
| Monitoring | ServiceMonitor enabled, Grafana dashboard provisioned |
| Webhooks | Enabled |
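Under the HelmRelease/Kustomization model this ADR references, the installation could be expressed roughly as below. The OCI URL and values paths are assumptions based on the upstream KEDA chart, not confirmed from the repo:

```yaml
# Hypothetical Flux manifests for the keda chart; URL and values are assumed.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: keda
  namespace: keda
spec:
  interval: 1h
  url: oci://ghcr.io/kedacore/charts/keda   # assumed upstream OCI location
  ref:
    tag: "2.19.0"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: keda
  namespace: keda
spec:
  interval: 1h
  chartRef:
    kind: OCIRepository
    name: keda
  values:
    prometheus:
      operator:
        enabled: true
        serviceMonitor:
          enabled: true   # self-monitoring noted in the table above
```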
## Scaling Use Cases
| Workload | Scaler | Signal | Target |
|---|---|---|---|
| Ray Serve inference | Prometheus | Pending request queue depth | 1-4 replicas |
| Envoy Gateway | Prometheus | Active connections per gateway | Envoy proxy fleet size (managed by KEDA) |
| Voice pipeline | NATS | Message queue length | 0-2 replicas |
| Batch inference | Prometheus | Job queue size | 0-N GPU pods |
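For the NATS-driven voice pipeline row above, a scale-to-zero trigger might look like this sketch. It assumes JetStream with its monitoring port exposed; the stream, consumer, and endpoint names are hypothetical:

```yaml
# Hypothetical example: scale 0-2 replicas on NATS JetStream consumer lag.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: voice-pipeline
  namespace: voice
spec:
  scaleTargetRef:
    name: voice-worker
  minReplicaCount: 0              # scale to zero when the queue is idle
  maxReplicaCount: 2
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats.nats.svc:8222"
        account: "$G"             # default JetStream account
        stream: "voice"           # illustrative stream name
        consumer: "pipeline-workers"
        lagThreshold: "5"
```

Because `minReplicaCount` is 0, the worker consumes no GPU or CPU between bursts, which is the "bursty workloads" benefit listed under Positive Consequences.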
## Links
- Related to ADR-0010 (inference scaling)
- Related to ADR-0038 (Prometheus metrics)
- KEDA Documentation