# KEDA Event-Driven Autoscaling

* Status: accepted
* Date: 2026-02-09
* Deciders: Billy
* Technical Story: Scale workloads based on external event sources rather than only CPU/memory metrics

## Context and Problem Statement

Kubernetes Horizontal Pod Autoscaler (HPA) scales natively on CPU and memory, but many homelab workloads have scaling signals from external systems — Envoy Gateway request queues, NATS queue depth, or GPU utilization. Scaling on the right signal reduces latency and avoids over-provisioning.

How do we autoscale workloads based on external metrics such as message-queue depth, HTTP request rates, and custom Prometheus queries?

## Decision Drivers

* Scale on NATS queue depth for inference pipelines
* Scale on Envoy Gateway metrics for HTTP workloads
* Prometheus integration for arbitrary custom metrics
* CRD-based scalers compatible with Flux GitOps
* Low resource overhead for the scaler controller itself

## Considered Options

1. **KEDA** — Kubernetes Event-Driven Autoscaling
2. **Custom HPA with Prometheus Adapter** — HPA + external-metrics API
3. **Knative Serving** — Serverless autoscaler with scale-to-zero

## Decision Outcome

Chosen option: **KEDA**, because it provides a large catalog of built-in scalers (Prometheus, NATS, HTTP), supports scale-to-zero, and integrates cleanly with the existing HelmRelease/Kustomization GitOps workflow.
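To illustrate how a KEDA scaler is declared, the following is a minimal ScaledObject sketch for the Prometheus-driven inference case. The deployment name, namespace, Prometheus address, metric query, and threshold are all assumptions for illustration, not values taken from this cluster:

```yaml
# Hypothetical ScaledObject: scale a Ray Serve deployment on pending-request
# depth reported by Prometheus. All names and values below are assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ray-serve-inference
  namespace: inference          # assumed namespace
spec:
  scaleTargetRef:
    name: ray-serve-deployment  # assumed Deployment name
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # assumed address
        query: sum(ray_serve_num_pending_requests)            # assumed metric
        threshold: "10"         # add a replica per ~10 pending requests
```

Because ScaledObject is a plain CRD, this manifest can live alongside the workload's other resources in the Flux-managed Kustomization; KEDA creates and owns the underlying HPA, which is why mixing it with a manually defined HPA on the same target conflicts.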
### Positive Consequences

* 60+ built-in scalers covering all homelab event sources
* ScaledObject CRDs fit naturally into the GitOps workflow
* Scale-to-zero for bursty workloads (saves GPU resources)
* ServiceMonitors for self-monitoring via Prometheus
* Grafana dashboard included for visibility

### Negative Consequences

* Additional CRDs and controller pods
* ScaledObject/TriggerAuthentication learning curve
* Potential conflict with manually defined HPAs on the same workload

## Deployment Configuration

| Setting | Value |
|---|---|
| **Chart** | `keda` OCI chart v2.19.0 |
| **Namespace** | `keda` |
| **Monitoring** | ServiceMonitor enabled, Grafana dashboard provisioned |
| **Webhooks** | Enabled |

## Scaling Use Cases

| Workload | Scaler | Signal | Target |
|----------|--------|--------|--------|
| Ray Serve inference | Prometheus | Pending request queue depth | 1–4 replicas |
| Envoy Gateway | Prometheus | Active connections per gateway | KEDA-managed Envoy proxy fleet |
| Voice pipeline | NATS | Message queue length | 0–2 replicas |
| Batch inference | Prometheus | Job queue size | 0–N GPU pods |

## Links

* Related to [ADR-0010](0010-scalable-inference-platform.md) (inference scaling)
* Related to [ADR-0038](0038-infrastructure-metrics-collection.md) (Prometheus metrics)
* [KEDA Documentation](https://keda.sh/docs/)
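As a companion sketch, the NATS-driven voice pipeline above can be expressed with KEDA's NATS JetStream scaler, which also exercises scale-to-zero. The target name, namespace, stream, consumer, and lag threshold are illustrative assumptions:

```yaml
# Hypothetical ScaledObject: scale a voice-pipeline worker 0-2 replicas on
# NATS JetStream consumer lag. All names and values are assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: voice-pipeline
  namespace: voice              # assumed namespace
spec:
  scaleTargetRef:
    name: voice-worker          # assumed Deployment name
  minReplicaCount: 0            # scale to zero when the queue is empty
  maxReplicaCount: 2
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats.nats.svc:8222"  # assumed endpoint
        account: "$G"           # default JetStream account
        stream: "voice"         # assumed stream name
        consumer: "voice-worker"  # assumed durable consumer
        lagThreshold: "5"       # add a replica per ~5 pending messages
```

With `minReplicaCount: 0`, KEDA deactivates the workload entirely when the consumer lag stays at zero, which is what frees GPU resources between bursts.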