# KEDA Event-Driven Autoscaling

* Status: accepted
* Date: 2026-02-09
* Deciders: Billy
* Technical Story: Scale workloads based on external event sources rather than only CPU/memory metrics
## Context and Problem Statement

The Kubernetes Horizontal Pod Autoscaler (HPA) scales on CPU and memory, but many homelab workloads have scaling signals that live in external systems: Envoy Gateway request queues, NATS queue depth, or GPU utilization. Scaling on the right signal reduces latency and avoids over-provisioning.

How do we autoscale workloads based on external metrics such as message queue depth, HTTP request rates, and custom Prometheus queries?
## Decision Drivers

* Scale on NATS queue depth for inference pipelines
* Scale on Envoy Gateway metrics for HTTP workloads
* Prometheus integration for arbitrary custom metrics
* CRD-based scalers compatible with Flux GitOps
* Low resource overhead for the scaler controller itself
## Considered Options

1. **KEDA** — Kubernetes Event-Driven Autoscaling
2. **Custom HPA with Prometheus Adapter** — HPA + external-metrics API
3. **Knative Serving** — Serverless autoscaler with scale-to-zero
## Decision Outcome

Chosen option: **KEDA**, because it provides a large catalog of built-in scalers (Prometheus, NATS, HTTP), supports scale-to-zero, and integrates cleanly with the existing HelmRelease/Kustomization GitOps workflow.
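As a concrete illustration, a `ScaledObject` driving a deployment from a Prometheus query might look like the sketch below. The names, namespace, Prometheus address, query, and threshold are hypothetical placeholders, not values taken from this cluster:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ray-serve-scaler          # hypothetical name
  namespace: inference            # hypothetical namespace
spec:
  scaleTargetRef:
    name: ray-serve               # deployment to scale (placeholder)
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # adjust to your Prometheus service
        query: sum(ray_serve_num_pending_requests)            # hypothetical metric name
        threshold: "5"            # scale out when the query exceeds 5 per replica
```

KEDA creates and manages the underlying HPA from this object, which is why manually defined HPAs on the same target can conflict with it.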
### Positive Consequences

* 60+ built-in scalers covering all homelab event sources
* ScaledObject CRDs fit naturally into the GitOps workflow
* Scale-to-zero for bursty workloads (saves GPU resources)
* ServiceMonitors for self-monitoring via Prometheus
* Grafana dashboard included for visibility
### Negative Consequences

* Additional CRDs and controller pods
* Learning curve for ScaledObject/TriggerAuthentication
* Potential conflicts with manually defined HPAs
## Deployment Configuration

| Setting | Value |
|---|---|
| **Chart** | `keda` OCI chart v2.19.0 |
| **Namespace** | `keda` |
| **Monitoring** | ServiceMonitor enabled, Grafana dashboard provisioned |
| **Webhooks** | Enabled |
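A Flux deployment of this chart could be sketched roughly as follows. The OCI registry URL and the values keys are assumptions for illustration; check the KEDA chart's published location and values schema before using them:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: keda
  namespace: keda
spec:
  interval: 1h
  url: oci://ghcr.io/kedacore/charts/keda   # assumed registry path
  ref:
    tag: 2.19.0
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: keda
  namespace: keda
spec:
  interval: 1h
  chartRef:
    kind: OCIRepository
    name: keda
  values:
    # illustrative values keys; verify against the chart's values.yaml
    prometheus:
      operator:
        enabled: true            # ServiceMonitor for self-monitoring
    webhooks:
      enabled: true              # admission webhooks per the table above
```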
## Scaling Use Cases

| Workload | Scaler | Signal | Target |
|----------|--------|--------|--------|
| Ray Serve inference | Prometheus | Pending request queue depth | 1–4 replicas |
| Envoy Gateway | Prometheus | Active connections per gateway | KEDA-managed Envoy proxy fleet |
| Voice pipeline | NATS | Message queue length | 0–2 replicas |
| Batch inference | Prometheus | Job queue size | 0–N GPU pods |
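For the scale-to-zero rows, a NATS JetStream trigger with `minReplicaCount: 0` is one way to express this. The deployment, stream, consumer, and monitoring endpoint below are hypothetical placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: voice-pipeline-scaler     # hypothetical name
  namespace: voice                # hypothetical namespace
spec:
  scaleTargetRef:
    name: voice-worker            # placeholder deployment
  minReplicaCount: 0              # scale to zero when the queue drains
  maxReplicaCount: 2
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats.nats.svc:8222"  # NATS HTTP monitoring port (assumption)
        account: "$G"             # default NATS account
        stream: "voice"           # hypothetical stream
        consumer: "voice-worker"  # hypothetical durable consumer
        lagThreshold: "10"        # pending messages per replica before scaling out
```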
## Links

* Related to [ADR-0010](0010-scalable-inference-platform.md) (inference scaling)
* Related to [ADR-0038](0038-infrastructure-metrics-collection.md) (Prometheus metrics)
* [KEDA Documentation](https://keda.sh/docs/)