
KEDA Event-Driven Autoscaling

  • Status: accepted
  • Date: 2026-02-09
  • Deciders: Billy
  • Technical Story: Scale workloads based on external event sources rather than only CPU/memory metrics

Context and Problem Statement

The Kubernetes Horizontal Pod Autoscaler (HPA) scales on CPU and memory out of the box, but many homelab workloads have better scaling signals in external systems — Envoy Gateway request queues, NATS queue depth, or GPU utilization. Scaling on the right signal reduces latency and avoids over-provisioning.

How do we autoscale workloads based on external metrics like message queues, HTTP request rates, and custom Prometheus queries?

Decision Drivers

  • Scale on NATS queue depth for inference pipelines
  • Scale on Envoy Gateway metrics for HTTP workloads
  • Prometheus integration for arbitrary custom metrics
  • CRD-based scalers compatible with Flux GitOps
  • Low resource overhead for the scaler controller itself

Considered Options

  1. KEDA — Kubernetes Event-Driven Autoscaling
  2. Custom HPA with Prometheus Adapter — HPA + external-metrics API
  3. Knative Serving — Serverless autoscaler with scale-to-zero

Decision Outcome

Chosen option: KEDA, because it provides a large catalog of built-in scalers (Prometheus, NATS, HTTP), supports scale-to-zero, and integrates cleanly with the existing HelmRelease/Kustomization GitOps workflow.
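As a sketch of how a scaler lands in Git: a ScaledObject wraps an existing Deployment and attaches a trigger. The names, namespace, Prometheus address, and metric query below are illustrative, not taken from the repo:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ray-serve-inference    # hypothetical name
  namespace: ml                # hypothetical namespace
spec:
  scaleTargetRef:
    name: ray-serve-inference  # the Deployment KEDA will scale
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # assumed Prometheus service
        query: sum(ray_serve_num_pending_requests)            # illustrative metric name
        threshold: "5"
```

KEDA generates and manages an HPA from this object behind the scenes, which is why manually defined HPAs on the same target conflict with it.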

Positive Consequences

  • 60+ built-in scalers covering all homelab event sources
  • ScaledObject CRDs fit naturally in GitOps workflow
  • Scale-to-zero for bursty workloads (saves GPU resources)
  • ServiceMonitors for self-monitoring via Prometheus
  • Grafana dashboard included for visibility

Negative Consequences

  • Additional CRDs and controller pods
  • ScaledObject/TriggerAuthentication learning curve
  • Potential conflict with manually defined HPAs targeting the same workload

Deployment Configuration

  • Chart: keda OCI chart v2.19.0
  • Namespace: keda
  • Monitoring: ServiceMonitor enabled, Grafana dashboard provisioned
  • Webhooks: enabled
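A minimal Flux sketch of the deployment above, assuming KEDA's chart is consumed as an OCI artifact (the registry URL and values keys follow the upstream kedacore chart and should be verified against the pinned version):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: keda
  namespace: keda
spec:
  interval: 1h
  url: oci://ghcr.io/kedacore/charts/keda  # assumed OCI location of the chart
  ref:
    tag: 2.19.0
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: keda
  namespace: keda
spec:
  interval: 30m
  chartRef:
    kind: OCIRepository
    name: keda
  values:
    prometheus:
      operator:
        enabled: true
        serviceMonitor:
          enabled: true  # self-monitoring via Prometheus, per the configuration above
```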

Scaling Use Cases

| Workload            | Scaler     | Signal                         | Target                          |
| ------------------- | ---------- | ------------------------------ | ------------------------------- |
| Ray Serve inference | Prometheus | Pending request queue depth    | 1-4 replicas                    |
| Envoy Gateway       | Prometheus | Active connections per gateway | KEDA manages envoy proxy fleet  |
| Voice pipeline      | NATS       | Message queue length           | 0-2 replicas                    |
| Batch inference     | Prometheus | Job queue size                 | 0-N GPU pods                    |
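The voice-pipeline use case can be sketched with KEDA's NATS JetStream scaler, which supports scale-to-zero. The stream, consumer, and monitoring-endpoint names are illustrative assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: voice-pipeline          # hypothetical name
  namespace: voice              # hypothetical namespace
spec:
  scaleTargetRef:
    name: voice-pipeline
  minReplicaCount: 0            # scale to zero when the stream is drained
  maxReplicaCount: 2
  triggers:
    - type: nats-jetstream
      metadata:
        natsServerMonitoringEndpoint: "nats.nats.svc:8222"  # assumed NATS monitoring endpoint
        account: "$G"             # default JetStream account
        stream: "voice"           # illustrative stream name
        consumer: "voice-pipeline"
        lagThreshold: "10"        # scale up when consumer lag exceeds 10 messages
```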