# Argo Rollouts Progressive Delivery

* Status: accepted
* Date: 2026-02-09
* Deciders: Billy
* Technical Story: Enable progressive delivery (canary, blue-green) for safer deployments alongside existing Argo Workflows

## Context and Problem Statement

Standard Kubernetes Deployments use a rolling update strategy that replaces pods incrementally, but the rollout advances on readiness alone: there is no metric-based analysis and no automated rollback. For critical services, this creates risk. A bad release that passes its readiness checks still reaches all traffic as soon as the rollout finishes. Progressive delivery shifts traffic gradually and rolls back automatically when failure metrics cross a threshold.

How do we add progressive delivery capabilities without duplicating the existing Argo Workflows infrastructure?

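
For comparison, these are the rolling update controls a standard Deployment exposes. The manifest is a hypothetical example (the name `example-api`, image, and probe path are placeholders): every field controls pacing or readiness gating, and none refers to a metric. When `progressDeadlineSeconds` expires, Kubernetes only reports a `Progressing=False` condition; it does not roll back.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api              # hypothetical workload name
spec:
  replicas: 4
  minReadySeconds: 10            # a new pod must stay Ready this long to count as available
  progressDeadlineSeconds: 600   # after this, status reports Progressing=False; no automatic rollback
  selector:
    matchLabels:
      app: example-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                # at most one extra pod above replicas during the update
      maxUnavailable: 0          # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: example-api
    spec:
      containers:
        - name: api
          image: registry.example.com/example-api:1.2.0   # placeholder image
          readinessProbe:        # the rollout waits on this signal, not on error rates
            httpGet:
              path: /healthz     # placeholder path
              port: 8080
```
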
## Decision Drivers

* Reduce blast radius of bad deployments
* Automated rollback on failure metrics
* Complement (not replace) existing GitOps deployment via Flux
* Reuse Argo ecosystem already deployed for workflows
* Dashboard for deployment visibility

## Considered Options

1. **Argo Rollouts**: Progressive delivery controller from the Argo project
2. **Flagger**: Flux-native progressive delivery
3. **Istio traffic management**: Service mesh canary routing
4. **Manual canary via Flux**: Separate canary Deployments managed by Flux

## Decision Outcome

Chosen option: **Argo Rollouts**, because it complements the existing Argo Workflows deployment, provides native canary and blue-green strategies, and includes a dashboard for deployment visibility.

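
A minimal sketch of a canary Rollout with automated analysis. It assumes a hypothetical `ai-inference` workload, an in-cluster Prometheus at a placeholder address, and a hypothetical `http_requests_total` metric with `service` and `code` labels; this ADR does not name a metrics provider. The Rollout reuses the Deployment pod template and adds a `strategy.canary` block. If the analysis records more than `failureLimit` failed measurements, the controller aborts and scales the stable ReplicaSet back up. With no traffic router configured, `setWeight` is approximated by the ratio of canary to stable pods.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate                     # hypothetical template name
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m                       # take a measurement every minute
      count: 5                           # five measurements per analysis run
      failureLimit: 1                    # more than one failed measurement fails the run
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090   # placeholder address
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[2m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[2m]))
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: ai-inference                     # hypothetical workload
spec:
  replicas: 5
  selector:
    matchLabels:
      app: ai-inference
  template:                              # same pod template a Deployment would use
    metadata:
      labels:
        app: ai-inference
    spec:
      containers:
        - name: inference
          image: registry.example.com/ai-inference:2.0.0   # placeholder image
          ports:
            - containerPort: 8080
  strategy:
    canary:
      steps:
        - setWeight: 20                  # 1 of 5 pods runs the new version
        - pause: {duration: 5m}
        - analysis:                      # failed analysis aborts the rollout
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: ai-inference
        - setWeight: 60
        - pause: {duration: 5m}
        # after the last step the canary is promoted to stable
```
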
### Positive Consequences

* Canary and blue-green deployment strategies with automated analysis
* Integrates with Envoy Gateway for traffic splitting through the Argo Rollouts Gateway API traffic router plugin
* Dashboard for real-time deployment progress
* Same Argo ecosystem as existing Workflows (shared expertise)
* CRD-based, so it works with GitOps (Flux manages Rollout resources)

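
With a traffic router, `setWeight` becomes an exact share of requests instead of a pod ratio. A sketch, assuming the Gateway API plugin (`argoproj-labs/gatewayAPI`) is registered in the controller's `argo-rollouts-config` ConfigMap and that an Envoy Gateway `Gateway` named `envoy-gateway` exists in a `networking` namespace (the Gateway, namespaces, and Service names are all hypothetical). The HTTPRoute lists both the stable and canary Services, and the plugin rewrites their `weight` values at each step.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ai-inference
  namespace: ai                          # hypothetical namespace
spec:
  parentRefs:
    - name: envoy-gateway                # hypothetical Gateway served by Envoy Gateway
      namespace: networking
  rules:
    - backendRefs:                       # weights are rewritten by the plugin during a rollout
        - name: ai-inference-stable
          port: 8080
          weight: 100
        - name: ai-inference-canary
          port: 8080
          weight: 0
---
# Fragment of the Rollout spec (selector and pod template omitted).
# ai-inference-stable and ai-inference-canary are plain ClusterIP Services
# selecting app: ai-inference; the controller adds a pod-template-hash
# selector to each so they target the right ReplicaSet.
spec:
  strategy:
    canary:
      stableService: ai-inference-stable
      canaryService: ai-inference-canary
      trafficRouting:
        plugins:
          argoproj-labs/gatewayAPI:
            httpRoute: ai-inference
            namespace: ai
      steps:
        - setWeight: 10                  # 10% of requests go to the canary
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {}                      # wait for manual promotion
```
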
### Negative Consequences

* Another CRD set to manage alongside standard Deployments
* Not all workloads need progressive delivery (overhead for simple services)
* Dashboard currently available only via port-forward (no ingress)

## Deployment Configuration

| Setting | Value |
|---|---|
| **Chart** | `argo-rollouts` from Argo HelmRepository |
| **Namespace** | `ci-cd` |
| **Replicas** | 1 |
| **Dashboard** | Enabled |
| **CRDs** | `CreateReplace` on install and upgrade |

Managed by a Flux Kustomization with `wait: true`, so that Kustomizations depending on it (via `dependsOn`) apply their Rollout resources only after the controller and CRDs are ready.

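
The table and the `wait: true` setting might translate into Flux manifests roughly as follows. This is a sketch: the HelmRepository name `argo`, the `flux-system` namespaces, the Git paths, and the dependent `ai-inference` Kustomization are assumptions, not values taken from this repository. `controller.replicas` and `dashboard.enabled` are keys in the upstream `argo-rollouts` chart values.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: argo-rollouts
  namespace: ci-cd
spec:
  interval: 30m
  chart:
    spec:
      chart: argo-rollouts
      sourceRef:
        kind: HelmRepository
        name: argo                       # assumed name of the Argo HelmRepository
        namespace: flux-system
  install:
    crds: CreateReplace                  # create CRDs, replacing any that already exist
  upgrade:
    crds: CreateReplace                  # keep CRDs in sync with the chart on upgrade
  values:
    controller:
      replicas: 1
    dashboard:
      enabled: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: argo-rollouts
  namespace: flux-system
spec:
  interval: 10m
  path: ./kubernetes/apps/ci-cd/argo-rollouts   # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true                             # Ready only after all applied resources, including the HelmRelease, are healthy
  timeout: 5m
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ai-inference                     # hypothetical consumer of the Rollout CRD
  namespace: flux-system
spec:
  dependsOn:
    - name: argo-rollouts                # reconciled only after the controller Kustomization is Ready
  interval: 10m
  path: ./kubernetes/apps/ai/ai-inference       # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```
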
## Use Cases

| Strategy | When to Use | Example |
|----------|-------------|---------|
| Canary | Gradual traffic shift with metric analysis | AI inference endpoint updates |
| Blue-Green | Zero-downtime full cutover with instant rollback | Companions frontend releases |
| Rolling (standard) | Low-risk config changes | Most infrastructure services |

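
For the blue-green row, a sketch of the strategy block for a hypothetical `companions-frontend` Rollout (the Service names are placeholders). The active Service keeps serving the old ReplicaSet while the preview Service points at the new one; with `autoPromotionEnabled: false` the switch waits for manual promotion, and the old ReplicaSet stays scaled up for `scaleDownDelaySeconds` after cutover so traffic can be switched back without waiting for new pods.

```yaml
# Fragment of a Rollout spec; selector and pod template omitted.
spec:
  strategy:
    blueGreen:
      activeService: companions-frontend-active     # receives production traffic
      previewService: companions-frontend-preview   # points at the new version before promotion
      autoPromotionEnabled: false                   # cut over only on manual promotion
      scaleDownDelaySeconds: 300                    # keep the old ReplicaSet up for 5 minutes after cutover
```

Promotion can be triggered from the dashboard or with `kubectl argo rollouts promote companions-frontend`.
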
## Links

* Related to [ADR-0009](0009-dual-workflow-engines.md) (Argo ecosystem)
* Related to [ADR-0031](0031-gitea-cicd-strategy.md) (CI/CD pipeline)
* [Argo Rollouts Documentation](https://argoproj.github.io/rollouts/)