docs: add ADRs 0043-0053 covering remaining architecture gaps

New ADRs:
- 0043: Cilium CNI and Network Fabric
- 0044: DNS and External Access Architecture
- 0045: TLS Certificate Strategy (cert-manager)
- 0046: Companions Frontend Architecture
- 0047: MLflow Experiment Tracking and Model Registry
- 0048: Entertainment and Media Stack
- 0049: Self-Hosted Productivity Suite
- 0050: Argo Rollouts Progressive Delivery
- 0051: KEDA Event-Driven Autoscaling
- 0052: Cluster Utilities (Spegel, Descheduler, Reloader, CSI-NFS)
- 0053: Vaultwarden Password Management

README updated with table entries and badge count (53 total).
2026-02-09 18:36:39 -05:00
parent 49ce970780
commit 5846d0dc16
12 changed files with 1141 additions and 1 deletions

@@ -0,0 +1,71 @@
# Argo Rollouts Progressive Delivery
* Status: accepted
* Date: 2026-02-09
* Deciders: Billy
* Technical Story: Enable progressive delivery (canary, blue-green) for safer deployments alongside existing Argo Workflows
## Context and Problem Statement
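Standard Kubernetes Deployments use a rolling update strategy that replaces pods incrementally but gates progress only on readiness probes. It offers no control over how much traffic reaches the new version and no metric-based analysis, so for critical services a bad release that passes its probes quickly receives all traffic. Progressive delivery shifts traffic gradually and rolls back automatically when failure metrics are detected.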
How do we add progressive delivery capabilities without duplicating the existing Argo Workflows infrastructure?
## Decision Drivers
* Reduce blast radius of bad deployments
* Automated rollback on failure metrics
* Complement (not replace) existing GitOps deployment via Flux
* Reuse Argo ecosystem already deployed for workflows
* Dashboard for deployment visibility
## Considered Options
1. **Argo Rollouts** — Progressive delivery controller from Argo project
2. **Flagger** — Flux-native progressive delivery
3. **Istio traffic management** — Service mesh canary routing
4. **Manual canary via Flux** — Separate canary Deployments managed by Flux
## Decision Outcome
Chosen option: **Argo Rollouts**, because it complements the existing Argo Workflows deployment, provides native canary and blue-green strategies, and includes a dashboard for deployment visibility.
### Positive Consequences
* Canary and blue-green deployment strategies with automated analysis
* Integrates with Envoy Gateway for traffic splitting
* Dashboard for real-time deployment progress
* Same Argo ecosystem as existing Workflows (shared expertise)
* CRD-based — works with GitOps (Flux manages Rollout resources)
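
The "automated analysis" mentioned above is expressed in Argo Rollouts as an `AnalysisTemplate`. The following is a minimal sketch, not a manifest from this commit: the template name, the Prometheus address, the `http_requests_total` metric, and the 95% threshold are all assumptions chosen for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate                # hypothetical name
  namespace: ci-cd
spec:
  args:
    - name: service-name            # supplied by the Rollout that references this template
  metrics:
    - name: success-rate
      interval: 1m                  # measure once per minute
      successCondition: result[0] >= 0.95   # assumed threshold
      failureLimit: 3               # analysis fails after more than 3 failed measurements
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090   # assumed address
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

A Rollout that references a template like this aborts and reverts to the stable version when the analysis fails, which is what limits the blast radius of a bad release.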
### Negative Consequences
* Another CRD set to manage alongside standard Deployments
* Not all workloads need progressive delivery (overhead for simple services)
* Dashboard currently available only via port-forward (no ingress)
## Deployment Configuration
| Setting | Value |
|---|---|
| **Chart** | `argo-rollouts` from Argo HelmRepository |
| **Namespace** | `ci-cd` |
| **Replicas** | 1 |
| **Dashboard** | Enabled |
| **CRDs** | `CreateReplace` on install and upgrade |
The chart is managed by a Flux Kustomization with `wait: true`, so the Kustomization reports ready only once the controller is healthy and dependent Rollout resources are applied after that.
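
The configuration above could be written roughly as follows. This is a sketch under stated assumptions rather than the manifests added in this commit: the HelmRepository name `argo`, the `flux-system` namespace, the reconcile intervals, and the repository path are hypothetical, and the `controller.replicas` and `dashboard.enabled` keys follow the upstream `argo-rollouts` chart values.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: argo-rollouts               # assumed release name
  namespace: ci-cd
spec:
  interval: 30m                     # assumed reconcile interval
  chart:
    spec:
      chart: argo-rollouts
      sourceRef:
        kind: HelmRepository
        name: argo                  # assumed repository name
        namespace: flux-system      # assumed namespace
  install:
    crds: CreateReplace             # create or replace CRDs on install
  upgrade:
    crds: CreateReplace             # and again on every upgrade
  values:
    controller:
      replicas: 1
    dashboard:
      enabled: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: argo-rollouts               # assumed name
  namespace: flux-system            # assumed namespace
spec:
  interval: 30m                     # assumed reconcile interval
  path: ./kubernetes/apps/ci-cd/argo-rollouts   # hypothetical path
  prune: true
  wait: true                        # block until all applied resources are healthy
  sourceRef:
    kind: GitRepository
    name: flux-system               # assumed source name
```

Kustomizations that ship Rollout resources would then list this one under `dependsOn` so that the CRDs and controller exist before they are applied.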
## Use Cases
| Strategy | When to Use | Example |
|----------|-------------|---------|
| Canary | Gradual traffic shift with metric analysis | AI inference endpoint updates |
| Blue-Green | Zero-downtime full cutover with instant rollback | Companions frontend releases |
| Rolling (standard) | Low-risk config changes | Most infrastructure services |
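
To make the first two rows concrete, here is a sketch of one canary and one blue-green `Rollout`. All names, images, replica counts, weights, and pause durations are hypothetical. No traffic router is configured in this sketch, in which case Argo Rollouts approximates `setWeight` through the ratio of canary to stable replicas.

```yaml
# Canary: shift traffic in steps, pausing between them
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: inference-api               # hypothetical workload
spec:
  replicas: 4
  selector:
    matchLabels:
      app: inference-api
  template:
    metadata:
      labels:
        app: inference-api
    spec:
      containers:
        - name: app
          image: registry.example.com/inference-api:v2   # placeholder image
  strategy:
    canary:
      steps:
        - setWeight: 25             # roughly a quarter of traffic to the new version
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {duration: 5m}     # full promotion follows the last step
---
# Blue-green: run the new version beside the old one, then switch the Service
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: companions-frontend         # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: companions-frontend
  template:
    metadata:
      labels:
        app: companions-frontend
    spec:
      containers:
        - name: app
          image: registry.example.com/companions-frontend:v2   # placeholder image
  strategy:
    blueGreen:
      activeService: companions-frontend-active     # Service receiving live traffic (must exist)
      previewService: companions-frontend-preview   # Service pointing at the new version
      autoPromotionEnabled: false                   # require manual promotion
```

Workloads in the third row stay as ordinary Deployments and need no Rollout resource.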
## Links
* Related to [ADR-0009](0009-dual-workflow-engines.md) (Argo ecosystem)
* Related to [ADR-0031](0031-gitea-cicd-strategy.md) (CI/CD pipeline)
* [Argo Rollouts Documentation](https://argoproj.github.io/rollouts/)