TLS Certificate Strategy
- Status: accepted
- Date: 2026-02-09
- Deciders: Billy
- Technical Story: Automate TLS certificate provisioning for both public and internal services
Context and Problem Statement
Every HTTPS service in the cluster needs a valid TLS certificate. Public services need trusted certificates (Let's Encrypt), while internal services can use self-signed certificates. Manual certificate management doesn't scale across 30+ services.
How do we automate certificate issuance and renewal for both public and internal domains?
Decision Drivers
- Fully automated certificate lifecycle (issuance, renewal, rotation)
- Wildcard certificates to avoid per-service certificate sprawl
- DNS-01 challenge for wildcard support (HTTP-01 can't do wildcards)
- Internal services need certificates too (browser warnings are unacceptable)
- Zero downtime during renewal
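The wildcard driver follows from ACME's challenge rules: Let's Encrypt accepts only DNS-01 for a wildcard identifier. The Python sketch below shows that constraint. It assumes one solver should cover every name on the certificate, and its helper names are hypothetical, not cert-manager APIs.

```python
# Hypothetical helpers illustrating ACME challenge constraints.
ALL_CHALLENGES = {"http-01", "dns-01", "tls-alpn-01"}


def allowed_challenges(identifier: str) -> set[str]:
    """Return the ACME challenge types usable for one DNS identifier."""
    if identifier.startswith("*."):
        # Wildcard identifiers cannot be validated by HTTP-01 or TLS-ALPN-01.
        return {"dns-01"}
    return set(ALL_CHALLENGES)


def common_challenge(identifiers: list[str]) -> set[str]:
    """Challenge types that a single solver could use for every identifier."""
    result = set(ALL_CHALLENGES)
    for ident in identifiers:
        result &= allowed_challenges(ident)
    return result


print(common_challenge(["daviestechlabs.io", "*.daviestechlabs.io"]))  # → {'dns-01'}
```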
Decision Outcome
Deploy cert-manager with two ClusterIssuers: Let's Encrypt (DNS-01 via Cloudflare) for public domains, and a self-signed issuer for internal domains.
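The split between the two issuers comes down to which zone a hostname falls in. The Python sketch below encodes that routing rule. The zone and issuer names come from this ADR; the function names, the example hostnames and the error for unmanaged names are illustrative assumptions.

```python
# Hypothetical routing rule; zone and issuer names are taken from this ADR.
PUBLIC_ZONE = "daviestechlabs.io"
INTERNAL_ZONE = "lab.daviestechlabs.io"


def _in_zone(hostname: str, zone: str) -> bool:
    """True if hostname is the zone apex or any name below it."""
    return hostname == zone or hostname.endswith("." + zone)


def issuer_for(hostname: str) -> str:
    host = hostname.lower().rstrip(".")
    # Check the internal zone first: it is a subdomain of the public zone
    # and would otherwise match the public rule.
    if _in_zone(host, INTERNAL_ZONE):
        return "selfsigned-internal"
    if _in_zone(host, PUBLIC_ZONE):
        return "letsencrypt-production"
    raise ValueError(f"{hostname} is outside the managed zones")


print(issuer_for("grafana.lab.daviestechlabs.io"))  # → selfsigned-internal
print(issuer_for("www.daviestechlabs.io"))          # → letsencrypt-production
```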
Deployment Configuration
| Setting | Value |
|---|---|
| Chart | cert-manager from `oci://quay.io/jetstack/charts/cert-manager` |
| Version | v1.19.3 |
| Namespace | cert-manager |
| Replicas | 1 |
Certificate Issuers
letsencrypt-production (Public)
| Setting | Value |
|---|---|
| Type | ACME (Let's Encrypt) |
| Challenge | DNS-01 via Cloudflare API |
| Nameservers | `1.1.1.1:443`, `1.0.0.1:443` (DNS-over-HTTPS) |
| Zone | `daviestechlabs.io` |
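Uses a Cloudflare API token (SOPS-encrypted) to create the DNS-01 challenge TXT records. cert-manager's recursive nameservers point at Cloudflare's DNS-over-HTTPS resolvers. Its self-check that a challenge record has propagated therefore queries Cloudflare directly, which speeds up propagation checks.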
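cert-manager builds the challenge record itself. The sketch below only illustrates what ends up in Cloudflare, following RFC 8555 §8.4. The token and account-key thumbprint are placeholder values, and `dns01_record` is a hypothetical helper.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as ACME (RFC 8555) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (TXT record name, TXT value) for a DNS-01 challenge."""
    # A wildcard identifier is validated on its base name: *.example -> example
    base = domain[2:] if domain.startswith("*.") else domain
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{base}", b64url(digest)


# Placeholder token/thumbprint; real values come from the ACME order and account key.
name, value = dns01_record("*.daviestechlabs.io", "example-token", "example-thumbprint")
print(name)  # → _acme-challenge.daviestechlabs.io
```

The apex and the wildcard map to the same `_acme-challenge` name. While both are being validated, the zone briefly holds two TXT values under one record name.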
selfsigned-internal (Private)
| Setting | Value |
|---|---|
| Type | Self-Signed |
| Use | `*.lab.daviestechlabs.io` internal services |
Used for internal services where browser trust isn't critical (admin UIs accessed by the operator).
Certificates
| Domain | Issuer | Type | Duration | Renewal |
|---|---|---|---|---|
| `daviestechlabs.io` + `*.daviestechlabs.io` | letsencrypt-production | Wildcard | 90 days (LE default) | Auto |
| `lab.daviestechlabs.io` + `*.lab.daviestechlabs.io` | selfsigned-internal | Wildcard | 1 year | 30 days before expiry |
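The Python sketch below shows how the Renewal column plays out. It assumes cert-manager's documented default when `renewBefore` is unset: renewal starts one third of the lifetime before expiry. It also models "1 year" as 365 days and uses a hypothetical issue date.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(not_before: datetime, not_after: datetime,
                 renew_before: Optional[timedelta] = None) -> datetime:
    """When renewal would start, under the assumptions stated above."""
    lifetime = not_after - not_before
    if renew_before is None:
        # Assumed default: renew with one third of the lifetime remaining.
        renew_before = lifetime / 3
    return not_after - renew_before


issued = datetime(2026, 2, 9, tzinfo=timezone.utc)  # hypothetical issue date

# Let's Encrypt wildcard: 90-day lifetime, default renewal window.
print(renewal_time(issued, issued + timedelta(days=90)).date())  # → 2026-04-10

# Internal wildcard: 365-day lifetime, renewed 30 days before expiry.
print(renewal_time(issued, issued + timedelta(days=365), timedelta(days=30)).date())  # → 2027-01-10
```

Under that default, "Auto" for the Let's Encrypt certificate also works out to roughly 30 days before expiry.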
Wildcard certificates are used to avoid creating individual certificates per service. Both certificates are referenced by the Envoy Gateway listeners.
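Wildcard coverage is narrower than it looks. Under RFC 6125 matching, `*` stands for exactly one leftmost label and does not match the apex. The simplified Python sketch below shows why each certificate lists its apex and why `*.lab` needs its own wildcard. The `mlflow` hostnames are hypothetical.

```python
def cert_covers(san: str, hostname: str) -> bool:
    """Simplified RFC 6125 check: does one SAN entry cover hostname?"""
    san = san.lower().rstrip(".")
    host = hostname.lower().rstrip(".")
    if not san.startswith("*."):
        return san == host
    # The wildcard replaces exactly one leftmost label; it never spans dots.
    label, _, rest = host.partition(".")
    return bool(label) and rest == san[2:]


def covered(sans: list[str], hostname: str) -> bool:
    return any(cert_covers(s, hostname) for s in sans)


PUBLIC_SANS = ["daviestechlabs.io", "*.daviestechlabs.io"]
INTERNAL_SANS = ["lab.daviestechlabs.io", "*.lab.daviestechlabs.io"]

print(covered(PUBLIC_SANS, "mlflow.daviestechlabs.io"))        # → True
print(covered(PUBLIC_SANS, "mlflow.lab.daviestechlabs.io"))    # → False
print(covered(INTERNAL_SANS, "mlflow.lab.daviestechlabs.io"))  # → True
```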
Integration Points
- Cloudflare: API token for DNS-01 challenges (stored as SOPS-encrypted Secret)
- Envoy Gateway: References certificates in Gateway listener TLS configuration
- Flux: Health check waits for the ClusterIssuers to report Ready before dependent resources are reconciled
- Prometheus: ServiceMonitor enabled for cert-manager metrics
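Flux's health checks follow Kubernetes status conventions: a custom resource such as a ClusterIssuer signals readiness through a `Ready` condition. The Python sketch below roughly approximates that check against the JSON from `kubectl get clusterissuer letsencrypt-production -o json`. It ignores details such as `observedGeneration` that the real status logic may also consider.

```python
import json


def is_ready(resource: dict) -> bool:
    """True if the object reports a Ready condition with status "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


# Trimmed-down example of `kubectl get clusterissuer ... -o json` output.
sample = json.loads("""
{"kind": "ClusterIssuer",
 "metadata": {"name": "letsencrypt-production"},
 "status": {"conditions": [{"type": "Ready", "status": "True"}]}}
""")
print(is_ready(sample))  # → True
```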
Links
- Related to ADR-0044 (DNS architecture)
- Related to ADR-0010 (Gateway TLS listeners)
- cert-manager Documentation