# TLS Certificate Strategy

* Status: accepted
* Date: 2026-02-09
* Deciders: Billy
* Technical Story: Automate TLS certificate provisioning for both public and internal services

## Context and Problem Statement

Every HTTPS service in the cluster needs a valid TLS certificate. Public services need trusted certificates (Let's Encrypt), while internal services can use self-signed certificates. Manual certificate management doesn't scale across 30+ services.

How do we automate certificate issuance and renewal for both public and internal domains?

## Decision Drivers

* Fully automated certificate lifecycle (issuance, renewal, rotation)
* Wildcard certificates to avoid per-service certificate sprawl
* DNS-01 challenge for wildcard support (HTTP-01 can't do wildcards)
* Internal services need certificates too (browser warnings are unacceptable)
* Zero downtime during renewal

## Decision Outcome

Deploy **cert-manager** with two ClusterIssuers: Let's Encrypt (DNS-01 via Cloudflare) for public domains, and a self-signed issuer for internal domains.

## Deployment Configuration

| | |
|---|---|
| **Chart** | `cert-manager` from `oci://quay.io/jetstack/charts/cert-manager` |
| **Version** | v1.19.3 |
| **Namespace** | `cert-manager` |
| **Replicas** | 1 |

## Certificate Issuers

### letsencrypt-production (Public)

| | |
|---|---|
| **Type** | ACME (Let's Encrypt) |
| **Challenge** | DNS-01 via Cloudflare API |
| **Nameservers** | `1.1.1.1:443`, `1.0.0.1:443` (DNS-over-HTTPS) |
| **Zone** | `daviestechlabs.io` |

The issuer uses a Cloudflare API token (SOPS-encrypted) to create the DNS-01 challenge TXT records. The recursive nameservers cert-manager uses for its DNS-01 propagation checks are set to Cloudflare's resolvers over DNS-over-HTTPS to speed up those checks.
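
For reference, here is a minimal Python sketch of the DNS-01 record that any ACME client (cert-manager included) computes and then publishes through the DNS provider, following RFC 8555 §8.4: the record lives at `_acme-challenge.<domain>` and holds the unpadded base64url SHA-256 digest of the key authorization (challenge token plus the account key's RFC 7638 thumbprint). This is not cert-manager's code and makes no network calls; the account key and tokens in the demo are hypothetical placeholders. It also shows why `daviestechlabs.io` and `*.daviestechlabs.io` are both validated under `_acme-challenge.daviestechlabs.io`, each with its own TXT value.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as used throughout ACME (RFC 8555)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required JWK members, sorted, no whitespace."""
    required = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}[jwk["kty"]]
    canonical = json.dumps(
        {k: jwk[k] for k in required}, sort_keys=True, separators=(",", ":")
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def dns01_record(identifier: str, token: str, account_jwk: dict) -> tuple[str, str]:
    """Return the (record name, TXT value) pair for an ACME DNS-01 challenge."""
    # Wildcard identifiers are authorized against the base domain, so "*." is dropped.
    domain = identifier[2:] if identifier.startswith("*.") else identifier
    key_authorization = f"{token}.{jwk_thumbprint(account_jwk)}"
    digest = hashlib.sha256(key_authorization.encode("utf-8")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)


if __name__ == "__main__":
    # Hypothetical account key and challenge tokens, for illustration only.
    demo_jwk = {"kty": "EC", "crv": "P-256", "x": "example-x", "y": "example-y"}
    for identifier, token in [("daviestechlabs.io", "token-apex"),
                              ("*.daviestechlabs.io", "token-wildcard")]:
        print(dns01_record(identifier, token, demo_jwk))
```

Both demo records share one name but carry different values, so two TXT records coexist at `_acme-challenge.daviestechlabs.io` while the apex and wildcard challenges are pending.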

### selfsigned-internal (Private)

| | |
|---|---|
| **Type** | Self-Signed |
| **Use** | `*.lab.daviestechlabs.io` internal services |

This issuer is used for internal services where browser trust isn't critical (admin UIs accessed by the operator).

## Certificates

| Domain | Issuer | Type | Duration | Renewal |
|--------|--------|------|----------|---------|
| `daviestechlabs.io` + `*.daviestechlabs.io` | letsencrypt-production | Wildcard | 90 days (Let's Encrypt default) | Auto |
| `lab.daviestechlabs.io` + `*.lab.daviestechlabs.io` | selfsigned-internal | Wildcard | 1 year | 30 days before expiry |
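
The Renewal column reduces to date arithmetic: a certificate is renewed at `notAfter - renewBefore`. The Python sketch below assumes cert-manager's documented default when `renewBefore` is unset (renew once one third of the lifetime remains), which for a 90-day Let's Encrypt certificate also lands 30 days before expiry. The issue date is arbitrary and "1 year" is taken as 365 days; both are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(not_before: datetime, not_after: datetime,
                 renew_before: Optional[timedelta] = None) -> datetime:
    """Return the instant at which a certificate is due for renewal."""
    if renew_before is None:
        # Assumed default: renew when one third of the lifetime remains.
        renew_before = (not_after - not_before) / 3
    return not_after - renew_before


if __name__ == "__main__":
    issued = datetime(2026, 2, 9, tzinfo=timezone.utc)  # arbitrary example date
    public = renewal_time(issued, issued + timedelta(days=90))
    internal = renewal_time(issued, issued + timedelta(days=365), timedelta(days=30))
    print(public.date(), internal.date())  # 2026-04-10 2027-01-10
```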

Wildcard certificates are used to avoid creating individual certificates per service. Both certificates are referenced by the Envoy Gateway listeners.
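
A wildcard SAN covers exactly one DNS label (RFC 6125 §6.4.3). That is why each apex is listed alongside its wildcard, and why hosts under `lab.daviestechlabs.io` need the separate `*.lab.daviestechlabs.io` certificate instead of falling under `*.daviestechlabs.io`. The Python sketch below illustrates the matching rule against the two certificates in the table; it is a simplified check (no IDN handling, no partial-label wildcards), and the sample hostnames are hypothetical.

```python
def san_matches(pattern: str, hostname: str) -> bool:
    """Simplified RFC 6125 match: a leading "*." stands for exactly one label."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname
    first_label, dot, remainder = hostname.partition(".")
    return bool(first_label) and bool(dot) and remainder == pattern[2:]


# SANs of the two certificates, keyed by the ClusterIssuer that signs them.
CERTIFICATES = {
    "letsencrypt-production": ["daviestechlabs.io", "*.daviestechlabs.io"],
    "selfsigned-internal": ["lab.daviestechlabs.io", "*.lab.daviestechlabs.io"],
}


def covering_certificates(hostname: str) -> list[str]:
    """Return the issuers whose certificate has a SAN matching the hostname."""
    return [
        issuer
        for issuer, sans in CERTIFICATES.items()
        if any(san_matches(san, hostname) for san in sans)
    ]


if __name__ == "__main__":
    # Hypothetical service hostnames.
    for host in ["app.daviestechlabs.io", "admin.lab.daviestechlabs.io",
                 "a.b.lab.daviestechlabs.io"]:
        print(host, covering_certificates(host))
```

The last host matches nothing: two labels below `lab.daviestechlabs.io` are beyond the reach of either wildcard.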

## Integration Points

- **Cloudflare:** API token for DNS-01 challenges (stored as a SOPS-encrypted Secret)
- **Envoy Gateway:** References certificates in Gateway listener TLS configuration
- **Flux:** Health check validates ClusterIssuer readiness before dependent resources are reconciled
- **Prometheus:** ServiceMonitor enabled for cert-manager metrics
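
With the ServiceMonitor scraping cert-manager, the `certmanager_certificate_expiration_timestamp_seconds` gauge reports each Certificate's expiry as a Unix timestamp, so an expiry alert is a subtraction (in PromQL, roughly `certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 86400`). The Python sketch below applies the same check to a scrape in the Prometheus text format, assuming the gauge carries `name` and `namespace` labels. The parser is deliberately minimal (it ignores escaped quotes in label values), and the 14-day window and demo certificate names are hypothetical.

```python
import re
import time

METRIC = "certmanager_certificate_expiration_timestamp_seconds"
# One labeled sample: metric{labels} value  (comment lines and unlabeled samples are skipped).
SAMPLE = re.compile(r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def expiring_certificates(scrape: str, now: float, window_days: float = 14.0):
    """Return (namespace/name, days left) for certificates expiring within the window."""
    expiring = []
    for line in scrape.splitlines():
        match = SAMPLE.match(line)
        if match is None or match["metric"] != METRIC:
            continue
        labels = dict(LABEL.findall(match["labels"]))
        days_left = (float(match["value"]) - now) / 86400
        if days_left < window_days:
            expiring.append((f'{labels["namespace"]}/{labels["name"]}', round(days_left, 1)))
    return expiring


if __name__ == "__main__":
    now = time.time()
    demo = "\n".join([
        f"# TYPE {METRIC} gauge",
        f'{METRIC}{{name="example-public",namespace="example"}} {now + 10 * 86400}',
        f'{METRIC}{{name="example-internal",namespace="example"}} {now + 200 * 86400}',
    ])
    print(expiring_certificates(demo, now))
```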

## Links

* Related to [ADR-0044](0044-dns-and-external-access.md) (DNS architecture)
* Related to [ADR-0010](0010-use-envoy-gateway.md) (Gateway TLS listeners)
* [cert-manager Documentation](https://cert-manager.io/docs/)