homelab-design/decisions/0045-tls-certificate-strategy.md

TLS Certificate Strategy

  • Status: accepted
  • Date: 2026-02-09
  • Deciders: Billy
  • Technical Story: Automate TLS certificate provisioning for both public and internal services

Context and Problem Statement

Every HTTPS service in the cluster needs a valid TLS certificate. Public services need trusted certificates (Let's Encrypt), while internal services can use self-signed certificates. Manual certificate management doesn't scale across 30+ services.

How do we automate certificate issuance and renewal for both public and internal domains?

Decision Drivers

  • Fully automated certificate lifecycle (issuance, renewal, rotation)
  • Wildcard certificates to avoid per-service certificate sprawl
  • DNS-01 challenge for wildcard support (Let's Encrypt does not issue wildcard certificates over HTTP-01)
  • Internal services need certificates too (browser warnings are unacceptable)
  • Zero downtime during renewal

Decision Outcome

Deploy cert-manager with two ClusterIssuers: Let's Encrypt (DNS-01 via Cloudflare) for public domains, and a self-signed issuer for internal domains.

Deployment Configuration

| Setting | Value |
|---------|-------|
| Chart | cert-manager from oci://quay.io/jetstack/charts/cert-manager |
| Version | v1.19.3 |
| Namespace | cert-manager |
| Replicas | 1 |
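
As a rough illustration, the settings above could map onto Helm values for the cert-manager chart along the following lines. This is a sketch, not the cluster's actual values file. The `crds.enabled` and `dns01RecursiveNameserversOnly` settings are assumptions. The resolver URLs follow cert-manager's documented DNS-over-HTTPS syntax and are an assumption about how the Cloudflare resolvers described under letsencrypt-production are expressed.

```yaml
# Sketch of Helm values for the cert-manager chart (assumed, not copied from the cluster).
replicaCount: 1                 # matches "Replicas 1" in the table above

crds:
  enabled: true                 # assumption: CRDs are installed by the chart

# Recursive resolvers used for DNS-01 propagation checks.
# Assumption: Cloudflare DoH endpoints written in URL form.
dns01RecursiveNameservers: "https://1.1.1.1/dns-query,https://1.0.0.1/dns-query"
dns01RecursiveNameserversOnly: true   # assumption: only query the resolvers listed above

prometheus:
  enabled: true
  servicemonitor:
    enabled: true               # exposes cert-manager metrics through a ServiceMonitor
```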

Certificate Issuers

letsencrypt-production (Public)

| Setting | Value |
|---------|-------|
| Type | ACME (Let's Encrypt) |
| Challenge | DNS-01 via Cloudflare API |
| Nameservers | 1.1.1.1:443, 1.0.0.1:443 (DNS-over-HTTPS) |
| Zone | daviestechlabs.io |

cert-manager uses a Cloudflare API token (stored as a SOPS-encrypted Secret) to create the TXT records for DNS-01 challenges. Its recursive nameservers are set to Cloudflare's DNS-over-HTTPS resolvers so that propagation checks complete faster.
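
A minimal sketch of what this issuer could look like as a cert-manager `ClusterIssuer`. The contact email and the Secret names and keys are hypothetical placeholders; only the issuer name, the Cloudflare DNS-01 solver, and the zone come from this ADR.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    # Let's Encrypt production ACME directory
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com                      # hypothetical contact address
    privateKeySecretRef:
      name: letsencrypt-production-account-key    # hypothetical Secret for the ACME account key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token          # hypothetical name of the SOPS-managed Secret
              key: api-token                      # hypothetical key inside that Secret
        selector:
          dnsZones:
            - daviestechlabs.io                   # restrict this solver to the public zone
```

For a ClusterIssuer, the referenced Secret is looked up in cert-manager's cluster resource namespace, which defaults to the namespace cert-manager runs in.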

selfsigned-internal (Private)

| Setting | Value |
|---------|-------|
| Type | Self-Signed |
| Use | *.lab.daviestechlabs.io internal services |

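Used for internal services where public browser trust isn't critical (admin UIs accessed by the operator). Browsers do not trust a self-signed certificate by default, so the operator's devices must trust it explicitly to avoid warnings.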
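
The self-signed issuer needs no external configuration. A minimal sketch, using the issuer name from this ADR:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-internal
spec:
  selfSigned: {}   # certificates are signed with their own private key, with no CA involved
```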

Certificates

| Domain | Issuer | Type | Duration | Renewal |
|--------|--------|------|----------|---------|
| daviestechlabs.io + *.daviestechlabs.io | letsencrypt-production | Wildcard | 90 days (LE default) | Auto |
| lab.daviestechlabs.io + *.lab.daviestechlabs.io | selfsigned-internal | Wildcard | 1 year | 30 days before expiry |

Wildcard certificates are used to avoid creating individual certificates per service. Both certificates are referenced by the Envoy Gateway listeners.
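
The two certificates in the table could be declared roughly as follows. This is a sketch under assumptions: the resource names, the `network` namespace, the Secret names, and the `commonName` on the internal certificate are hypothetical. The durations translate the table directly (1 year = 8760h, 30 days = 720h).

```yaml
# Public wildcard certificate issued by Let's Encrypt.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-daviestechlabs-io         # hypothetical name
  namespace: network                       # hypothetical namespace
spec:
  secretName: wildcard-daviestechlabs-io-tls   # hypothetical Secret holding the key pair
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
    - daviestechlabs.io
    - "*.daviestechlabs.io"
---
# Internal wildcard certificate from the self-signed issuer.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-lab-daviestechlabs-io     # hypothetical name
  namespace: network                       # hypothetical namespace
spec:
  secretName: wildcard-lab-daviestechlabs-io-tls   # hypothetical Secret name
  commonName: lab.daviestechlabs.io        # assumption: set so the subject is not empty
  duration: 8760h                          # 1 year
  renewBefore: 720h                        # renew 30 days before expiry
  issuerRef:
    name: selfsigned-internal
    kind: ClusterIssuer
  dnsNames:
    - lab.daviestechlabs.io
    - "*.lab.daviestechlabs.io"
```

The public certificate omits `duration` and `renewBefore`, leaving the validity period to Let's Encrypt and the renewal point to cert-manager's default.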

Integration Points

  • Cloudflare: API token for DNS-01 challenges (stored as SOPS-encrypted Secret)
  • Envoy Gateway: References certificates in Gateway listener TLS configuration
  • Flux: Health check validates ClusterIssuer readiness before dependent resources are reconciled
  • Prometheus: ServiceMonitor enabled for cert-manager metrics
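
Two of these integration points can be sketched in YAML. The first document shows a Gateway listener referencing a wildcard certificate Secret. The second shows a Flux Kustomization that waits for both ClusterIssuers to become ready. The Gateway name, namespace, `gatewayClassName`, Secret name, Kustomization name, path, and source name are all hypothetical and only illustrate the shape of the configuration.

```yaml
# Gateway listener terminating TLS with the public wildcard certificate.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external                  # hypothetical name
  namespace: network              # hypothetical; same namespace as the certificate Secret
spec:
  gatewayClassName: envoy         # hypothetical GatewayClass managed by Envoy Gateway
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: "*.daviestechlabs.io"
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: wildcard-daviestechlabs-io-tls   # hypothetical Secret written by cert-manager
      allowedRoutes:
        namespaces:
          from: All
---
# Flux Kustomization that applies the issuers and reports ready only once they are.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager-issuers      # hypothetical name
  namespace: flux-system
spec:
  interval: 1h
  path: ./kubernetes/cert-manager/issuers   # hypothetical repository path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system             # hypothetical source name
  timeout: 5m
  healthChecks:
    - apiVersion: cert-manager.io/v1
      kind: ClusterIssuer
      name: letsencrypt-production
    - apiVersion: cert-manager.io/v1
      kind: ClusterIssuer
      name: selfsigned-internal
```

A Kustomization that creates Certificate resources can then list `cert-manager-issuers` under `dependsOn`, so it is applied only after the issuers report ready.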