homelab-design/decisions/0060-internal-pki-vault.md

Internal PKI with Vault and cert-manager

  • Status: accepted
  • Date: 2026-02-16
  • Deciders: Billy
  • Technical Story: Replace self-signed internal certificates with a proper CA chain using Vault PKI

Context and Problem Statement

Internal services on *.lab.daviestechlabs.io use a selfsigned-internal ClusterIssuer. Each cert-manager Certificate gets its own unique self-signed root — there is no shared CA. This causes problems:

  • Off-cluster devices (gravenhollow, candlekeep, waterdeep) have no way to obtain trusted certs
  • Clients cannot verify server certs because there's no CA to trust
  • RustFS on gravenhollow has a TLS cert only valid for localhost, breaking S3 clients
  • No certificate chain means no ability to distribute a single CA bundle across the fleet

The homelab already runs HashiCorp Vault in HA mode (3 replicas, Raft storage) and cert-manager with Let's Encrypt for public certs. How do we issue trusted internal certificates for both in-cluster and off-cluster services?

Decision Drivers

  • Vault is already deployed and used for secrets management
  • cert-manager is already deployed with ClusterIssuer support
  • Off-cluster devices (NAS, Mac Mini) need valid TLS certs
  • Single CA root to trust across all machines
  • Automated renewal for in-cluster certs via cert-manager
  • Must not disrupt existing Let's Encrypt public certs

Considered Options

  1. Vault PKI secrets engine + cert-manager Vault ClusterIssuer
  2. step-ca (Smallstep) as standalone internal CA
  3. Keep self-signed, distribute individual certs manually

Decision Outcome

Chosen option: Option 1 — Vault PKI + cert-manager Vault ClusterIssuer, because it builds on existing infrastructure (Vault and cert-manager), provides a proper two-tier CA chain, and supports both in-cluster automated renewal and off-cluster cert issuance via the Vault API.

Positive Consequences

  • Single root CA — one trust anchor for the entire homelab
  • cert-manager automatically renews in-cluster certs via Vault
  • Off-cluster devices request certs via vault write CLI
  • Two-tier CA (root → intermediate) follows PKI best practices
  • Root CA key never leaves Vault
  • Existing Let's Encrypt public certs are unaffected

Negative Consequences

  • Vault becomes a dependency for internal TLS issuance
  • Off-cluster cert renewal requires manual or scripted vault write (no ACME)
  • CA root cert must be distributed to trust stores on all machines
  • Vault PKI engine adds operational complexity

Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                        Vault PKI (security namespace)                     │
│                                                                          │
│  ┌──────────────────────┐       ┌──────────────────────────┐            │
│  │  pki/ (Root CA)       │       │  pki_int/ (Intermediate)  │            │
│  │                       │ signs │                            │            │
│  │  Homelab Root CA      │──────▶│  Homelab Intermediate CA  │            │
│  │  TTL: 10 years        │       │  TTL: 5 years             │            │
│  │                       │       │                            │            │
│  │  (only signs          │       │  Role: lab-internal        │            │
│  │   intermediates)      │       │  *.lab.daviestechlabs.io   │            │
│  └──────────────────────┘       │  TTL: 90 days (default)    │            │
│                                  │  Key: EC P-256             │            │
│                                  └─────────┬────────────────┘            │
│                                             │                             │
│                              ┌──────────────┼──────────────┐             │
│                              │              │              │             │
│                              ▼              ▼              ▼             │
│                     ┌─────────────┐ ┌─────────────┐ ┌──────────┐        │
│                     │cert-manager │ │ vault write  │ │  Future  │        │
│                     │ClusterIssuer│ │   (CLI)      │ │  ACME    │        │
│                     │vault-internal│ │              │ │          │        │
│                     └──────┬──────┘ └──────┬──────┘ └──────────┘        │
│                            │               │                             │
└────────────────────────────┼───────────────┼─────────────────────────────┘
                             │               │
                ┌────────────▼───┐    ┌──────▼───────────────────┐
                │  In-Cluster     │    │  Off-Cluster              │
                │                 │    │                           │
                │ *.lab.dav...    │    │ gravenhollow (RustFS)     │
                │ envoy-internal  │    │ candlekeep (QNAP)         │
                │ auto-renewed    │    │ waterdeep (Mac Mini)      │
                │ by cert-manager │    │ manual/scripted renewal   │
                └─────────────────┘    └───────────────────────────┘

Implementation

Vault PKI Configuration (Phases 1-4, completed)

# Phase 1: Root CA
vault secrets enable -path=pki pki
vault secrets tune -max-lease-ttl=87600h pki
vault write pki/root/generate/internal \
  common_name="Homelab Root CA" issuer_name="homelab-root" ttl=87600h
vault write pki/config/urls \
  issuing_certificates="http://vault.security.svc:8200/v1/pki/ca" \
  crl_distribution_points="http://vault.security.svc:8200/v1/pki/crl"

# Phase 2: Intermediate CA
vault secrets enable -path=pki_int pki
vault secrets tune -max-lease-ttl=43800h pki_int
vault write -field=csr pki_int/intermediate/generate/internal \
  common_name="Homelab Intermediate CA" issuer_name="homelab-intermediate" \
  > /tmp/intermediate.csr
vault write -field=certificate pki/root/sign-intermediate \
  issuer_ref="homelab-root" csr=@/tmp/intermediate.csr \
  format=pem_bundle ttl=43800h > /tmp/intermediate.crt
vault write pki_int/intermediate/set-signed certificate=@/tmp/intermediate.crt

# Phase 3: PKI Role
vault write pki_int/roles/lab-internal \
  allowed_domains="lab.daviestechlabs.io" \
  allow_subdomains=true allow_bare_domains=true \
  max_ttl=8760h ttl=2160h key_type=ec key_bits=256

# Phase 4: Policy and Kubernetes Auth Role
vault policy write cert-manager-pki - <<EOF
path "pki_int/sign/lab-internal" { capabilities = ["create", "update"] }
path "pki_int/issue/lab-internal" { capabilities = ["create"] }
EOF
vault write auth/kubernetes/role/cert-manager \
  bound_service_account_names=cert-manager \
  bound_service_account_namespaces=cert-manager \
  audience="https://192.168.100.20:6443" \
  policies=cert-manager-pki ttl=1h
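
The hour-based TTLs in the commands above map onto the durations shown in the architecture diagram. As a minimal sketch (the helper names `ttl_days` and `ttl_hierarchy_ok` are hypothetical, not part of Vault), the arithmetic and the ordering leaf < intermediate < root can be checked like this:

```shell
# Convert a Vault-style hour TTL such as "2160h" to whole days.
ttl_days() {
  hours=${1%h}              # strip the trailing "h"
  echo $(( hours / 24 ))
}

# Succeed only if leaf max TTL < intermediate TTL < root TTL, so that no
# certificate is configured to outlive the CA that signs it.
ttl_hierarchy_ok() {
  leaf=$(ttl_days "$1")
  inter=$(ttl_days "$2")
  root=$(ttl_days "$3")
  [ "$leaf" -lt "$inter" ] && [ "$inter" -lt "$root" ]
}

ttl_days 87600h   # root CA: 3650 days (10 years)
ttl_days 43800h   # intermediate CA: 1825 days (5 years)
ttl_days 8760h    # role max_ttl: 365 days
ttl_days 2160h    # role default ttl: 90 days
```

Calling `ttl_hierarchy_ok 8760h 43800h 87600h` succeeds for the values used in this ADR.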

Phase 5: Kubernetes Manifests (GitOps)

New vault-internal ClusterIssuer replaces selfsigned-internal:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-internal
spec:
  vault:
    server: http://vault.security.svc:8200
    path: pki_int/sign/lab-internal
    auth:
      kubernetes:
        role: cert-manager
        mountPath: /v1/auth/kubernetes
        serviceAccountRef:
          name: cert-manager

The Envoy internal wildcard certificate is updated to use vault-internal.
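
A Certificate that points at vault-internal might look like the sketch below. The metadata name, namespace, and secretName are hypothetical placeholders, and the renewBefore value is an assumed policy. The privateKey stanza is included because cert-manager generates RSA keys by default, while the lab-internal role is configured with key_type=ec and key_bits=256; a CSR with a different key type is expected to be rejected by the role.

```shell
# Print a cert-manager Certificate manifest for the internal wildcard.
# Names marked "hypothetical" are placeholders, not taken from the repo.
render_wildcard_certificate() {
  cat <<'EOF'
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: envoy-internal-wildcard            # hypothetical
  namespace: network                       # hypothetical
spec:
  secretName: envoy-internal-wildcard-tls  # hypothetical
  issuerRef:
    kind: ClusterIssuer
    name: vault-internal
  commonName: "*.lab.daviestechlabs.io"
  dnsNames:
    - "*.lab.daviestechlabs.io"
  privateKey:
    algorithm: ECDSA   # matches key_type=ec on the lab-internal role
    size: 256          # matches key_bits=256
  duration: 2160h      # 90 days, the role's default ttl
  renewBefore: 720h    # assumed policy: renew 30 days before expiry
EOF
}

# Write the manifest into the GitOps tree (path is up to the repo layout).
render_wildcard_certificate
```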

Phase 6: Off-Cluster Cert Issuance

vault write -format=json pki_int/issue/lab-internal \
  common_name="gravenhollow.lab.daviestechlabs.io" \
  alt_names="gravenhollow.lab.daviestechlabs.io" \
  ttl=8760h
# Extract .data.certificate, .data.private_key, and .data.ca_chain from the JSON output
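
The extraction step could be scripted roughly as follows. This is a sketch that assumes jq is installed and that the JSON response was saved to a file; the function name `split_vault_bundle` and the output file names are hypothetical. The field names (`.data.certificate`, `.data.private_key`, `.data.ca_chain`) are those returned by the PKI issue endpoint.

```shell
# Split a saved `vault write -format=json pki_int/issue/...` response
# into PEM files. $1 = JSON file, $2 = output directory.
split_vault_bundle() {
  json_file=$1
  out_dir=$2
  mkdir -p "$out_dir"

  # Write the private key inside a subshell with a restrictive umask so the
  # file is never readable by other users.
  ( umask 077; jq -r '.data.private_key' "$json_file" > "$out_dir/tls.key" )

  # Server certificate file: leaf first, followed by the CA chain.
  {
    jq -r '.data.certificate' "$json_file"
    jq -r '.data.ca_chain[]' "$json_file"
  } > "$out_dir/tls.crt"

  # CA chain on its own, for clients that want a separate bundle.
  jq -r '.data.ca_chain[]' "$json_file" > "$out_dir/ca-chain.crt"
}
```

For example, redirect the `vault write -format=json` output above to `issue.json`, then run `split_vault_bundle issue.json ./gravenhollow-tls`. Because the response contains the private key, the JSON file should be deleted once the PEM files are in place.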

Phase 7: CA Distribution

The root CA cert is distributed to:

  • Kubernetes pods: via ConfigMap homelab-ca-bundle or cert-manager ca-injector
  • waterdeep (macOS): sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain homelab-root-ca.crt
  • gravenhollow / candlekeep: installed into OS trust store
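
Before installing the root on a machine, it can be useful to confirm that an issued certificate actually chains to it. The following is a minimal sketch using `openssl verify`; the function name `verify_against_root` and the file names in the usage note are hypothetical.

```shell
# Check that a certificate chains up to the given root CA.
# $1 = root CA PEM file
# $2 = certificate to check
# $3 = optional PEM file containing intermediate certificate(s)
verify_against_root() {
  root=$1
  cert=$2
  chain=${3:-}
  if [ -n "$chain" ]; then
    # Intermediates are passed as untrusted helpers; only the root is trusted.
    openssl verify -CAfile "$root" -untrusted "$chain" "$cert" >/dev/null 2>&1
  else
    openssl verify -CAfile "$root" "$cert" >/dev/null 2>&1
  fi
}
```

Usage might look like `verify_against_root homelab-root-ca.crt tls.crt ca-chain.crt && echo "chain OK"`. Note that `openssl verify` may also consult the system trust store, so this is most meaningful on a machine where the root has not been installed yet.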

Certificate Inventory

| Service | Issuer | Renewal | Notes |
|---------|--------|---------|-------|
| *.daviestechlabs.io | letsencrypt-production | Auto (cert-manager) | Public, unchanged |
| *.lab.daviestechlabs.io | vault-internal | Auto (cert-manager) | Envoy internal gateway |
| gravenhollow.lab.daviestechlabs.io | vault-internal (via CLI) | Manual/cron | RustFS S3, NFS |
| candlekeep.lab.daviestechlabs.io | vault-internal (via CLI) | Manual/cron | QNAP NAS |
| waterdeep.lab.daviestechlabs.io | vault-internal (via CLI) | Manual/cron | Mac Mini dev workstation |
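
For the Manual/cron entries, a scheduled job only needs to reissue when a certificate is close to expiry. One way to sketch that check uses `openssl x509 -checkend`; the function name `needs_renewal`, the 30-day default, and the certificate path in the usage note are assumptions, not part of the current setup.

```shell
# Succeed (exit 0) when the certificate is missing, unreadable, or expires
# within the given number of days, i.e. when it should be reissued.
# $1 = certificate PEM file, $2 = threshold in days (default 30)
needs_renewal() {
  cert=$1
  days=${2:-30}
  # -checkend exits 0 only if the certificate is still valid that many
  # seconds from now, so the result is negated.
  ! openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" >/dev/null 2>&1
}
```

A cron script on an off-cluster host could then wrap the Phase 6 command, for example `if needs_renewal /path/to/tls.crt 30; then ...issue and install a new cert...; fi`, so that most runs exit without contacting Vault.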