adding vault for self-signed pki stuff.

2026-02-16 19:24:55 -05:00
parent f94945fb46
commit 4537de7825


@@ -0,0 +1,192 @@
# Internal PKI with Vault and cert-manager
* Status: accepted
* Date: 2026-02-16
* Deciders: Billy
* Technical Story: Replace self-signed internal certificates with a proper CA chain using Vault PKI
## Context and Problem Statement
Internal services on `*.lab.daviestechlabs.io` use a `selfsigned-internal` ClusterIssuer. Each cert-manager Certificate gets its own unique self-signed root — there is no shared CA. This causes problems:
- Off-cluster devices (gravenhollow, candlekeep, waterdeep) have no way to obtain trusted certs
- Clients cannot verify server certs because there's no CA to trust
- RustFS on gravenhollow has a TLS cert only valid for `localhost`, breaking S3 clients
- No certificate chain means no ability to distribute a single CA bundle across the fleet
The homelab already runs HashiCorp Vault in HA mode (3 replicas, Raft storage) and cert-manager with Let's Encrypt for public certs. How do we issue trusted internal certificates for both in-cluster and off-cluster services?
## Decision Drivers
* Vault is already deployed and used for secrets management
* cert-manager is already deployed with ClusterIssuer support
* Off-cluster devices (NAS, Mac Mini) need valid TLS certs
* Single CA root to trust across all machines
* Automated renewal for in-cluster certs via cert-manager
* Must not disrupt existing Let's Encrypt public certs
## Considered Options
1. **Vault PKI secrets engine + cert-manager Vault ClusterIssuer**
2. **step-ca (Smallstep) as standalone internal CA**
3. **Keep self-signed, distribute individual certs manually**
## Decision Outcome
Chosen option: **Option 1 — Vault PKI + cert-manager Vault ClusterIssuer**, because it builds on existing infrastructure (Vault and cert-manager), provides a proper two-tier CA chain, and supports both in-cluster automated renewal and off-cluster cert issuance via the Vault API.
### Positive Consequences
* Single root CA — one trust anchor for the entire homelab
* cert-manager automatically renews in-cluster certs via Vault
* Off-cluster devices request certs via `vault write` CLI
* Two-tier CA (root → intermediate) follows PKI best practices
* Root CA key never leaves Vault
* Existing Let's Encrypt public certs are unaffected
### Negative Consequences
* Vault becomes a dependency for internal TLS issuance
* Off-cluster cert renewal requires manual or scripted `vault write` (no ACME)
* CA root cert must be distributed to trust stores on all machines
* Vault PKI engine adds operational complexity
## Architecture
```
┌──────────────────────────────────────────────────────────────────────────┐
│ Vault PKI (security namespace) │
│ │
│ ┌──────────────────────┐ ┌──────────────────────────┐ │
│ │ pki/ (Root CA) │ │ pki_int/ (Intermediate) │ │
│ │ │ signs │ │ │
│ │ Homelab Root CA │──────▶│ Homelab Intermediate CA │ │
│ │ TTL: 10 years │ │ TTL: 5 years │ │
│ │ │ │ │ │
│ │ (only signs │ │ Role: lab-internal │ │
│ │ intermediates) │ │ *.lab.daviestechlabs.io │ │
│ └──────────────────────┘ │ TTL: 90 days (default) │ │
│ │ Key: EC P-256 │ │
│ └─────────┬────────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │
│ │cert-manager │ │ vault write │ │ Future │ │
│ │ClusterIssuer│ │ (CLI) │ │ ACME │ │
│ │vault-internal│ │ │ │ │ │
│ └──────┬──────┘ └──────┬──────┘ └──────────┘ │
│ │ │ │
└────────────────────────────┼───────────────┼─────────────────────────────┘
│ │
┌────────────▼───┐ ┌──────▼───────────────────┐
│ In-Cluster │ │ Off-Cluster │
│ │ │ │
│ *.lab.dav... │ │ gravenhollow (RustFS) │
│ envoy-internal │ │ candlekeep (QNAP) │
│ auto-renewed │ │ waterdeep (Mac Mini) │
│ by cert-manager │ │ manual/scripted renewal │
└─────────────────┘ └───────────────────────────┘
```
## Implementation
### Vault PKI Configuration (Phases 1-4, completed)
```bash
# Phase 1: Root CA
vault secrets enable -path=pki pki
vault secrets tune -max-lease-ttl=87600h pki
vault write pki/root/generate/internal \
  common_name="Homelab Root CA" issuer_name="homelab-root" ttl=87600h
vault write pki/config/urls \
  issuing_certificates="http://vault.security.svc:8200/v1/pki/ca" \
  crl_distribution_points="http://vault.security.svc:8200/v1/pki/crl"

# Phase 2: Intermediate CA
vault secrets enable -path=pki_int pki
vault secrets tune -max-lease-ttl=43800h pki_int
vault write -field=csr pki_int/intermediate/generate/internal \
  common_name="Homelab Intermediate CA" issuer_name="homelab-intermediate" \
  > /tmp/intermediate.csr
vault write -field=certificate pki/root/sign-intermediate \
  issuer_ref="homelab-root" csr=@/tmp/intermediate.csr \
  format=pem_bundle ttl=43800h > /tmp/intermediate.crt
vault write pki_int/intermediate/set-signed certificate=@/tmp/intermediate.crt

# Phase 3: PKI Role
vault write pki_int/roles/lab-internal \
  allowed_domains="lab.daviestechlabs.io" \
  allow_subdomains=true allow_bare_domains=true \
  max_ttl=8760h ttl=2160h key_type=ec key_bits=256

# Phase 4: Policy and Kubernetes Auth Role
vault policy write cert-manager-pki - <<EOF
path "pki_int/sign/lab-internal" { capabilities = ["create", "update"] }
path "pki_int/issue/lab-internal" { capabilities = ["create"] }
EOF
vault write auth/kubernetes/role/cert-manager \
  bound_service_account_names=cert-manager \
  bound_service_account_namespaces=cert-manager \
  audience="https://192.168.100.20:6443" \
  policies=cert-manager-pki ttl=1h
```
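One way to sanity-check Phases 1 and 2 is to read both CA certificates back out of Vault and confirm that the intermediate validates against the root. The following is a minimal sketch and not part of the recorded runbook: the helper names `fetch_ca_pem` and `verify_intermediate` are hypothetical, and it assumes `VAULT_ADDR` and a valid token are already set and that `openssl` is installed.

```shell
# Hypothetical helpers (not from the ADR) for checking the two-tier chain.

# Print the CA certificate of a PKI mount (e.g. "pki" or "pki_int") as PEM.
# Assumes VAULT_ADDR and a Vault token are already configured in the environment.
fetch_ca_pem() {
  vault read -field=certificate "$1/cert/ca"
}

# Succeed (exit 0) only if the certificate in file $2 validates against
# the trust anchor in file $1.
verify_intermediate() {
  openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}
```

For example, `fetch_ca_pem pki > /tmp/root.pem` and `fetch_ca_pem pki_int > /tmp/int.pem`, followed by `verify_intermediate /tmp/root.pem /tmp/int.pem`, should exit 0 if the intermediate was signed by the root.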
### Phase 5: Kubernetes Manifests (GitOps)
New `vault-internal` ClusterIssuer replaces `selfsigned-internal`:
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-internal
spec:
  vault:
    server: http://vault.security.svc:8200
    path: pki_int/sign/lab-internal
    auth:
      kubernetes:
        role: cert-manager
        mountPath: /v1/auth/kubernetes
        serviceAccountRef:
          name: cert-manager
Envoy internal wildcard certificate updated to use `vault-internal`.
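To illustrate what pointing a Certificate at `vault-internal` might look like, here is a hedged sketch that renders such a manifest. The function name `render_wildcard_certificate`, the certificate name and namespace arguments, the `-tls` secret suffix, and the `duration`/`renewBefore` values are all assumptions and do not come from the repository. The private key is set to ECDSA P-256 because the `lab-internal` role is defined with `key_type=ec key_bits=256`, and cert-manager's default RSA key would likely be rejected by the sign endpoint.

```shell
# Hypothetical helper: print a Certificate manifest that uses the
# vault-internal ClusterIssuer. $1 = certificate name, $2 = namespace.
render_wildcard_certificate() {
  cert_name=$1
  cert_ns=$2
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ${cert_name}
  namespace: ${cert_ns}
spec:
  secretName: ${cert_name}-tls   # assumed naming convention
  issuerRef:
    name: vault-internal
    kind: ClusterIssuer
    group: cert-manager.io
  # Vault roles require a common name unless require_cn is disabled.
  commonName: "*.lab.daviestechlabs.io"
  dnsNames:
    - "*.lab.daviestechlabs.io"
  duration: 2160h      # assumption: matches the role's default TTL
  renewBefore: 720h    # assumption: renew 30 days before expiry
  privateKey:
    algorithm: ECDSA   # must match key_type=ec on the Vault role
    size: 256
EOF
}
```

In a GitOps repository the rendered YAML would be committed as a file; for a quick experiment, `render_wildcard_certificate <name> <namespace> | kubectl apply -f -` applies it directly.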
### Phase 6: Off-Cluster Cert Issuance
```bash
vault write -format=json pki_int/issue/lab-internal \
  common_name="gravenhollow.lab.daviestechlabs.io" \
  alt_names="gravenhollow.lab.daviestechlabs.io" \
  ttl=8760h
# The response's "data" object holds certificate, private_key, issuing_ca, and ca_chain
```
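Splitting the JSON response into files can be scripted. The sketch below is one possible approach, not the procedure actually used: `issue_host_cert` and the output file names are hypothetical, and it assumes `jq` is installed, that the caller's token may write to `pki_int/issue/lab-internal`, and that the response includes a `ca_chain` array.

```shell
# Hypothetical helper: issue a certificate for an off-cluster host and split
# the JSON response into PEM files. Requires vault and jq on PATH.
# $1 = host FQDN, $2 = directory that receives cert.pem, key.pem, chain.pem.
issue_host_cert() {
  host=$1
  out_dir=$2

  mkdir -p "$out_dir" || return 1
  bundle=$(vault write -format=json pki_int/issue/lab-internal \
    common_name="$host" alt_names="$host" ttl=8760h) || return 1

  # Write the private key inside a subshell with a restrictive umask so the
  # file is created with owner-only permissions.
  ( umask 077
    printf '%s\n' "$bundle" | jq -e -r '.data.private_key' > "$out_dir/key.pem"
  ) || return 1
  printf '%s\n' "$bundle" | jq -e -r '.data.certificate' > "$out_dir/cert.pem" || return 1
  # ca_chain is a JSON array; emit one PEM block per element.
  printf '%s\n' "$bundle" | jq -e -r '.data.ca_chain[]' > "$out_dir/chain.pem" || return 1
}
```

The private key exists only in this one response, so the call is made once and the three fields are extracted from the same captured output rather than by calling `vault write` repeatedly.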
### Phase 7: CA Distribution
The root CA cert is distributed to:
- **Kubernetes pods**: via ConfigMap `homelab-ca-bundle` or cert-manager `cainjector`
- **waterdeep (macOS)**: `sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain homelab-root-ca.crt`
- **gravenhollow / candlekeep**: installed into OS trust store
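
For a Linux host with a Debian/Ubuntu-style trust store, the installation step could look roughly like the sketch below. This is an assumption about the target systems, not a record of what was done: `install_homelab_root_ca` and the `CA_DIR` override are hypothetical, and appliances such as a QNAP NAS may keep their trust store elsewhere. It fetches the root from Vault's unauthenticated `ca/pem` endpoint on the `pki` mount and must run as root.

```shell
# Hypothetical helper for Debian/Ubuntu-style hosts. Run as root.
# $1 = Vault address reachable from the host (e.g. the value of VAULT_ADDR).
# CA_DIR is an assumed override, mainly useful for dry runs.
install_homelab_root_ca() {
  vault_addr=$1
  ca_dir=${CA_DIR:-/usr/local/share/ca-certificates}
  tmp_pem=$(mktemp) || return 1

  # /v1/pki/ca/pem serves the root CA certificate of the "pki" mount as PEM.
  if ! curl -fsS "$vault_addr/v1/pki/ca/pem" -o "$tmp_pem"; then
    rm -f "$tmp_pem"
    return 1
  fi

  # update-ca-certificates only picks up files ending in .crt.
  install -m 0644 "$tmp_pem" "$ca_dir/homelab-root-ca.crt" && update-ca-certificates
  status=$?
  rm -f "$tmp_pem"
  return "$status"
}
```

Because the certificate is fetched without authentication, comparing its SHA-256 fingerprint against a value obtained out of band before trusting it is a sensible extra step.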
## Certificate Inventory
| Service | Issuer | Renewal | Notes |
|---------|--------|---------|-------|
| `*.daviestechlabs.io` | `letsencrypt-production` | Auto (cert-manager) | Public, unchanged |
| `*.lab.daviestechlabs.io` | `vault-internal` | Auto (cert-manager) | Envoy internal gateway |
| `gravenhollow.lab.daviestechlabs.io` | `vault-internal` (via CLI) | Manual/cron | RustFS S3, NFS |
| `candlekeep.lab.daviestechlabs.io` | `vault-internal` (via CLI) | Manual/cron | QNAP NAS |
| `waterdeep.lab.daviestechlabs.io` | `vault-internal` (via CLI) | Manual/cron | Mac Mini dev workstation |
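
For the rows marked Manual/cron, a scheduled job needs some way to decide when to re-issue. A minimal sketch of that check, assuming `openssl` is available on the host, is shown here; `needs_renewal` is a hypothetical name and the renewal window is whatever the operator chooses.

```shell
# Hypothetical cron helper: succeed (exit 0) when the certificate in file $1
# expires within $2 days, has already expired, or cannot be read.
needs_renewal() {
  cert_file=$1
  days=$2
  # openssl's -checkend exits 0 if the cert is still valid N seconds from now.
  if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_file" >/dev/null 2>&1; then
    return 1   # still valid beyond the window: nothing to do
  fi
  return 0     # expiring soon, expired, or unreadable
}
```

A cron entry could then run something like `needs_renewal /path/to/cert.pem 30 && <re-issue command>`, where the re-issue command is the `vault write ... pki_int/issue/lab-internal` call from Phase 6 followed by a service reload.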
## Links
* [Vault PKI Secrets Engine](https://developer.hashicorp.com/vault/docs/secrets/pki)
* [cert-manager Vault Issuer](https://cert-manager.io/docs/configuration/vault/)
* Related: [ADR-0026](0026-storage-strategy.md) — Storage strategy (gravenhollow S3)
* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions
* Related: [ADR-0059](0059-mac-mini-ray-worker.md) — Mac Mini Ray worker