docs: Add ADRs for secrets management and security policy
- 0017: Secrets Management Strategy (SOPS + Vault + External Secrets)
- 0018: Security Policy Enforcement (Gatekeeper + Trivy)
197
decisions/0017-secrets-management-strategy.md
Normal file
@@ -0,0 +1,197 @@
# Secrets Management Strategy

* Status: accepted
* Date: 2026-02-04
* Deciders: Billy
* Technical Story: Establish a secure, GitOps-compatible secrets management approach for the homelab

## Context and Problem Statement

Managing secrets in a Kubernetes environment presents challenges: secrets must be available to applications, versionable in Git for GitOps, yet never exposed in plain text in repositories. The homelab needs a solution that balances security with operational simplicity.

How do we manage secrets securely while maintaining GitOps principles and enabling applications to access credentials at runtime?

## Decision Drivers

* GitOps compatibility - secrets must be manageable through Git workflows
* Security - no plain text secrets in repositories or logs
* Operational simplicity - minimize manual secret rotation burden
* Application integration - secrets must be consumable by workloads
* Disaster recovery - ability to restore secrets from backups

## Considered Options

1. **SOPS + Age for bootstrap, Vault + External Secrets for runtime**
2. **Sealed Secrets only**
3. **Vault only (with Vault Agent Injector)**
4. **SOPS only for everything**

## Decision Outcome

Chosen option: **Option 1 - SOPS + Age for bootstrap, Vault + External Secrets for runtime**

This hybrid approach uses SOPS with Age encryption for bootstrap secrets that must exist before the cluster is fully operational, and HashiCorp Vault with External Secrets Operator for runtime secrets that applications consume.

### Positive Consequences

* Bootstrap secrets can be committed to Git safely (encrypted with Age)
* Vault provides centralized secret management with audit logging
* External Secrets Operator enables declarative secret sync from Vault
* Clear separation between infrastructure secrets (SOPS) and application secrets (Vault)
* Secrets are automatically synced and refreshed

### Negative Consequences

* Two systems to understand and maintain
* Initial Vault setup requires manual unsealing (or auto-unseal configuration)
* Age key must be securely backed up outside the cluster

## Pros and Cons of the Options

### Option 1: SOPS + Age for Bootstrap, Vault + External Secrets for Runtime (Chosen)

**Architecture:**
```
Bootstrap Secrets (Git-encrypted):
  .sops.yaml ──► age encryption ──► *.sops.yaml files
                                          │
                                          ▼
                                  Flux SOPS decryption
                                          │
                                          ▼
                                  Kubernetes Secrets

Runtime Secrets (Vault-managed):
  Vault KV Store ◄── Manual/API ──► ExternalSecret CR
                                          │
                                          ▼
                              External Secrets Operator
                                          │
                                          ▼
                                  Kubernetes Secrets
```

* Good, because bootstrap secrets (Flux, cert-manager, Cloudflare) are encrypted in Git
* Good, because Vault provides an audit trail and dynamic secret generation
* Good, because External Secrets syncs secrets declaratively (GitOps-friendly)
* Good, because secrets can be rotated in Vault without Git commits
* Bad, because two systems add operational complexity
* Bad, because Vault requires storage (Raft) and HA consideration

### Option 2: Sealed Secrets Only

* Good, because single tool to manage
* Good, because native Kubernetes integration
* Bad, because secrets are cluster-specific (can't reuse across clusters)
* Bad, because no central secret management or audit logging
* Bad, because no support for dynamic secrets

### Option 3: Vault Only with Agent Injector

* Good, because single source of truth
* Good, because supports dynamic secrets and leases
* Bad, because requires sidecar injection (resource overhead)
* Bad, because bootstrap problem - how does Vault authenticate before secrets exist?
* Bad, because more complex application integration

### Option 4: SOPS Only

* Good, because simple - everything encrypted in Git
* Good, because no external dependencies at runtime
* Bad, because keeping every secret in Git, even encrypted, becomes risky as the number of secrets grows
* Bad, because secret rotation requires Git commits
* Bad, because no audit logging

## Implementation Details

### SOPS Configuration

`.sops.yaml` at repository root:
```yaml
creation_rules:
  - path_regex: talos/.*\.sops\.ya?ml
    age: age1... # Talos-specific key
  - path_regex: (bootstrap|kubernetes)/.*\.sops\.ya?ml
    age: age1... # Cluster key
```

**Bootstrap secrets encrypted with SOPS:**
- `bootstrap/sops-age.sops.yaml` - Age private key for Flux
- `bootstrap/github-deploy-key.sops.yaml` - Git repository access
- `talos/talsecret.sops.yaml` - Talos machine secrets
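
The architecture above relies on Flux decrypting `*.sops.yaml` files at reconcile time. A minimal sketch of a Flux `Kustomization` that enables this is shown below. The resource name, `path`, `interval`, and the `sops-age` Secret name are assumptions for illustration and are not taken from this repository.

```yaml
# Sketch only: names, path, and interval are hypothetical.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-apps          # hypothetical name
  namespace: flux-system
spec:
  interval: 10m               # assumed reconcile interval
  path: ./kubernetes          # assumed path; matches the second creation rule
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system         # assumed GitRepository name
  decryption:
    provider: sops
    secretRef:
      # Secret in the same namespace holding the Age private key;
      # Flux expects the data key name to end in ".agekey".
      name: sops-age          # assumed Secret name
```

With `decryption` set, Flux decrypts SOPS-encrypted manifests under `path` before applying them, so only the encrypted form ever lives in Git.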

### Vault Configuration

**Deployment:** HA mode with 3 replicas, Raft storage on Longhorn

```yaml
# HelmRelease values
server:
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
  dataStorage:
    storageClass: longhorn
    size: 2Gi
```

**Kubernetes Auth:** External Secrets authenticates via ServiceAccount

```yaml
# ClusterSecretStore
spec:
  provider:
    vault:
      server: "http://vault.security.svc:8200"
      path: "kv"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
```
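
The fragment above shows only the `spec`. For orientation, a complete manifest might look like the following sketch. The store name `vault` matches the `secretStoreRef` used elsewhere in this ADR; the `serviceAccountRef` name and namespace are assumptions and should be replaced with whichever ServiceAccount is bound to the `external-secrets` role in Vault.

```yaml
# Sketch only: the serviceAccountRef values are hypothetical.
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: "http://vault.security.svc:8200"
      path: "kv"              # KV mount path
      version: "v2"           # KV secrets engine version
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
          serviceAccountRef:
            name: external-secrets        # assumed ServiceAccount name
            namespace: external-secrets   # assumed namespace; required for cluster-scoped stores
```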

### External Secrets Usage Pattern

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: app-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault
  target:
    name: app-credentials
  data:
    - secretKey: password
      remoteRef:
        key: kv/data/myapp
        property: password
```
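
Once the operator has synced the `ExternalSecret`, the workload consumes the resulting `app-credentials` Secret like any other Kubernetes Secret. The sketch below shows one way to do this with an environment variable. The Deployment name, labels, image, and variable name are hypothetical.

```yaml
# Sketch only: Deployment name, labels, image, and env var name are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: example.invalid/myapp:latest   # placeholder image
          env:
            - name: APP_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: app-credentials   # Secret created by the ExternalSecret target
                  key: password           # matches secretKey in the ExternalSecret
```

Environment variables are read once at container start, so a value rotated in Vault reaches the Secret on the next refresh but reaches the process only after the pod restarts.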

### Secret Categories

| Category | Storage | Examples |
|----------|---------|----------|
| Bootstrap | SOPS + Age | Age keys, deploy keys, Talos secrets |
| Infrastructure | Vault | Database credentials, API tokens |
| Application | Vault | Service accounts, OAuth secrets |
| Certificates | cert-manager | TLS certs (auto-generated) |

## Disaster Recovery

1. **Age private key** - Stored securely outside cluster (password manager, hardware key)
2. **Vault data** - Backed up via Longhorn snapshots
3. **Unseal keys** - Stored securely outside cluster (Shamir shares distributed)
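
The Longhorn snapshots in item 2 can be scheduled declaratively. The following is a minimal sketch of a Longhorn `RecurringJob`, assuming the Vault volumes belong to Longhorn's `default` group. The job name, schedule, and retention count are illustrative and not taken from this repository.

```yaml
# Sketch only: name, schedule, and retention are hypothetical.
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: vault-snapshot        # hypothetical name
  namespace: longhorn-system
spec:
  task: snapshot              # use "backup" to copy data to an external backup target
  cron: "0 3 * * *"           # assumed schedule: daily at 03:00
  retain: 7                   # assumed number of snapshots to keep
  concurrency: 1
  groups:
    - default                 # assumes the Vault volumes are in the default group
```

Snapshots are stored alongside the volume data inside the cluster, so recovering from the loss of the whole cluster would likely also need a `backup` task that writes to a backup target outside it.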

## References

* [SOPS Documentation](https://github.com/getsops/sops)
* [Age Encryption](https://github.com/FiloSottile/age)
* [External Secrets Operator](https://external-secrets.io/)
* [HashiCorp Vault](https://www.vaultproject.io/)