homelab-design/decisions/0017-secrets-management-strategy.md
Billy D. a128c265e4 docs: Add ADRs for secrets management and security policy
- 0017: Secrets Management Strategy (SOPS + Vault + External Secrets)
- 0018: Security Policy Enforcement (Gatekeeper + Trivy)
2026-02-04 08:45:47 -05:00

Secrets Management Strategy

  • Status: accepted
  • Date: 2026-02-04
  • Deciders: Billy
  • Technical Story: Establish a secure, GitOps-compatible secrets management approach for the homelab

Context and Problem Statement

Managing secrets in a Kubernetes environment presents challenges: secrets must be available to applications, versionable in Git for GitOps, yet never exposed in plain text in repositories. The homelab needs a solution that balances security with operational simplicity.

How do we manage secrets securely while maintaining GitOps principles and enabling applications to access credentials at runtime?

Decision Drivers

  • GitOps compatibility - secrets must be manageable through Git workflows
  • Security - no plain text secrets in repositories or logs
  • Operational simplicity - minimize manual secret rotation burden
  • Application integration - secrets must be consumable by workloads
  • Disaster recovery - ability to restore secrets from backups

Considered Options

  1. SOPS + Age for bootstrap, Vault + External Secrets for runtime
  2. Sealed Secrets only
  3. Vault only (with Vault Agent Injector)
  4. SOPS only for everything

Decision Outcome

Chosen option: Option 1 - SOPS + Age for bootstrap, Vault + External Secrets for runtime

This hybrid approach uses SOPS with Age encryption for bootstrap secrets that must exist before the cluster is fully operational, and HashiCorp Vault with External Secrets Operator for runtime secrets that applications consume.

Positive Consequences

  • Bootstrap secrets can be committed to Git safely (encrypted with Age)
  • Vault provides centralized secret management with audit logging
  • External Secrets Operator enables declarative secret sync from Vault
  • Clear separation between infrastructure secrets (SOPS) and application secrets (Vault)
  • Synced Kubernetes Secrets are refreshed automatically on each ExternalSecret's refreshInterval (1h in the usage pattern below)

Negative Consequences

  • Two systems to understand and maintain
  • Vault must be unsealed manually at initial setup and after every restart, unless auto-unseal is configured
  • Age key must be securely backed up outside the cluster

Pros and Cons of the Options

Option 1: SOPS + Age for Bootstrap, Vault + External Secrets for Runtime (Chosen)

Architecture:

Bootstrap Secrets (Git-encrypted):
  .sops.yaml ──► age encryption ──► *.sops.yaml files
                                          │
                                          ▼
                              Flux SOPS decryption
                                          │
                                          ▼
                              Kubernetes Secrets

Runtime Secrets (Vault-managed):
  Vault KV Store ◄── Manual/API ──► ExternalSecret CR
                                          │
                                          ▼
                              External Secrets Operator
                                          │
                                          ▼
                              Kubernetes Secrets

  • Good, because bootstrap secrets (Flux, cert-manager, Cloudflare) are encrypted in Git
  • Good, because Vault provides audit trail and dynamic secret generation
  • Good, because External Secrets syncs secrets declaratively (GitOps-friendly)
  • Good, because secrets can be rotated in Vault without Git commits
  • Bad, because two systems add operational complexity
  • Bad, because Vault needs persistent storage (Raft) and HA planning

Option 2: Sealed Secrets Only

  • Good, because single tool to manage
  • Good, because native Kubernetes integration
  • Bad, because secrets are cluster-specific (can't reuse across clusters)
  • Bad, because no central secret management or audit logging
  • Bad, because no support for dynamic secrets

Option 3: Vault Only with Agent Injector

  • Good, because single source of truth
  • Good, because supports dynamic secrets and leases
  • Bad, because requires sidecar injection (resource overhead)
  • Bad, because of the bootstrap problem: credentials that must exist before Vault is running and unsealed cannot be served by Vault itself
  • Bad, because more complex application integration

Option 4: SOPS Only

  • Good, because simple - everything encrypted in Git
  • Good, because no external dependencies at runtime
  • Bad, because keeping every secret in Git, even encrypted, becomes risky as the number of secrets grows: one leaked Age key exposes them all
  • Bad, because secret rotation requires Git commits
  • Bad, because no audit logging

Implementation Details

SOPS Configuration

.sops.yaml at repository root:

creation_rules:
  - path_regex: talos/.*\.sops\.ya?ml
    age: age1...  # Talos-specific key
  - path_regex: (bootstrap|kubernetes)/.*\.sops\.ya?ml
    age: age1...  # Cluster key
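
SOPS applies the first rule whose path_regex matches a file. One refinement sometimes paired with rules like these is encrypted_regex, which limits encryption to the payload fields so that names and namespaces stay readable in diffs. The sketch below assumes the files under the cluster rule are Kubernetes Secret manifests and that this behaviour is wanted; the source does not say either way.

```yaml
creation_rules:
  - path_regex: (bootstrap|kubernetes)/.*\.sops\.ya?ml
    # Assumption: encrypt only the Secret payload, leave metadata in clear text
    encrypted_regex: ^(data|stringData)$
    age: age1...  # Cluster key (elided, as in the rule above)
```

Without encrypted_regex, SOPS encrypts every value in the file, which also works but makes review of changes harder.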

Bootstrap secrets encrypted with SOPS:

  • bootstrap/sops-age.sops.yaml - Age private key for Flux
  • bootstrap/github-deploy-key.sops.yaml - Git repository access
  • talos/talsecret.sops.yaml - Talos machine secrets
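
On the Flux side, decryption is switched on per Kustomization. The following is a minimal sketch of what that could look like; the resource name, path, and source name are hypothetical, and the Secret name sops-age is only inferred from the file name above.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-apps          # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./kubernetes/apps     # hypothetical path covered by the cluster rule
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system         # hypothetical source name
  decryption:
    provider: sops
    secretRef:
      name: sops-age          # assumed Secret holding the Age private key
```

Flux expects the Age private key in that Secret under a data key ending in .agekey.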

Vault Configuration

Deployment: HA mode with 3 replicas, Raft storage on Longhorn

# HelmRelease values
server:
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
  dataStorage:
    storageClass: longhorn
    size: 2Gi

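Kubernetes Auth: External Secrets Operator authenticates to Vault with a ServiceAccount token, using the Kubernetes auth method mounted at kubernetes and the external-secrets role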

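# ClusterSecretStore
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: vault  # referenced by secretStoreRef.name in ExternalSecret resources
spec:
  provider:
    vault:
      server: "http://vault.security.svc:8200"
      path: "kv"        # KV mount path
      version: "v2"     # KV secrets engine version
      auth:
        kubernetes:
          mountPath: "kubernetes"   # Vault auth mount
          role: "external-secrets"  # Vault role used by External Secrets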
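
If the token presented to Vault should come from a specific ServiceAccount rather than whatever the operator pod runs as, the auth block can name it explicitly. This is a sketch of that variant, not the configuration in use; the ServiceAccount name and namespace are hypothetical.

```yaml
# Fragment of spec.provider.vault in the ClusterSecretStore
auth:
  kubernetes:
    mountPath: "kubernetes"
    role: "external-secrets"
    serviceAccountRef:
      name: external-secrets        # hypothetical ServiceAccount name
      namespace: external-secrets   # hypothetical; a ClusterSecretStore needs the namespace spelled out
```

The Vault role must list the same ServiceAccount name and namespace among its bound identities, otherwise login is rejected.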

External Secrets Usage Pattern

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: app-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault
  target:
    name: app-credentials
  data:
    - secretKey: password
      remoteRef:
        key: kv/data/myapp
        property: password
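
To close the loop on application integration, a workload reads the synced Secret like any other Kubernetes Secret. The Deployment below is an illustrative sketch: the workload name, image, and environment variable name are placeholders, and only app-credentials and password come from the ExternalSecret above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                                 # hypothetical workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: ghcr.io/example/myapp:1.0.0  # placeholder image
          env:
            - name: APP_PASSWORD              # hypothetical variable name
              valueFrom:
                secretKeyRef:
                  name: app-credentials       # target.name of the ExternalSecret
                  key: password               # secretKey of the ExternalSecret
```

Environment variables are read once at container start, so a value rotated in Vault reaches a running pod only after a restart; mounting the Secret as a volume avoids that.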

Secret Categories

| Category       | Storage      | Examples                             |
|----------------|--------------|--------------------------------------|
| Bootstrap      | SOPS + Age   | Age keys, deploy keys, Talos secrets |
| Infrastructure | Vault        | Database credentials, API tokens     |
| Application    | Vault        | Service accounts, OAuth secrets      |
| Certificates   | cert-manager | TLS certs (auto-generated)           |

Disaster Recovery

  1. Age private key - Stored securely outside cluster (password manager, hardware key)
  2. Vault data - Backed up via Longhorn snapshots
  3. Unseal keys - Stored securely outside cluster (Shamir shares distributed)
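
The Longhorn snapshots in item 2 could be scheduled with a RecurringJob. The sketch below is an assumption about how that might be configured; the job name, group name, schedule, and retention count are all hypothetical.

```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: vault-snapshot        # hypothetical name
  namespace: longhorn-system
spec:
  cron: "0 3 * * *"           # hypothetical schedule: daily at 03:00
  task: snapshot
  groups:
    - vault                   # hypothetical group; Vault volumes must be assigned to it
  retain: 7                   # hypothetical retention count
  concurrency: 1
```

A snapshot stays on the same Longhorn storage as the volume. Using task: backup instead copies the data to the configured backup target, which matters if the cluster itself is lost.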

References