GitOps with Flux CD

  • Status: accepted
  • Date: 2025-11-30
  • Deciders: Billy Davies
  • Technical Story: Implementing GitOps for cluster management

Context and Problem Statement

Managing a Kubernetes cluster with numerous applications, configurations, and secrets requires a reliable, auditable, and reproducible approach. Manual kubectl apply is error-prone and doesn't track state over time.

Decision Drivers

  • Infrastructure as Code (IaC) principles
  • Audit trail for all changes
  • Self-healing cluster state
  • Multi-repository support
  • Secret encryption integration
  • Active community and maintenance

Considered Options

  • Manual kubectl apply
  • ArgoCD
  • Flux CD
  • Rancher Fleet
  • Pulumi/Terraform for Kubernetes

Decision Outcome

Chosen option: "Flux CD", because it provides a mature GitOps implementation with strong multi-source support and SOPS integration, and it aligns well with the Kubernetes ecosystem.

Positive Consequences

  • Git is the single source of truth
  • Automatic drift detection and correction
  • Native decryption of SOPS/Age-encrypted secrets
  • Multi-repository support (homelab-k8s2 + Gitea daviestechlabs repos)
  • Helm and Kustomize native support
  • Webhook-free sync (pull-based)
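
As a sketch of the native Helm support listed above, a chart can be deployed with a HelmRepository source and a HelmRelease that references it. The podinfo chart, the resource names, the target namespace, and the intervals are illustrative assumptions and are not taken from this cluster. The `helm.toolkit.fluxcd.io/v2` API version assumes a recent Flux release; older installations may need a beta version instead.

```yaml
# Illustrative only: chart, names, namespaces, and intervals are assumptions
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 1h                # how often to refresh the chart index
  url: https://stefanprodan.github.io/podinfo
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 30m               # how often to reconcile the release
  chart:
    spec:
      chart: podinfo
      version: "6.x"          # semver range; pin more tightly if preferred
      sourceRef:
        kind: HelmRepository
        name: podinfo
        namespace: flux-system
  values:
    replicaCount: 1
```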

Negative Consequences

  • No built-in UI (use the CLI or a third-party UI)
  • Learning curve for CRD-based configuration
  • Debugging requires understanding Flux controllers

Configuration

Repository Structure

homelab-k8s2/
├── kubernetes/
│   ├── flux/            # Flux system config
│   │   ├── config/
│   │   │   ├── cluster.yaml
│   │   │   └── secrets.yaml  # SOPS encrypted
│   │   └── repositories/
│   │       ├── helm/    # HelmRepositories
│   │       └── git/     # GitRepositories
│   └── apps/            # Application Kustomizations
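
One way the layout above might be wired into Flux is a Kustomization that reconciles the `kubernetes/apps` directory. This is a minimal sketch: the Kustomization name `cluster-apps`, the GitRepository name `flux-system`, and the interval are assumptions, not values confirmed by this repository.

```yaml
# Illustrative only: names and interval are assumptions
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-apps
  namespace: flux-system
spec:
  interval: 10m          # re-apply on this schedule, which reverts manual drift
  path: ./kubernetes/apps
  prune: true            # delete cluster objects whose manifests were removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system    # assumed name of the GitRepository tracking homelab-k8s2
```

With `prune: true`, removing a manifest from Git also removes the corresponding object from the cluster on the next reconciliation.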

Multi-Repository Sync

# GitRepository for Gitea repos (daviestechlabs org)
# Examples: argo, kubeflow, chat-handler, voice-assistant
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: argo-workflows
  namespace: flux-system
spec:
  interval: 10m  # required by the v1 API; 10m is an illustrative value
  url: https://git.daviestechlabs.io/daviestechlabs/argo.git
  ref:
    branch: main
  # Public repos don't need secretRef

Note: The monolithic llm-workflows repo has been archived and decomposed into focused repos in the daviestechlabs Gitea organization (e.g. chat-handler, voice-assistant, handler-base, and ray-serve). See AGENT-ONBOARDING.md for the full list.
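
A GitRepository only fetches source. To apply manifests from it, a Kustomization has to reference it. The following is a hedged sketch for the `argo-workflows` source defined above; the `path` assumes the manifests sit at the repository root, and the interval is illustrative.

```yaml
# Illustrative only: path and interval are assumptions about the argo repo
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: argo-workflows
  namespace: flux-system
spec:
  interval: 10m
  path: ./               # assumed location of the manifests in the argo repo
  prune: true
  sourceRef:
    kind: GitRepository
    name: argo-workflows # matches the GitRepository metadata.name above
```

The same pattern would repeat for each decomposed repo: one GitRepository and one Kustomization per source.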

SOPS Integration

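# .sops.yaml
creation_rules:
  - path_regex: .*\.sops\.yaml$
    # Age public key (recipient). The comment sits outside the block scalar,
    # because "#" inside a folded scalar is literal text, not a comment.
    age: >-
      age1...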
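
The `.sops.yaml` rule above only governs encryption on the workstation. On the cluster side, decryption is enabled per Kustomization. Here is a minimal sketch, assuming the Age private key is stored in a Secret named `sops-age` in `flux-system` (under a data key ending in `.agekey`, which Flux expects) and that the GitRepository is named `flux-system`. Both names are assumptions.

```yaml
# Illustrative only: Kustomization, Secret, and GitRepository names are assumptions
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-config
  namespace: flux-system
spec:
  interval: 10m
  path: ./kubernetes/flux/config   # directory holding the SOPS-encrypted secrets
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-age               # Secret containing the Age private key
```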

Pros and Cons of the Options

Manual kubectl apply

  • Good, because simple
  • Good, because no setup
  • Bad, because no audit trail
  • Bad, because no drift detection
  • Bad, because not reproducible

ArgoCD

  • Good, because great UI
  • Good, because app-of-apps pattern
  • Good, because large community
  • Bad, because heavier resource usage
  • Bad, because sync faster than the default polling interval relies on webhooks
  • Bad, because SOPS requires plugins

Flux CD

  • Good, because lightweight
  • Good, because pull-based (no webhooks)
  • Good, because native SOPS support
  • Good, because multi-source/multi-tenant
  • Good, because Kubernetes-native CRDs
  • Bad, because no built-in UI
  • Bad, because CRD learning curve

Rancher Fleet

  • Good, because integrated with Rancher
  • Good, because multi-cluster
  • Bad, because Rancher ecosystem lock-in
  • Bad, because smaller community

Pulumi/Terraform for Kubernetes

  • Good, because familiar IaC tools
  • Good, because drift detection
  • Bad, because not Kubernetes-native
  • Bad, because requires state management
  • Bad, because no continuous reconciliation