GitOps with Flux CD
- Status: accepted
- Date: 2025-11-30
- Deciders: Billy Davies
- Technical Story: Implementing GitOps for cluster management
Context and Problem Statement
Managing a Kubernetes cluster with numerous applications, configurations, and secrets requires a reliable, auditable, and reproducible approach. Manual kubectl apply is error-prone and doesn't track state over time.
Decision Drivers
- Infrastructure as Code (IaC) principles
- Audit trail for all changes
- Self-healing cluster state
- Multi-repository support
- Secret encryption integration
- Active community and maintenance
Considered Options
- Manual kubectl apply
- ArgoCD
- Flux CD
- Rancher Fleet
- Pulumi/Terraform for Kubernetes
Decision Outcome
Chosen option: "Flux CD", because it provides a mature GitOps implementation with strong multi-source support and SOPS integration, and it aligns well with the Kubernetes ecosystem.
Positive Consequences
- Git is the single source of truth
- Automatic drift detection and correction
- Native SOPS/Age secret encryption
- Multi-repository support (homelab-k8s2 + Gitea daviestechlabs repos)
- Helm and Kustomize native support
- Webhook-free sync (pull-based)
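As an illustration of the Helm support and drift correction listed above, a HelmRelease might look like the following minimal sketch. The release name, namespace, chart name, version range, values, and the `example-charts` HelmRepository are hypothetical placeholders, and the `helm.toolkit.fluxcd.io/v2` API version assumes a recent Flux release; none of these are taken from the homelab-k8s2 repository.

```yaml
# Hypothetical HelmRelease; every name and value here is a placeholder
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-app
  namespace: default
spec:
  interval: 30m              # how often Flux reconciles the release
  chart:
    spec:
      chart: example-app
      version: "1.x"         # semver range, resolved against the chart repository
      sourceRef:
        kind: HelmRepository
        name: example-charts # hypothetical HelmRepository in flux-system
        namespace: flux-system
  driftDetection:
    mode: enabled            # revert in-cluster changes that diverge from the release
  values:
    replicaCount: 1
```

With `driftDetection.mode: enabled`, the helm-controller compares live objects against the rendered release on each reconciliation and corrects differences, which is one way the self-healing behavior can be realized for Helm workloads.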
Negative Consequences
- No built-in UI (use the CLI or a third-party UI)
- Learning curve for CRD-based configuration
- Debugging requires understanding Flux controllers
Configuration
Repository Structure
homelab-k8s2/
├── kubernetes/
│   ├── flux/                   # Flux system config
│   │   ├── config/
│   │   │   ├── cluster.yaml
│   │   │   └── secrets.yaml    # SOPS encrypted
│   │   └── repositories/
│   │       ├── helm/           # HelmRepositories
│   │       └── git/            # GitRepositories
│   └── apps/                   # Application Kustomizations
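To make the layout concrete, a source definition stored under `kubernetes/flux/repositories/helm/` could look roughly like this. It is only a sketch: the file name, the `example-charts` name, the interval, and the URL are hypothetical and do not describe an actual repository used by the cluster.

```yaml
# kubernetes/flux/repositories/helm/example-charts.yaml (hypothetical file name)
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: example-charts          # placeholder name
  namespace: flux-system
spec:
  interval: 1h                  # how often the repository index is refreshed
  url: https://charts.example.com   # placeholder URL
```

Keeping sources in their own directory separates "where artifacts come from" from "what gets deployed", so HelmReleases and Kustomizations under `kubernetes/apps/` only need to reference a source by name.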
Multi-Repository Sync
# GitRepository for Gitea repos (daviestechlabs org)
# Examples: argo, kubeflow, chat-handler, voice-assistant
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: argo-workflows
  namespace: flux-system
spec:
  interval: 10m  # assumed value; the GitRepository API requires spec.interval
  url: https://git.daviestechlabs.io/daviestechlabs/argo.git
  ref:
    branch: main
  # Public repos don't need secretRef
Note: The monolithic llm-workflows repo has been archived and decomposed into
focused repos in the daviestechlabs Gitea organization (e.g. chat-handler,
voice-assistant, handler-base, ray-serve, etc.). See AGENT-ONBOARDING.md
for the full list.
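A GitRepository only fetches the source; a Flux Kustomization is what applies manifests from it. The following is a minimal sketch of how the `argo-workflows` source above might be consumed. The Kustomization name, the `./deploy` path, and the interval and timeout values are assumptions for illustration, not the repository's actual configuration.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: argo-workflows        # assumed name, mirrors the GitRepository
  namespace: flux-system
spec:
  interval: 10m               # assumed; re-apply and correct drift on this cadence
  sourceRef:
    kind: GitRepository
    name: argo-workflows      # the GitRepository defined above
  path: ./deploy              # hypothetical path inside the argo repo
  prune: true                 # delete cluster objects that were removed from Git
  wait: true                  # wait for applied resources to become ready
  timeout: 5m
```

Each decomposed repo in the daviestechlabs organization can follow the same pairing of one GitRepository and one Kustomization, which keeps every service independently reconcilable.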
SOPS Integration
# .sops.yaml
creation_rules:
  - path_regex: .*\.sops\.yaml$
    # Age public key (recipient)
    age: >-
      age1...
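On the cluster side, decryption is enabled per Flux Kustomization. The sketch below shows the general shape, assuming the Age private key is stored in a Secret named `sops-age` in `flux-system` and that the cluster repository is registered as a GitRepository named `flux-system`. Both names and the interval are assumptions; only the `./kubernetes/apps` path comes from the repository structure above.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-apps          # hypothetical name
  namespace: flux-system
spec:
  interval: 10m               # assumed value
  sourceRef:
    kind: GitRepository
    name: flux-system         # assumed name of the homelab-k8s2 source
  path: ./kubernetes/apps
  prune: true
  decryption:
    provider: sops            # decrypt SOPS-encrypted manifests before applying
    secretRef:
      name: sops-age          # assumed Secret; its key name must end in .agekey
```

The public key in `.sops.yaml` is used when encrypting files locally, while the private key never leaves the cluster Secret, so encrypted manifests can be committed to Git safely.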
Pros and Cons of the Options
Manual kubectl apply
- Good, because simple
- Good, because no setup
- Bad, because no audit trail
- Bad, because no drift detection
- Bad, because not reproducible
ArgoCD
- Good, because great UI
- Good, because app-of-apps pattern
- Good, because large community
- Bad, because heavier resource usage
- Bad, because prompt sync typically relies on Git webhooks (otherwise it polls on an interval)
- Bad, because SOPS requires plugins
Flux CD
- Good, because lightweight
- Good, because pull-based (no webhooks)
- Good, because native SOPS support
- Good, because multi-source/multi-tenant
- Good, because Kubernetes-native CRDs
- Bad, because no built-in UI
- Bad, because CRD learning curve
Rancher Fleet
- Good, because integrated with Rancher
- Good, because multi-cluster
- Bad, because Rancher ecosystem lock-in
- Bad, because smaller community
Pulumi/Terraform
- Good, because familiar IaC tools
- Good, because drift detection
- Bad, because not Kubernetes-native
- Bad, because requires state management
- Bad, because no continuous reconciliation
Links
- Flux CD
- SOPS Integration
- flux-local - Local testing