Tiered Storage Strategy: Longhorn + NFS
- Status: accepted
- Date: 2026-02-04
- Deciders: Billy
- Technical Story: Provide tiered storage for Kubernetes workloads balancing performance and capacity
Context and Problem Statement
Kubernetes requires a storage solution for stateful applications like databases, message queues, and AI model caches. Different workloads have vastly different requirements:
- Databases need fast, reliable storage with replication
- Media libraries need large capacity but can tolerate slower access
- AI/ML workloads need both - fast storage for models, large capacity for datasets
The homelab has heterogeneous nodes including x86_64 servers and ARM64 Raspberry Pis, plus an external NAS for bulk storage.
How do we provide tiered storage that balances performance, reliability, and capacity for diverse homelab workloads?
Decision Drivers
- Performance - fast IOPS for databases and critical workloads
- Capacity - large storage for media, datasets, and archives
- Reliability - data must survive node failures
- Heterogeneous support - work on both x86_64 and ARM64 (with limitations)
- Backup capability - support for off-cluster backups
- GitOps deployment - Helm charts with Flux management
Considered Options
- Longhorn + NFS dual-tier storage
- Rook-Ceph for everything
- OpenEBS with Mayastor
- NFS only
- Longhorn only
Decision Outcome
Chosen option: Option 1 - Longhorn + NFS dual-tier storage
Two storage tiers optimized for different use cases:
- `longhorn` (default): Fast distributed block storage on NVMe/SSDs for databases and critical workloads
- `nfs-slow`: High-capacity NFS storage on external NAS for media, datasets, and bulk storage
Positive Consequences
- Right-sized storage for each workload type
- Longhorn provides HA with automatic replication
- NFS provides massive capacity without consuming cluster disk space
- ReadWriteMany (RWX) easy on NFS tier
- Cost-effective - use existing NAS investment
Negative Consequences
- Two storage systems to manage
- NFS is slower (hence the `nfs-slow` naming)
- NFS is a single point of failure (no replication)
- Network dependency for both tiers
Architecture
┌────────────────────────────────────────────────────────────────────────────┐
│ TIER 1: LONGHORN │
│ (Fast Distributed Block Storage) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ khelben │ │ mystra │ │ selune │ │
│ │ (NVIDIA) │ │ (AMD) │ │ (AMD) │ │
│ │ │ │ │ │ │ │
│ │ /var/mnt/ │ │ /var/mnt/ │ │ /var/mnt/ │ │
│ │ longhorn │ │ longhorn │ │ longhorn │ │
│ │ (NVMe) │ │ (SSD) │ │ (SSD) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ Longhorn Manager │ │
│ │ (Schedules replicas) │ │
│ └───────────┬───────────┘ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Postgres │ │ Vault │ │Prometheus│ │ClickHouse│ │
│ │ PVC │ │ PVC │ │ PVC │ │ PVC │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────────┐
│ TIER 2: NFS-SLOW │
│ (High-Capacity Bulk Storage) │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ candlekeep.lab.daviestechlabs.io │ │
│ │ (External NAS) │ │
│ │ │ │
│ │ /kubernetes │ │
│ │ ├── jellyfin-media/ (1TB+ media library) │ │
│ │ ├── nextcloud/ (user files) │ │
│ │ ├── immich/ (photo backups) │ │
│ │ ├── kavita/ (ebooks, comics, manga) │ │
│ │ ├── mlflow-artifacts/ (model artifacts) │ │
│ │ ├── ray-models/ (AI model weights) │ │
│ │ └── gitea-runner/ (build caches) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ NFS CSI Driver │ │
│ │ (csi-driver-nfs) │ │
│ └───────────┬───────────┘ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Jellyfin │ │Nextcloud │ │ Immich │ │ Kavita │ │
│ │ PVC │ │ PVC │ │ PVC │ │ PVC │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────────────────────────────────────────────────┘
Tier 1: Longhorn Configuration
Helm Values
```yaml
persistence:
  defaultClass: true
  defaultClassReplicaCount: 2
  defaultDataPath: /var/mnt/longhorn
defaultSettings:
  defaultDataPath: /var/mnt/longhorn
  # Allow on vllm-tainted nodes
  taintToleration: "dedicated=vllm:NoSchedule"
  # Exclude Raspberry Pi nodes (ARM64)
  systemManagedComponentsNodeSelector: "kubernetes.io/arch:amd64"
# Snapshot retention
defaultRecurringJobs:
  - name: nightly-snapshots
    task: snapshot
    cron: "0 2 * * *"
    retain: 7
  - name: weekly-backups
    task: backup
    cron: "0 3 * * 0"
    retain: 4
```
Longhorn Storage Classes
| StorageClass | Replicas | Use Case |
|---|---|---|
| `longhorn` (default) | 2 | General workloads, databases |
| `longhorn-single` | 1 | Development/ephemeral |
| `longhorn-strict` | 3 | Critical databases |
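As a sketch, a non-default class from the table above could be defined like this (assuming the stock Longhorn CSI provisioner, `driver.longhorn.io`; the `longhorn-strict` definition here is illustrative, not copied from the cluster):

```yaml
# Illustrative StorageClass for the 3-replica tier.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-strict
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
```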
Tier 2: NFS Configuration
Helm Values (csi-driver-nfs)
```yaml
storageClass:
  create: true
  name: nfs-slow
  parameters:
    server: candlekeep.lab.daviestechlabs.io
    share: /kubernetes
  mountOptions:
    - nfsvers=4.1
    - nconnect=16  # Multiple TCP connections for throughput
    - hard         # Retry indefinitely on failure
    - noatime      # Don't update access times (performance)
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
```
Why "nfs-slow"?
The naming is intentional - it sets correct expectations:
- Latency: NAS is over network, higher latency than local NVMe
- IOPS: Spinning disks in NAS can't match SSD performance
- Throughput: Adequate for streaming media, not for databases
- Benefit: Massive capacity without consuming cluster disk space
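A minimal claim against this tier might look like the following (the claim name and size are illustrative; the `nfs-slow` class comes from the csi-driver-nfs values above):

```yaml
# Illustrative RWX claim on the capacity tier.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin-media
spec:
  storageClassName: nfs-slow
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Ti
```

RWX is the key property here: multiple pods can mount the same NFS-backed volume, which Longhorn can only offer in a limited fashion.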
Storage Tier Selection Guide
| Workload Type | Storage Class | Rationale |
|---|---|---|
| PostgreSQL (CNPG) | `longhorn` or `nfs-slow` | Depends on criticality |
| Prometheus/ClickHouse | `longhorn` | High write IOPS required |
| Vault | `longhorn` | Security-critical, needs HA |
| Media (Jellyfin, Kavita) | `nfs-slow` | Large files, sequential reads |
| Photos (Immich) | `nfs-slow` | Bulk storage for photos |
| User files (Nextcloud) | `nfs-slow` | Capacity over speed |
| AI/ML models (Ray) | `nfs-slow` | Large model weights |
| Build caches (Gitea runner) | `nfs-slow` | Ephemeral, large |
| MLflow artifacts | `nfs-slow` | Model artifacts storage |
Volume Usage by Tier
Longhorn Volumes (Performance Tier)
| Workload | Size | Replicas | Access Mode |
|---|---|---|---|
| Prometheus | 50Gi | 2 | RWO |
| Vault | 2Gi | 2 | RWO |
| ClickHouse | 100Gi | 2 | RWO |
| Alertmanager | 1Gi | 2 | RWO |
NFS Volumes (Capacity Tier)
| Workload | Size | Access Mode | Notes |
|---|---|---|---|
| Jellyfin | 2Ti | RWX | Media library |
| Immich | 500Gi | RWX | Photo storage |
| Nextcloud | 1Ti | RWX | User files |
| Kavita | 200Gi | RWX | Ebooks, comics |
| MLflow | 100Gi | RWX | Model artifacts |
| Ray models | 200Gi | RWX | AI model weights |
| Gitea runner | 50Gi | RWO | Build caches |
| Gitea DB (CNPG) | 10Gi | RWO | Capacity-optimized |
Backup Strategy
Longhorn Tier
Local Snapshots
- Frequency: Nightly at 2 AM
- Retention: 7 days
- Purpose: Quick recovery from accidental deletion
Off-Cluster Backups
- Frequency: Weekly on Sundays at 3 AM
- Destination: S3-compatible storage (MinIO/Backblaze)
- Retention: 4 weeks
- Purpose: Disaster recovery
NFS Tier
NAS-Level Backups
- Handled by NAS backup solution (snapshots, replication)
- Not managed by Kubernetes
- Relies on the NAS RAID configuration for redundancy
Backup Target Configuration (Longhorn)
```yaml
# ExternalSecret for backup credentials
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: longhorn-backup-secret
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault
  target:
    name: longhorn-backup-secret
  data:
    - secretKey: AWS_ACCESS_KEY_ID
      remoteRef:
        key: kv/data/longhorn
        property: backup_access_key
    - secretKey: AWS_SECRET_ACCESS_KEY
      remoteRef:
        key: kv/data/longhorn
        property: backup_secret_key
```
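The secret is then referenced from Longhorn's settings. A sketch of the corresponding Helm values, assuming an S3 backup target (the bucket URL below is a placeholder, not the real destination):

```yaml
# Sketch: pointing Longhorn at the S3 backup target.
# backupTarget is a placeholder URL; the credential secret name
# matches the ExternalSecret target above.
defaultSettings:
  backupTarget: s3://longhorn-backups@us-east-1/
  backupTargetCredentialSecret: longhorn-backup-secret
```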
Node Exclusions (Longhorn Only)
Raspberry Pi nodes excluded because:
- Limited disk I/O performance
- SD card wear concerns
- Memory constraints for Longhorn components
GPU nodes included with tolerations:
- `khelben` (NVIDIA) participates in Longhorn storage
- Taint toleration allows Longhorn components to schedule there
Performance Considerations
Longhorn Performance
- `khelben` has NVMe - fastest storage node
- `mystra`/`selune` have SATA SSDs - adequate for most workloads
- 2 replicas across different nodes ensure a volume survives a single node failure
- Trade-off: 2x storage consumption
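The 2x consumption trade-off is simple arithmetic; a quick sketch using the sizes from the "Longhorn Volumes" table above:

```python
# Sizes (GiB) and replica counts from the Longhorn Volumes table.
volumes = {
    "prometheus": (50, 2),
    "vault": (2, 2),
    "clickhouse": (100, 2),
    "alertmanager": (1, 2),
}

def raw_usage_gib(size_gib: int, replicas: int) -> int:
    """Each Longhorn replica is a full copy, so raw usage is size * replicas."""
    return size_gib * replicas

total = sum(raw_usage_gib(s, r) for s, r in volumes.values())
print(total)  # 306 GiB of node disk for 153 GiB of provisioned volumes
```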
NFS Performance
- Optimized with `nconnect=16` for parallel connections
- `noatime` reduces unnecessary write operations
- Sequential read workloads perform well (media streaming)
- Random I/O workloads should use Longhorn instead
When to Choose Each Tier
| Requirement | Longhorn | NFS-Slow |
|---|---|---|
| Low latency | ✅ | ❌ |
| High IOPS | ✅ | ❌ |
| Large capacity | ❌ | ✅ |
| ReadWriteMany (RWX) | Limited | ✅ |
| Node failure survival | ✅ | ✅ (data lives on the external NAS, though the NAS itself is a SPOF) |
| Kubernetes-native | ✅ | ✅ |
Monitoring
Grafana Dashboard: Longhorn dashboard for:
- Volume health and replica status
- IOPS and throughput per volume
- Disk space utilization per node
- Backup job status
Alerts:
- Volume degraded (replica count < desired)
- Disk space low (< 20% free)
- Backup job failed
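The degraded-volume alert could be expressed as a PrometheusRule; a sketch assuming Longhorn's exported `longhorn_volume_robustness` metric, where a value of 2 denotes a degraded volume (verify the encoding against your Longhorn version):

```yaml
# Sketch of a degraded-volume alert; metric encoding is an assumption.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: longhorn-alerts
spec:
  groups:
    - name: longhorn
      rules:
        - alert: LonghornVolumeDegraded
          # robustness 2 = degraded in Longhorn's metric encoding
          expr: longhorn_volume_robustness == 2
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Longhorn volume {{ $labels.volume }} is degraded"
```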
Future Enhancements
- NAS high availability - Second NAS with replication
- Dedicated storage network - Separate VLAN for storage traffic
- NVMe-oF - Network NVMe for lower latency
- Tiered Longhorn - Hot (NVMe) and warm (SSD) within Longhorn
- S3 tier - MinIO for object storage workloads