Tiered Storage Strategy: Longhorn + NFS

  • Status: accepted
  • Date: 2026-02-04
  • Deciders: Billy
  • Technical Story: Provide tiered storage for Kubernetes workloads balancing performance and capacity

Context and Problem Statement

Kubernetes requires a storage solution for stateful applications like databases, message queues, and AI model caches. Different workloads have vastly different requirements:

  • Databases need fast, reliable storage with replication
  • Media libraries need large capacity but can tolerate slower access
  • AI/ML workloads need both - fast storage for models, large capacity for datasets

The homelab has heterogeneous nodes including x86_64 servers and ARM64 Raspberry Pis, plus an external NAS for bulk storage.

How do we provide tiered storage that balances performance, reliability, and capacity for diverse homelab workloads?

Decision Drivers

  • Performance - fast IOPS for databases and critical workloads
  • Capacity - large storage for media, datasets, and archives
  • Reliability - data must survive node failures
  • Heterogeneous support - work on both x86_64 and ARM64 (with limitations)
  • Backup capability - support for off-cluster backups
  • GitOps deployment - Helm charts with Flux management

Considered Options

  1. Longhorn + NFS dual-tier storage
  2. Rook-Ceph for everything
  3. OpenEBS with Mayastor
  4. NFS only
  5. Longhorn only

Decision Outcome

Chosen option: Option 1 - Longhorn + NFS dual-tier storage

Two storage tiers optimized for different use cases:

  • longhorn (default): Fast distributed block storage on NVMe/SSDs for databases and critical workloads
  • nfs-slow: High-capacity NFS storage on external NAS for media, datasets, and bulk storage
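
Workloads select a tier through the storageClassName on their PersistentVolumeClaim. A minimal sketch for the fast tier (claim name and size are hypothetical):

```yaml
# Fast tier: database volume on Longhorn (hypothetical claim)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn   # default class; fast replicated block storage
  resources:
    requests:
      storage: 20Gi
```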

Positive Consequences

  • Right-sized storage for each workload type
  • Longhorn provides HA with automatic replication
  • NFS provides massive capacity without consuming cluster disk space
  • ReadWriteMany (RWX) access is straightforward on the NFS tier
  • Cost-effective - use existing NAS investment

Negative Consequences

  • Two storage systems to manage
  • NFS is slower (hence the nfs-slow name)
  • NFS single point of failure (no replication)
  • Network dependency for both tiers

Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                              TIER 1: LONGHORN                              │
│                        (Fast Distributed Block Storage)                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                         │
│  │   khelben   │  │   mystra    │  │   selune    │                         │
│  │  (NVIDIA)   │  │   (AMD)     │  │   (AMD)     │                         │
│  │             │  │             │  │             │                         │
│  │ /var/mnt/   │  │ /var/mnt/   │  │ /var/mnt/   │                         │
│  │  longhorn   │  │  longhorn   │  │  longhorn   │                         │
│  │  (NVMe)     │  │  (SSD)      │  │  (SSD)      │                         │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘                         │
│         │                │                │                                 │
│         └────────────────┼────────────────┘                                 │
│                          ▼                                                  │
│              ┌───────────────────────┐                                      │
│              │   Longhorn Manager    │                                      │
│              │  (Schedules replicas) │                                      │
│              └───────────┬───────────┘                                      │
│                          ▼                                                  │
│     ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐                 │
│     │ Postgres │  │  Vault   │  │Prometheus│  │ClickHouse│                 │
│     │   PVC    │  │   PVC    │  │   PVC    │  │   PVC    │                 │
│     └──────────┘  └──────────┘  └──────────┘  └──────────┘                 │
└────────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────────┐
│                              TIER 2: NFS-SLOW                              │
│                        (High-Capacity Bulk Storage)                         │
│                                                                            │
│  ┌────────────────────────────────────────────────────────────────┐        │
│  │                  candlekeep.lab.daviestechlabs.io              │        │
│  │                        (External NAS)                           │        │
│  │                                                                 │        │
│  │   /kubernetes                                                   │        │
│  │   ├── jellyfin-media/     (1TB+ media library)                 │        │
│  │   ├── nextcloud/          (user files)                         │        │
│  │   ├── immich/             (photo backups)                      │        │
│  │   ├── kavita/             (ebooks, comics, manga)              │        │
│  │   ├── mlflow-artifacts/   (model artifacts)                    │        │
│  │   ├── ray-models/         (AI model weights)                   │        │
│  │   └── gitea-runner/       (build caches)                       │        │
│  └────────────────────────────────────────────────────────────────┘        │
│                          │                                                  │
│                          ▼                                                  │
│              ┌───────────────────────┐                                      │
│              │   NFS CSI Driver      │                                      │
│              │  (csi-driver-nfs)     │                                      │
│              └───────────┬───────────┘                                      │
│                          ▼                                                  │
│     ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐                 │
│     │ Jellyfin │  │Nextcloud │  │  Immich  │  │  Kavita  │                 │
│     │   PVC    │  │   PVC    │  │   PVC    │  │   PVC    │                 │
│     └──────────┘  └──────────┘  └──────────┘  └──────────┘                 │
└────────────────────────────────────────────────────────────────────────────┘

Tier 1: Longhorn Configuration

Helm Values

persistence:
  defaultClass: true
  defaultClassReplicaCount: 2
  defaultDataPath: /var/mnt/longhorn

defaultSettings:
  defaultDataPath: /var/mnt/longhorn
  # Allow on vllm-tainted nodes
  taintToleration: "dedicated=vllm:NoSchedule"
  # Exclude Raspberry Pi nodes (ARM64)
  systemManagedComponentsNodeSelector: "kubernetes.io/arch:amd64"
  # Snapshot retention
  defaultRecurringJobs:
    - name: nightly-snapshots
      task: snapshot
      cron: "0 2 * * *"
      retain: 7
    - name: weekly-backups
      task: backup
      cron: "0 3 * * 0"
      retain: 4

Longhorn Storage Classes

| StorageClass       | Replicas | Use Case                     |
|--------------------|----------|------------------------------|
| longhorn (default) | 2        | General workloads, databases |
| longhorn-single    | 1        | Development/ephemeral        |
| longhorn-strict    | 3        | Critical databases           |
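
The non-default classes are not created by the Helm values above. A sketch of how longhorn-strict could be declared, assuming Longhorn's numberOfReplicas StorageClass parameter:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-strict
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"      # one replica per x86_64 storage node
  staleReplicaTimeout: "30"
allowVolumeExpansion: true
reclaimPolicy: Retain        # keep data around for critical databases
```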

Tier 2: NFS Configuration

Helm Values (csi-driver-nfs)

storageClass:
  create: true
  name: nfs-slow
  parameters:
    server: candlekeep.lab.daviestechlabs.io
    share: /kubernetes
  mountOptions:
    - nfsvers=4.1
    - nconnect=16    # Multiple TCP connections for throughput
    - hard           # Retry indefinitely on failure
    - noatime        # Don't update access times (performance)
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
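
A claim against this class can request RWX, which the NFS driver supports natively. A hypothetical media-library claim:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin-media
spec:
  accessModes: [ReadWriteMany]   # multiple pods can mount the same share
  storageClassName: nfs-slow
  resources:
    requests:
      storage: 2Ti
```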

Why "nfs-slow"?

The naming is intentional - it sets correct expectations:

  • Latency: NAS is over network, higher latency than local NVMe
  • IOPS: Spinning disks in NAS can't match SSD performance
  • Throughput: Adequate for streaming media, not for databases
  • Benefit: Massive capacity without consuming cluster disk space

Storage Tier Selection Guide

| Workload Type               | Storage Class        | Rationale                     |
|-----------------------------|----------------------|-------------------------------|
| PostgreSQL (CNPG)           | longhorn or nfs-slow | Depends on criticality        |
| Prometheus/ClickHouse       | longhorn             | High write IOPS required      |
| Vault                       | longhorn             | Security-critical, needs HA   |
| Media (Jellyfin, Kavita)    | nfs-slow             | Large files, sequential reads |
| Photos (Immich)             | nfs-slow             | Bulk storage for photos       |
| User files (Nextcloud)      | nfs-slow             | Capacity over speed           |
| AI/ML models (Ray)          | nfs-slow             | Large model weights           |
| Build caches (Gitea runner) | nfs-slow             | Ephemeral, large              |
| MLflow artifacts            | nfs-slow             | Model artifacts storage       |

Volume Usage by Tier

Longhorn Volumes (Performance Tier)

| Workload     | Size  | Replicas | Access Mode |
|--------------|-------|----------|-------------|
| Prometheus   | 50Gi  | 2        | RWO         |
| Vault        | 2Gi   | 2        | RWO         |
| ClickHouse   | 100Gi | 2        | RWO         |
| Alertmanager | 1Gi   | 2        | RWO         |

NFS Volumes (Capacity Tier)

| Workload        | Size  | Access Mode | Notes              |
|-----------------|-------|-------------|--------------------|
| Jellyfin        | 2Ti   | RWX         | Media library      |
| Immich          | 500Gi | RWX         | Photo storage      |
| Nextcloud       | 1Ti   | RWX         | User files         |
| Kavita          | 200Gi | RWX         | Ebooks, comics     |
| MLflow          | 100Gi | RWX         | Model artifacts    |
| Ray models      | 200Gi | RWX         | AI model weights   |
| Gitea runner    | 50Gi  | RWO         | Build caches       |
| Gitea DB (CNPG) | 10Gi  | RWO         | Capacity-optimized |

Backup Strategy

Longhorn Tier

Local Snapshots

  • Frequency: Nightly at 2 AM
  • Retention: 7 days
  • Purpose: Quick recovery from accidental deletion

Off-Cluster Backups

  • Frequency: Weekly on Sundays at 3 AM
  • Destination: S3-compatible storage (MinIO/Backblaze)
  • Retention: 4 weeks
  • Purpose: Disaster recovery

NFS Tier

NAS-Level Backups

  • Handled by NAS backup solution (snapshots, replication)
  • Not managed by Kubernetes
  • Relies on the NAS RAID configuration for redundancy

Backup Target Configuration (Longhorn)

# ExternalSecret for backup credentials
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: longhorn-backup-secret
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault
  target:
    name: longhorn-backup-secret
  data:
    - secretKey: AWS_ACCESS_KEY_ID
      remoteRef:
        key: kv/data/longhorn
        property: backup_access_key
    - secretKey: AWS_SECRET_ACCESS_KEY
      remoteRef:
        key: kv/data/longhorn
        property: backup_secret_key
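
Longhorn consumes this secret through its backup target settings. A sketch, with the bucket URL as a placeholder:

```yaml
defaultSettings:
  backupTarget: s3://longhorn-backups@us-east-1/   # hypothetical bucket/region
  backupTargetCredentialSecret: longhorn-backup-secret
```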

Node Exclusions (Longhorn Only)

Raspberry Pi nodes excluded because:

  • Limited disk I/O performance
  • SD card wear concerns
  • Memory constraints for Longhorn components

GPU nodes included with tolerations:

  • khelben (NVIDIA) participates in Longhorn storage
  • Taint toleration allows Longhorn to schedule there

Performance Considerations

Longhorn Performance

  • khelben has NVMe - fastest storage node
  • mystra/selune have SATA SSDs - adequate for most workloads
  • 2 replicas across different nodes ensures single node failure survival
  • Trade-off: 2x storage consumption

NFS Performance

  • Optimized with nconnect=16 for parallel connections
  • noatime reduces unnecessary write operations
  • Sequential read workloads perform well (media streaming)
  • Random I/O workloads should use Longhorn instead

When to Choose Each Tier

| Requirement           | Longhorn | NFS-Slow         |
|-----------------------|----------|------------------|
| Low latency           | ✓        | —                |
| High IOPS             | ✓        | —                |
| Large capacity        | —        | ✓                |
| ReadWriteMany (RWX)   | Limited  | ✓                |
| Node failure survival | ✓        | Only with NAS HA |
| Kubernetes-native     | ✓        | —                |

Monitoring

Grafana Dashboard: the Longhorn dashboard covers:

  • Volume health and replica status
  • IOPS and throughput per volume
  • Disk space utilization per node
  • Backup job status

Alerts:

  • Volume degraded (replica count < desired)
  • Disk space low (< 20% free)
  • Backup job failed
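
The degraded-volume alert can be expressed against Longhorn's exported metrics. A sketch assuming the longhorn_volume_robustness gauge (where 2 means degraded):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: longhorn-alerts
spec:
  groups:
    - name: longhorn
      rules:
        - alert: LonghornVolumeDegraded
          expr: longhorn_volume_robustness == 2   # replica count below desired
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Longhorn volume {{ $labels.volume }} is degraded"
```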

Future Enhancements

  1. NAS high availability - Second NAS with replication
  2. Dedicated storage network - Separate VLAN for storage traffic
  3. NVMe-oF - Network NVMe for lower latency
  4. Tiered Longhorn - Hot (NVMe) and warm (SSD) within Longhorn
  5. S3 tier - MinIO for object storage workloads

References