updates to finish nfs-fast implementation.

2026-02-16 18:08:32 -05:00
parent 7685b2b757
commit b4e608f002
5 changed files with 134 additions and 37 deletions
--- a/decisions/0026-storage-strategy.md
+++ b/decisions/0026-storage-strategy.md
@@ -37,9 +37,10 @@ How do we provide tiered storage that balances performance, reliability, and cap

 Chosen option: **Option 1 - Longhorn + NFS dual-tier storage**

-Two storage tiers optimized for different use cases:
+Three storage tiers optimized for different use cases:
 - **`longhorn`** (default): Fast distributed block storage on NVMe/SSDs for databases and critical workloads
- **`nfs-slow`**: High-capacity NFS storage on external NAS for media, datasets, and bulk storage
+- **`nfs-fast`**: High-performance NFS storage on gravenhollow (all-SSD TrueNAS Scale, dual 10GbE, 12.2 TB) for AI model cache and hot data; the same host also serves S3-compatible object storage via RustFS
+- **`nfs-slow`**: High-capacity NFS storage on candlekeep (QNAP HDD NAS) for media, datasets, and bulk storage

 ### Positive Consequences

@@ -90,7 +91,7 @@ Two storage tiers optimized for different use cases:
 │                                                                            │
 │  ┌────────────────────────────────────────────────────────────────┐        │
 │  │                  candlekeep.lab.daviestechlabs.io              │        │
-│  │                        (External NAS)                           │        │
+│  │                         (QNAP NAS)                              │        │
 │  │                                                                 │        │
 │  │   /kubernetes                                                   │        │
 │  │   ├── jellyfin-media/     (1TB+ media library)                 │        │
@@ -113,6 +114,38 @@ Two storage tiers optimized for different use cases:
 │     │   PVC    │  │   PVC    │  │   PVC    │  │   PVC    │                 │
 │     └──────────┘  └──────────┘  └──────────┘  └──────────┘                 │
 └────────────────────────────────────────────────────────────────────────────┘
+
+┌────────────────────────────────────────────────────────────────────────────┐
+│                              TIER 3: NFS-FAST                              │
+│                     (High-Performance SSD NFS + S3 Storage)                │
+│                                                                            │
+│  ┌────────────────────────────────────────────────────────────────┐        │
+│  │                gravenhollow.lab.daviestechlabs.io              │        │
+│  │          (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB)     │        │
+│  │                                                                │        │
+│  │   NFS: /mnt/gravenhollow/kubernetes                            │        │
+│  │   ├── ray-model-cache/    (AI model weights - hot)             │        │
+│  │   ├── mlflow-artifacts/   (ML experiment tracking)             │        │
+│  │   └── training-data/      (datasets for fine-tuning)           │        │
+│  │                                                                │        │
+│  │   S3 (RustFS): http://gravenhollow.lab.daviestechlabs.io:30292  │        │
+│  │   ├── kubeflow-pipelines   (pipeline artifacts)                │        │
+│  │   ├── training-data        (large dataset staging)             │        │
+│  │   └── longhorn-backups     (off-cluster backup target)         │        │
+│  └────────────────────────────────────────────────────────────────┘        │
+│                          │                                                  │
+│                          ▼                                                  │
+│              ┌───────────────────────┐                                      │
+│              │   NFS CSI Driver      │                                      │
+│              │  (csi-driver-nfs)     │                                      │
+│              └───────────┬───────────┘                                      │
+│                          ▼                                                  │
+│     ┌──────────┐  ┌──────────┐  ┌──────────┐                               │
+│     │Ray Model │  │  MLflow  │  │ Training │                               │
+│     │  Cache   │  │ Artifact │  │   Data   │                               │
+│     │   PVC    │  │   PVC    │  │   PVC    │                               │
+│     └──────────┘  └──────────┘  └──────────┘                               │
+└────────────────────────────────────────────────────────────────────────────┘
 ```

 ## Tier 1: Longhorn Configuration
@@ -179,19 +212,79 @@ The naming is intentional - it sets correct expectations:
 - **Throughput:** Adequate for streaming media, not for databases
 - **Benefit:** Massive capacity without consuming cluster disk space

+## Tier 3: NFS-Fast Configuration
+
+### Helm Values (second csi-driver-nfs installation)
+
+A second HelmRelease (`csi-driver-nfs-fast`) references the same OCI chart but only creates the StorageClass — the CSI driver pods are already running from the nfs-slow installation.
+
+```yaml
+controller:
+  replicas: 0
+node:
+  enabled: false
+storageClass:
+  create: true
+  name: nfs-fast
+  parameters:
+    server: gravenhollow.lab.daviestechlabs.io
+    share: /mnt/gravenhollow/kubernetes
+  mountOptions:
+    - nfsvers=4.2        # Server-side copy, fallocate, seekhole
+    - nconnect=16        # 16 TCP connections across bonded 10GbE
+    - rsize=1048576      # 1 MB read block size
+    - wsize=1048576      # 1 MB write block size
+    - hard               # Retry indefinitely on timeout
+    - noatime            # Skip access-time updates
+    - nodiratime         # Skip directory access-time updates
+    - nocto              # Disable close-to-open consistency (read-heavy workloads)
+    - actimeo=600        # Cache attributes for 10 min
+    - max_connect=16     # Allow connections to up to 16 distinct server IPs (NFSv4.1+ session trunking)
+  reclaimPolicy: Delete
+  volumeBindingMode: Immediate
+```
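+
+For reference, a minimal sketch of the StorageClass these values are expected to render. The provisioner name `nfs.csi.k8s.io` is the one csi-driver-nfs registers; field order and any chart-added labels are assumptions, so compare against `kubectl get storageclass nfs-fast -o yaml` rather than treating this as the chart's literal output.
+
+```yaml
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: nfs-fast
+provisioner: nfs.csi.k8s.io   # provisioner registered by csi-driver-nfs
+parameters:
+  server: gravenhollow.lab.daviestechlabs.io
+  share: /mnt/gravenhollow/kubernetes
+reclaimPolicy: Delete
+volumeBindingMode: Immediate
+mountOptions:
+  - nfsvers=4.2
+  - nconnect=16
+  - rsize=1048576
+  - wsize=1048576
+  - hard
+  - noatime
+  - nodiratime
+  - nocto
+  - actimeo=600
+  - max_connect=16
+```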
+
+### Performance Tuning Rationale
+
+| Option | Why |
+|--------|-----|
+| `nfsvers=4.2` | Enables server-side copy, hole punch, and fallocate — TrueNAS Scale supports NFSv4.2 natively |
+| `nconnect=16` | Opens 16 parallel TCP connections per mount, spreading I/O across both 10GbE bond members |
+| `rsize/wsize=1048576` | 1 MB block sizes maximise throughput per operation — jumbo frames (MTU 9000) carry each 1 MB payload in fewer packets, reducing per-packet overhead |
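+| `nocto` | Skips close-to-open consistency checks; acceptable because model weights and artifacts are write-once/read-many, though a file rewritten in place may appear stale on other clients until cached attributes expire |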
+| `actimeo=600` | Caches file and directory attributes for 10 minutes, reducing metadata round-trips for static content |
+| `nodiratime` | Skips directory access-time updates; on Linux `noatime` already implies this, so the option is redundant but harmless |
+
+### Why "nfs-fast"?
+
+Gravenhollow addresses the performance gap between Longhorn (local) and candlekeep (HDD NAS):
+- **All-SSD:** No spinning disk latency — suitable for random I/O workloads like model loading
+- **Dual 10GbE:** 2× 10 Gbps network links via link aggregation
+- **12.2 TB capacity:** Enough for model cache, artifacts, and training data
+- **RustFS S3:** S3-compatible object storage endpoint for pipeline artifacts and backups
+- **Use case:** AI/ML model cache, MLflow artifacts, training data — workloads that need better than HDD but don't require local NVMe
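+
+A workload opts into this tier by naming the StorageClass in its claim. The sketch below is illustrative only: the claim name, namespace, and requested size are hypothetical placeholders, not values taken from the cluster.
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: ray-model-cache        # hypothetical claim name
+  namespace: ai-ml             # hypothetical namespace
+spec:
+  accessModes:
+    - ReadWriteMany            # NFS allows several pods to mount the same volume
+  storageClassName: nfs-fast
+  resources:
+    requests:
+      storage: 500Gi           # placeholder size; NFS does not enforce this as a quota
+```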
+
+### S3 Endpoint (RustFS)
+
+Gravenhollow also provides S3-compatible object storage via RustFS:
+- **Endpoint:** `http://gravenhollow.lab.daviestechlabs.io:30292`
+- **Use cases:** Kubeflow pipeline artifacts, Longhorn off-cluster backups, training dataset staging
+- **Credentials:** Managed via Vault ExternalSecret (`/kv/data/gravenhollow` → `access_key`, `secret_key`)
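+
+A rough sketch of how the credentials might be surfaced to a workload with External Secrets Operator. Everything except the Vault path and the `access_key` / `secret_key` properties is an assumption: the `apiVersion` depends on the installed operator version, and the store name, namespace, target Secret name, and output key names are hypothetical.
+
+```yaml
+apiVersion: external-secrets.io/v1beta1   # assumption: newer operator releases serve v1
+kind: ExternalSecret
+metadata:
+  name: gravenhollow-s3                   # hypothetical
+  namespace: ai-ml                        # hypothetical
+spec:
+  refreshInterval: 1h
+  secretStoreRef:
+    kind: ClusterSecretStore
+    name: vault                           # hypothetical store name
+  target:
+    name: gravenhollow-s3-credentials     # hypothetical Secret name
+  data:
+    - secretKey: AWS_ACCESS_KEY_ID        # key name in the resulting Secret (assumed)
+      remoteRef:
+        key: gravenhollow                 # assumes the store is mounted at the `kv` KV v2 engine
+        property: access_key
+    - secretKey: AWS_SECRET_ACCESS_KEY
+      remoteRef:
+        key: gravenhollow
+        property: secret_key
+```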
+
 ## Storage Tier Selection Guide

 | Workload Type | Storage Class | Rationale |
 |---------------|---------------|-----------|
-| PostgreSQL (CNPG) | `longhorn` or `nfs-slow` | Depends on criticality |
+| PostgreSQL (CNPG) | `longhorn` | HA with replication, low latency |
 | Prometheus/ClickHouse | `longhorn` | High write IOPS required |
 | Vault | `longhorn` | Security-critical, needs HA |
+| AI/ML models (Ray) | `nfs-fast` | Large model weights, SSD speed |
+| MLflow artifacts | `nfs-fast` | Experiment tracking, frequent reads |
+| Training data | `nfs-fast` | Dataset staging for fine-tuning |
 | Media (Jellyfin, Kavita) | `nfs-slow` | Large files, sequential reads |
 | Photos (Immich) | `nfs-slow` | Bulk storage for photos |
 | User files (Nextcloud) | `nfs-slow` | Capacity over speed |
-| AI/ML models (Ray) | `nfs-slow` | Large model weights |
 | Build caches (Gitea runner) | `nfs-slow` | Ephemeral, large |
-| MLflow artifacts | `nfs-slow` | Model artifacts storage |

 ## Volume Usage by Tier

@@ -296,14 +389,15 @@ spec:

 ### When to Choose Each Tier

-| Requirement | Longhorn | NFS-Slow |
-|-------------|----------|----------|
-| Low latency | ✅ | ❌ |
-| High IOPS | ✅ | ❌ |
-| Large capacity | ❌ | ✅ |
-| ReadWriteMany (RWX) | Limited | ✅ |
-| Node failure survival | ✅ | ✅ (NAS HA) |
-| Kubernetes-native | ✅ | ✅ |
+| Requirement | Longhorn | NFS-Fast | NFS-Slow |
+|-------------|----------|----------|----------|
+| Low latency | ✅ | ⚡ | ❌ |
+| High IOPS | ✅ | ⚡ | ❌ |
+| Large capacity | ❌ | ✅ (12.2 TB) | ✅✅ |
+| ReadWriteMany (RWX) | Limited | ✅ | ✅ |
+| S3 compatible | ❌ | ✅ (RustFS) | ✅ (QuObjects) |
+| Node failure survival | ✅ | ✅ (NAS) | ✅ (NAS) |
+| Kubernetes-native | ✅ | ✅ | ✅ |

 ## Monitoring

@@ -320,11 +414,13 @@ spec:

 ## Future Enhancements

-1. **NAS high availability** - Second NAS with replication
-2. **Dedicated storage network** - Separate VLAN for storage traffic
+1. ~~**NAS high availability** - Second NAS with replication~~ ✅ Done — gravenhollow adds a second NAS
+2. **Dedicated storage network** - Separate VLAN for storage traffic (gravenhollow's dual 10GbE makes this more impactful)
 3. **NVMe-oF** - Network NVMe for lower latency
 4. **Tiered Longhorn** - Hot (NVMe) and warm (SSD) within Longhorn
-5. **S3 tier** - MinIO for object storage workloads
+5. ~~**S3 tier** - MinIO for object storage workloads~~ ✅ Done — gravenhollow RustFS provides S3
+6. **Migrate AI/ML PVCs to nfs-fast** - Move ray-model-cache and mlflow-artifacts from nfs-slow to nfs-fast
+7. **Longhorn backups to gravenhollow S3** - Use RustFS as off-cluster backup target

 ## References