updates to finish nfs-fast implementation.

This commit is contained in:
2026-02-16 18:08:32 -05:00
parent 7685b2b757
commit b4e608f002
5 changed files with 134 additions and 37 deletions


@@ -37,9 +37,10 @@ How do we provide tiered storage that balances performance, reliability, and cap
Chosen option: **Option 1 - Longhorn + NFS dual-tier storage**
Two storage tiers optimized for different use cases:
Three storage tiers optimized for different use cases:
- **`longhorn`** (default): Fast distributed block storage on NVMe/SSDs for databases and critical workloads
- **`nfs-slow`**: High-capacity NFS storage on external NAS for media, datasets, and bulk storage
- **`nfs-fast`**: High-performance NFS + S3 storage on gravenhollow (all-SSD TrueNAS Scale, dual 10GbE, 12.2 TB) for AI model cache, hot data, and S3-compatible object storage via RustFS
- **`nfs-slow`**: High-capacity NFS storage on candlekeep (QNAP HDD NAS) for media, datasets, and bulk storage
### Positive Consequences
@@ -90,7 +91,7 @@ Two storage tiers optimized for different use cases:
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ candlekeep.lab.daviestechlabs.io │ │
│ │ (External NAS) │ │
│ │ (QNAP NAS) │ │
│ │ │ │
│ │ /kubernetes │ │
│ │ ├── jellyfin-media/ (1TB+ media library) │ │
@@ -113,6 +114,38 @@ Two storage tiers optimized for different use cases:
│ │ PVC │ │ PVC │ │ PVC │ │ PVC │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────────┐
│ TIER 3: NFS-FAST │
│ (High-Performance SSD NFS + S3 Storage) │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ gravenhollow.lab.daviestechlabs.io │ │
│ │ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │ │
│ │ │ │
│ │ NFS: /mnt/gravenhollow/kubernetes │ │
│ │ ├── ray-model-cache/ (AI model weights - hot) │ │
│ │ ├── mlflow-artifacts/ (ML experiment tracking) │ │
│ │ └── training-data/ (datasets for fine-tuning) │ │
│ │ │ │
│ │ S3 (RustFS): http://gravenhollow.lab.daviestechlabs.io:30292 │ │
│ │ ├── kubeflow-pipelines (pipeline artifacts) │ │
│ │ ├── training-data (large dataset staging) │ │
│ │ └── longhorn-backups (off-cluster backup target) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ NFS CSI Driver │ │
│ │ (csi-driver-nfs) │ │
│ └───────────┬───────────┘ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Ray Model │ │ MLflow │ │ Training │ │
│ │ Cache │ │ Artifact │ │ Data │ │
│ │ PVC │ │ PVC │ │ PVC │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────────────────────────────────────────────────────────┘
```
## Tier 1: Longhorn Configuration
@@ -179,19 +212,79 @@ The naming is intentional - it sets correct expectations:
- **Throughput:** Adequate for streaming media, not for databases
- **Benefit:** Massive capacity without consuming cluster disk space
## Tier 3: NFS-Fast Configuration
### Helm Values (second csi-driver-nfs installation)
A second HelmRelease (`csi-driver-nfs-fast`) references the same OCI chart but only creates the StorageClass — the CSI driver pods are already running from the nfs-slow installation.
```yaml
controller:
  replicas: 0
node:
  enabled: false
storageClass:
  create: true
  name: nfs-fast
  parameters:
    server: gravenhollow.lab.daviestechlabs.io
    share: /mnt/gravenhollow/kubernetes
  mountOptions:
    - nfsvers=4.2    # Server-side copy, fallocate, SEEK_HOLE/SEEK_DATA
    - nconnect=16    # 16 TCP connections across bonded 10GbE
    - rsize=1048576  # 1 MB read block size
    - wsize=1048576  # 1 MB write block size
    - hard           # Retry indefinitely on timeout
    - noatime        # Skip access-time updates
    - nodiratime     # Skip directory access-time updates
    - nocto          # Disable close-to-open consistency (read-heavy workloads)
    - actimeo=600    # Cache attributes for 10 min
    - max_connect=16 # Allow up to 16 transport connections to the same server
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
```
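With the class in place, a workload claims it like any other StorageClass. The PVC below is a sketch; the name, namespace, and size are illustrative, not taken from the cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ray-model-cache   # illustrative name
  namespace: ray          # assumed namespace
spec:
  accessModes:
    - ReadWriteMany       # NFS supports RWX across nodes
  storageClassName: nfs-fast
  resources:
    requests:
      storage: 500Gi      # illustrative size
```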
### Performance Tuning Rationale
| Option | Why |
|--------|-----|
| `nfsvers=4.2` | Enables server-side copy, hole punch, and fallocate — TrueNAS Scale supports NFSv4.2 natively |
| `nconnect=16` | Opens 16 parallel TCP connections per mount, spreading I/O across both 10GbE bond members |
| `rsize/wsize=1048576` | 1 MB block sizes maximize throughput per operation; jumbo frames (MTU 9000) carry each 1 MB payload in fewer packets, reducing per-packet overhead |
| `nocto` | Skips close-to-open consistency checks — safe because model weights and artifacts are write-once/read-many |
| `actimeo=600` | Caches file and directory attributes for 10 minutes, reducing metadata round-trips for static content |
| `nodiratime` | Avoids unnecessary directory timestamp writes alongside `noatime` |
### Why "nfs-fast"?
Gravenhollow addresses the performance gap between Longhorn (local) and candlekeep (HDD NAS):
- **All-SSD:** No spinning disk latency — suitable for random I/O workloads like model loading
- **Dual 10GbE:** 2× 10 Gbps network links via link aggregation
- **12.2 TB capacity:** Enough for model cache, artifacts, and training data
- **RustFS S3:** S3-compatible object storage endpoint for pipeline artifacts and backups
- **Use case:** AI/ML model cache, MLflow artifacts, training data — workloads that need better than HDD but don't require local NVMe
### S3 Endpoint (RustFS)
Gravenhollow also provides S3-compatible object storage via RustFS:
- **Endpoint:** `http://gravenhollow.lab.daviestechlabs.io:30292`
- **Use cases:** Kubeflow pipeline artifacts, Longhorn off-cluster backups, training dataset staging
- **Credentials:** Managed via Vault ExternalSecret (`/kv/data/gravenhollow`: `access_key`, `secret_key`)
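A sketch of the ExternalSecret wiring for those credentials; the secret store name `vault-backend`, the target secret name, and the KV v2 key layout are assumptions, not taken from this repo:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: gravenhollow-s3     # illustrative
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend     # assumed ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: gravenhollow-s3-credentials
  data:
    - secretKey: access_key
      remoteRef:
        key: gravenhollow   # resolves to /kv/data/gravenhollow (KV v2)
        property: access_key
    - secretKey: secret_key
      remoteRef:
        key: gravenhollow
        property: secret_key
```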
## Storage Tier Selection Guide
| Workload Type | Storage Class | Rationale |
|---------------|---------------|-----------|
| PostgreSQL (CNPG) | `longhorn` or `nfs-slow` | Depends on criticality |
| PostgreSQL (CNPG) | `longhorn` | HA with replication, low latency |
| Prometheus/ClickHouse | `longhorn` | High write IOPS required |
| Vault | `longhorn` | Security-critical, needs HA |
| AI/ML models (Ray) | `nfs-fast` | Large model weights, SSD speed |
| MLflow artifacts | `nfs-fast` | Experiment tracking, frequent reads |
| Training data | `nfs-fast` | Dataset staging for fine-tuning |
| Media (Jellyfin, Kavita) | `nfs-slow` | Large files, sequential reads |
| Photos (Immich) | `nfs-slow` | Bulk storage for photos |
| User files (Nextcloud) | `nfs-slow` | Capacity over speed |
| AI/ML models (Ray) | `nfs-slow` | Large model weights |
| Build caches (Gitea runner) | `nfs-slow` | Ephemeral, large |
| MLflow artifacts | `nfs-slow` | Model artifacts storage |
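As a concrete example of the guide above, a CloudNativePG cluster pins the `longhorn` class directly in its storage spec; the cluster name, instance count, and size here are illustrative:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db               # illustrative
spec:
  instances: 3               # replicas for HA on top of Longhorn
  storage:
    storageClass: longhorn   # low-latency tier per the selection guide
    size: 20Gi               # illustrative
```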
## Volume Usage by Tier
@@ -296,14 +389,15 @@ spec:
### When to Choose Each Tier
| Requirement | Longhorn | NFS-Slow |
|-------------|----------|----------|
| Low latency | ✅ | ❌ |
| High IOPS | ✅ | ❌ |
| Large capacity | ❌ | ✅ |
| ReadWriteMany (RWX) | Limited | ✅ |
| Node failure survival | | ✅ (NAS HA) |
| Kubernetes-native | ✅ | ✅ |
| Requirement | Longhorn | NFS-Fast | NFS-Slow |
|-------------|----------|----------|----------|
| Low latency | ✅ | ⚡ | ❌ |
| High IOPS | ✅ | ⚡ | ❌ |
| Large capacity | ❌ | ✅ (12.2 TB) | ✅✅ |
| ReadWriteMany (RWX) | Limited | ✅ | ✅ |
| S3 compatible | ❌ | ✅ (RustFS) | ✅ (QuObjects) |
| Node failure survival | ✅ | ✅ (NAS) | ✅ (NAS) |
| Kubernetes-native | ✅ | ✅ | ✅ |
## Monitoring
@@ -320,11 +414,13 @@ spec:
## Future Enhancements
1. **NAS high availability** - Second NAS with replication
2. **Dedicated storage network** - Separate VLAN for storage traffic
1. ~~**NAS high availability** - Second NAS with replication~~ ✅ Done — gravenhollow adds a second NAS
2. **Dedicated storage network** - Separate VLAN for storage traffic (gravenhollow's dual 10GbE makes this more impactful)
3. **NVMe-oF** - Network NVMe for lower latency
4. **Tiered Longhorn** - Hot (NVMe) and warm (SSD) within Longhorn
5. **S3 tier** - MinIO for object storage workloads
5. ~~**S3 tier** - MinIO for object storage workloads~~ ✅ Done — gravenhollow RustFS provides S3
6. **Migrate AI/ML PVCs to nfs-fast** - Move ray-model-cache and mlflow-artifacts from nfs-slow to nfs-fast
7. **Longhorn backups to gravenhollow S3** - Use RustFS as off-cluster backup target
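Item 7 can be sketched as Longhorn Helm values; the region string and secret name are assumptions, and for an S3-compatible endpoint like RustFS, Longhorn also expects an `AWS_ENDPOINTS` key in the credential secret:

```yaml
# Sketch: Longhorn chart values routing backups to gravenhollow's RustFS
defaultSettings:
  # Bucket "longhorn-backups" from the diagram; the region string is
  # arbitrary for most S3-compatible servers (assumed for RustFS)
  backupTarget: s3://longhorn-backups@us-east-1/
  # Secret in longhorn-system holding AWS_ACCESS_KEY_ID,
  # AWS_SECRET_ACCESS_KEY, and
  # AWS_ENDPOINTS=http://gravenhollow.lab.daviestechlabs.io:30292
  backupTargetCredentialSecret: gravenhollow-s3-credentials
```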
## References