updates to finish nfs-fast implementation.
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 6s

2026-02-16 18:08:32 -05:00
parent 7685b2b757
commit b4e608f002
5 changed files with 134 additions and 37 deletions


@@ -37,9 +37,10 @@ How do we provide tiered storage that balances performance, reliability, and cap
 Chosen option: **Option 1 - Longhorn + NFS dual-tier storage**
-Two storage tiers optimized for different use cases:
+Three storage tiers optimized for different use cases:
 - **`longhorn`** (default): Fast distributed block storage on NVMe/SSDs for databases and critical workloads
-- **`nfs-slow`**: High-capacity NFS storage on external NAS for media, datasets, and bulk storage
+- **`nfs-fast`**: High-performance NFS + S3 storage on gravenhollow (all-SSD TrueNAS Scale, dual 10GbE, 12.2 TB) for AI model cache, hot data, and S3-compatible object storage via RustFS
+- **`nfs-slow`**: High-capacity NFS storage on candlekeep (QNAP HDD NAS) for media, datasets, and bulk storage
 ### Positive Consequences
@@ -90,7 +91,7 @@ Two storage tiers optimized for different use cases:
 │ │
 │ ┌────────────────────────────────────────────────────────────────┐ │
 │ │ candlekeep.lab.daviestechlabs.io │ │
-│ │ (External NAS) │ │
+│ │ (QNAP NAS) │ │
 │ │ │ │
 │ │ /kubernetes │ │
 │ │ ├── jellyfin-media/ (1TB+ media library) │ │
@@ -113,6 +114,38 @@ Two storage tiers optimized for different use cases:
 │ │ PVC │ │ PVC │ │ PVC │ │ PVC │ │
 │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
 └────────────────────────────────────────────────────────────────────────────┘
+┌────────────────────────────────────────────────────────────────────────────┐
+│ TIER 3: NFS-FAST │
+│ (High-Performance SSD NFS + S3 Storage) │
+│ │
+│ ┌────────────────────────────────────────────────────────────────┐ │
+│ │ gravenhollow.lab.daviestechlabs.io │ │
+│ │ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │ │
+│ │ │ │
+│ │ NFS: /mnt/gravenhollow/kubernetes │ │
+│ │ ├── ray-model-cache/ (AI model weights - hot) │ │
+│ │ ├── mlflow-artifacts/ (ML experiment tracking) │ │
+│ │ └── training-data/ (datasets for fine-tuning) │ │
+│ │ │ │
+│ │ S3 (RustFS): http://gravenhollow.lab.daviestechlabs.io:30292 │ │
+│ │ ├── kubeflow-pipelines (pipeline artifacts) │ │
+│ │ ├── training-data (large dataset staging) │ │
+│ │ └── longhorn-backups (off-cluster backup target) │ │
+│ └────────────────────────────────────────────────────────────────┘ │
+│ │ │
+│ ▼ │
+│ ┌───────────────────────┐ │
+│ │ NFS CSI Driver │ │
+│ │ (csi-driver-nfs) │ │
+│ └───────────┬───────────┘ │
+│ ▼ │
+│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
+│ │Ray Model │ │ MLflow │ │ Training │ │
+│ │ Cache │ │ Artifact │ │ Data │ │
+│ │ PVC │ │ PVC │ │ PVC │ │
+│ └──────────┘ └──────────┘ └──────────┘ │
+└────────────────────────────────────────────────────────────────────────────┘
 ```
 ## Tier 1: Longhorn Configuration
@@ -179,19 +212,79 @@ The naming is intentional - it sets correct expectations:
 - **Throughput:** Adequate for streaming media, not for databases
 - **Benefit:** Massive capacity without consuming cluster disk space
+## Tier 3: NFS-Fast Configuration
+### Helm Values (second csi-driver-nfs installation)
+A second HelmRelease (`csi-driver-nfs-fast`) references the same OCI chart but only creates the StorageClass — the CSI driver pods are already running from the nfs-slow installation.
+```yaml
+controller:
+  replicas: 0
+node:
+  enabled: false
+storageClass:
+  create: true
+  name: nfs-fast
+  parameters:
+    server: gravenhollow.lab.daviestechlabs.io
+    share: /mnt/gravenhollow/kubernetes
+  mountOptions:
+    - nfsvers=4.2        # Server-side copy, fallocate, seekhole
+    - nconnect=16        # 16 TCP connections across bonded 10GbE
+    - rsize=1048576      # 1 MB read block size
+    - wsize=1048576      # 1 MB write block size
+    - hard               # Retry indefinitely on timeout
+    - noatime            # Skip access-time updates
+    - nodiratime         # Skip directory access-time updates
+    - nocto              # Disable close-to-open consistency (read-heavy workloads)
+    - actimeo=600        # Cache attributes for 10 min
+    - max_connect=16     # NFSv4.1+ session trunking: up to 16 connections to other IPs of the same server
+  reclaimPolicy: Delete
+  volumeBindingMode: Immediate
+```
+### Performance Tuning Rationale
+| Option | Why |
+|--------|-----|
+| `nfsvers=4.2` | Enables server-side copy, hole punch, and fallocate — TrueNAS Scale supports NFSv4.2 natively |
+| `nconnect=16` | Opens 16 parallel TCP connections to the server (the Linux client maximum); with a layer3+4 hash policy on the bond and switch, these flows can be spread across both 10GbE bond members |
+| `rsize/wsize=1048576` | 1 MB block sizes maximize throughput per operation; where jumbo frames (MTU 9000) are configured end to end, each 1 MB payload also travels in fewer packets, reducing per-packet overhead |
+| `nocto` | Skips close-to-open consistency checks — safe because model weights and artifacts are write-once/read-many |
+| `actimeo=600` | Caches file and directory attributes for 10 minutes, reducing metadata round-trips for static content |
+| `nodiratime` | Skips directory timestamp writes; redundant on Linux, where `noatime` already implies it, but harmless |
+### Why "nfs-fast"?
+Gravenhollow addresses the performance gap between Longhorn (local) and candlekeep (HDD NAS):
+- **All-SSD:** No spinning disk latency — suitable for random I/O workloads like model loading
+- **Dual 10GbE:** 2× 10 Gbps network links via link aggregation (a single TCP flow still uses one link, so reaching the aggregate bandwidth needs multiple flows)
+- **12.2 TB capacity:** Enough for model cache, artifacts, and training data
+- **RustFS S3:** S3-compatible object storage endpoint for pipeline artifacts and backups
+- **Use case:** AI/ML model cache, MLflow artifacts, training data: workloads that need more than HDD performance but don't require local NVMe
+### S3 Endpoint (RustFS)
+Gravenhollow also provides S3-compatible object storage via RustFS:
+- **Endpoint:** `http://gravenhollow.lab.daviestechlabs.io:30292`
+- **Use cases:** Kubeflow pipeline artifacts, Longhorn off-cluster backups, training dataset staging
+- **Credentials:** Managed via Vault ExternalSecret (path `/kv/data/gravenhollow`, keys `access_key` and `secret_key`)
 ## Storage Tier Selection Guide
 | Workload Type | Storage Class | Rationale |
 |---------------|---------------|-----------|
-| PostgreSQL (CNPG) | `longhorn` or `nfs-slow` | Depends on criticality |
+| PostgreSQL (CNPG) | `longhorn` | HA with replication, low latency |
 | Prometheus/ClickHouse | `longhorn` | High write IOPS required |
 | Vault | `longhorn` | Security-critical, needs HA |
+| AI/ML models (Ray) | `nfs-fast` | Large model weights, SSD speed |
+| MLflow artifacts | `nfs-fast` | Experiment tracking, frequent reads |
+| Training data | `nfs-fast` | Dataset staging for fine-tuning |
 | Media (Jellyfin, Kavita) | `nfs-slow` | Large files, sequential reads |
 | Photos (Immich) | `nfs-slow` | Bulk storage for photos |
 | User files (Nextcloud) | `nfs-slow` | Capacity over speed |
-| AI/ML models (Ray) | `nfs-slow` | Large model weights |
 | Build caches (Gitea runner) | `nfs-slow` | Ephemeral, large |
-| MLflow artifacts | `nfs-slow` | Model artifacts storage |
 ## Volume Usage by Tier
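The nfs-fast mount options only pay off if they are actually in effect on the node; for example, the client can end up with a smaller `rsize`/`wsize` than requested when the server caps the transfer size. A minimal sketch of a check, assuming you feed it the option string that `findmnt -no OPTIONS <mountpoint>` (or the fourth field of `/proc/mounts`) reports. `check_nfs_opts` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch: verify that an NFS mount is running with the nfs-fast tuning.
# check_nfs_opts is a hypothetical helper. Pass it the option string shown by
# `findmnt -no OPTIONS <mountpoint>` (or the 4th field of /proc/mounts).
check_nfs_opts() {
  # Pad with commas so every option is delimited on both sides.
  local opts=",$1,"
  local want missing=0
  for want in vers=4.2 rsize=1048576 wsize=1048576 nconnect=16 hard; do
    case "$opts" in
      *",$want,"*) ;;  # option present with the expected value
      *)
        echo "not in effect: $want" >&2
        missing=1
        ;;
    esac
  done
  return "$missing"
}
```

On a node, `findmnt -t nfs4 -no TARGET,OPTIONS` lists the NFSv4 mounts; `check_nfs_opts "$(findmnt -no OPTIONS <mountpoint>)"` then returns non-zero and names each setting that did not stick.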
@@ -296,14 +389,15 @@ spec:
 ### When to Choose Each Tier
-| Requirement | Longhorn | NFS-Slow |
-|-------------|----------|----------|
-| Low latency | ✅ | ❌ |
-| High IOPS | ✅ | ❌ |
-| Large capacity | ❌ | ✅ |
-| ReadWriteMany (RWX) | Limited | ✅ |
-| Node failure survival | | ✅ (NAS HA) |
-| Kubernetes-native | ✅ | ✅ |
+| Requirement | Longhorn | NFS-Fast | NFS-Slow |
+|-------------|----------|----------|----------|
+| Low latency | ✅ | ⚡ | ❌ |
+| High IOPS | ✅ | ⚡ | ❌ |
+| Large capacity | ❌ | ✅ (12.2 TB) | ✅✅ |
+| ReadWriteMany (RWX) | Limited | ✅ | ✅ |
+| S3 compatible | | ✅ (RustFS) | ✅ (QuObjects) |
+| Node failure survival | ✅ | ✅ (NAS) | ✅ (NAS) |
+| Kubernetes-native | ✅ | ✅ | ✅ |
 ## Monitoring
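A workload picks a tier simply by naming its StorageClass in a PersistentVolumeClaim, and the RWX row above is what lets several Ray pods share one claim on either NFS tier. A minimal sketch that renders such a claim, assuming the three class names used in this ADR; `pvc_manifest`, the claim names, and the sizes are hypothetical examples.

```bash
#!/usr/bin/env bash
# Sketch: render a PersistentVolumeClaim for one of the three storage tiers.
# pvc_manifest is a hypothetical helper; claim names and sizes are examples.
pvc_manifest() {
  local name="$1" size="$2" class="$3" mode="${4:-ReadWriteOnce}"
  # Reject anything that is not one of the documented StorageClasses.
  case "$class" in
    longhorn | nfs-fast | nfs-slow) ;;
    *)
      echo "unknown storage class: $class" >&2
      return 1
      ;;
  esac
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ${mode}
  storageClassName: ${class}
  resources:
    requests:
      storage: ${size}
EOF
}
```

For example, `pvc_manifest ray-model-cache 500Gi nfs-fast ReadWriteMany | kubectl apply -n <namespace> -f -` would request a shared claim on gravenhollow, while the same call with `longhorn` and the default `ReadWriteOnce` matches the database rows of the selection guide.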
@@ -320,11 +414,13 @@ spec:
 ## Future Enhancements
-1. **NAS high availability** - Second NAS with replication
-2. **Dedicated storage network** - Separate VLAN for storage traffic
+1. ~~**NAS high availability** - Second NAS with replication~~ ✅ Done — gravenhollow adds a second NAS
+2. **Dedicated storage network** - Separate VLAN for storage traffic (gravenhollow's dual 10GbE makes this more impactful)
 3. **NVMe-oF** - Network NVMe for lower latency
 4. **Tiered Longhorn** - Hot (NVMe) and warm (SSD) within Longhorn
-5. **S3 tier** - MinIO for object storage workloads
+5. ~~**S3 tier** - MinIO for object storage workloads~~ ✅ Done — gravenhollow RustFS provides S3
+6. **Migrate AI/ML PVCs to nfs-fast** - Move ray-model-cache and mlflow-artifacts from nfs-slow to nfs-fast
+7. **Longhorn backups to gravenhollow S3** - Use RustFS as off-cluster backup target
 ## References
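Because RustFS exposes the S3 API, standard S3 clients work against it once they are pointed at the gravenhollow endpoint. A minimal sketch with the AWS CLI, assuming the Vault KV v2 engine is mounted at `kv` (inferred from the `/kv/data/gravenhollow` path), an authenticated `vault` CLI, and the `aws` CLI on the PATH. `rustfs_ls` is a hypothetical helper, and the region is a placeholder to adjust if RustFS is configured with a specific one.

```bash
#!/usr/bin/env bash
# Sketch: list a bucket on gravenhollow's RustFS endpoint with the AWS CLI.
# Assumes Vault KV v2 is mounted at "kv" (from /kv/data/gravenhollow), an
# authenticated vault CLI, and the aws CLI. The region is a placeholder.
RUSTFS_ENDPOINT="http://gravenhollow.lab.daviestechlabs.io:30292"

rustfs_ls() {
  local bucket="$1" ak sk
  # Read the same keys the ExternalSecret syncs (access_key, secret_key).
  ak="$(vault kv get -field=access_key kv/gravenhollow)" || return 1
  sk="$(vault kv get -field=secret_key kv/gravenhollow)" || return 1
  # --endpoint-url sends the S3 calls to RustFS instead of AWS.
  AWS_ACCESS_KEY_ID="$ak" AWS_SECRET_ACCESS_KEY="$sk" AWS_DEFAULT_REGION=us-east-1 \
    aws --endpoint-url "$RUSTFS_ENDPOINT" s3 ls "s3://${bucket}/"
}
```

`rustfs_ls kubeflow-pipelines` would list the pipeline-artifacts bucket. Passing the credentials as per-command environment variables keeps them out of the calling shell's environment.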


@@ -82,8 +82,8 @@ Fighters are the workhorses, handling general compute without magical (GPU) abil
 | Node | Character/Location | Role | Notes |
 |------|-------------------|------|-------|
-| `candlekeep` | Candlekeep | Primary NAS (Synology) | Library fortress, knowledge storage |
-| `neverwinter` | Neverwinter | Fast NAS (TrueNAS Scale) | Jewel of the North, all-SSD, nfs-fast |
+| `candlekeep` | Candlekeep | Primary NAS (QNAP) | Library fortress, knowledge storage |
+| `gravenhollow` | Gravenhollow | Fast NAS (TrueNAS Scale) | Living memory of the Underdark, all-SSD, dual 10GbE, nfs-fast |
 | `waterdeep` | Waterdeep | Mac Mini dev workstation | City of Splendors, primary city |
 ### Future Expansion
@@ -139,11 +139,11 @@ Fighters are the workhorses, handling general compute without magical (GPU) abil
 ┌───────────────────────────────────────────────────────────────────────────────┐
 │ 🏰 Locations (Off-Cluster Infrastructure) │
 │ │
-│ 📚 candlekeep ❄️ neverwinter 🏙️ waterdeep │
-│ Synology NAS TrueNAS Scale (SSD) Mac Mini │
-│ nfs-default nfs-fast Dev workstation │
-│ High capacity High speed Primary dev box │
-│ "Library Fortress" "Jewel of the North" "City of Splendors" │
+│ 📚 candlekeep 🪨 gravenhollow 🏙️ waterdeep │
+│ QNAP NAS TrueNAS Scale (SSD) Mac Mini │
+│ nfs-slow nfs-fast Dev workstation │
+│ High capacity High speed, 12.2TB Primary dev box │
+│ "Library Fortress" "Living Memory" "City of Splendors" │
 └───────────────────────────────────────────────────────────────────────────────┘
 ```
@@ -152,7 +152,7 @@ Fighters are the workhorses, handling general compute without magical (GPU) abil
 | Location | Storage Class | Speed | Capacity | Use Case |
 |----------|--------------|-------|----------|----------|
 | Candlekeep | `nfs-default` | HDD | High | Backups, archives, media |
-| Neverwinter | `nfs-fast` | SSD | Medium | Database WAL, hot data |
+| Gravenhollow | `nfs-fast` | SSD (12.2 TB) | Medium-High | Database WAL, hot data, model cache |
 | Longhorn | `longhorn` | Local SSD | Distributed | Replicated app data |
 ## Node Labels
@@ -182,6 +182,6 @@ All nodes are resolvable via:
 * [Khelben Arunsun](https://forgottenrealms.fandom.com/wiki/Khelben_Arunsun)
 * [Elminster](https://forgottenrealms.fandom.com/wiki/Elminster_Aumar)
 * [Candlekeep](https://forgottenrealms.fandom.com/wiki/Candlekeep)
-* [Neverwinter](https://forgottenrealms.fandom.com/wiki/Neverwinter)
+* [Gravenhollow](https://forgottenrealms.fandom.com/wiki/Gravenhollow)
 * Related: [ADR-0035](0035-arm64-worker-strategy.md) - ARM64 Worker Strategy
 * Related: [ADR-0011](0011-kuberay-unified-serving.md) - KubeRay Unified Serving


@@ -59,7 +59,7 @@ Chosen option: **Option 1 — External Ray worker on macOS**, because Ray native
 * Network dependency — if waterdeep sleeps or disconnects, Ray tasks on it fail
 * MPS backend has limited operator coverage compared to CUDA/ROCm
 * Python environment must be maintained separately (not in a container image)
-* No Longhorn storage — model cache managed locally or via NFS mount
+* No Longhorn storage — model cache managed locally or via NFS mount from gravenhollow (nfs-fast)
 * Monitoring not automatically scraped by Prometheus (needs node-exporter or push gateway)
 ## Pros and Cons of the Options
@@ -125,7 +125,7 @@ Chosen option: **Option 1 — External Ray worker on macOS**, because Ray native
 │ │ └── Training: LoRA/QLoRA fine-tuning via Ray Train │ │
 │ └──────────────────────────────────────────────────────────────────┘ │
 │ │
-│ Model cache: ~/Library/Caches/huggingface + NFS mount │
+│ Model cache: ~/Library/Caches/huggingface + NFS mount (gravenhollow) │
 └──────────────────────────────────────────────────────────────────────────┘
 ```
@@ -233,15 +233,15 @@ launchctl load ~/Library/LaunchAgents/io.ray.worker.plist
 ### 5. Model Cache via NFS
-Mount the NAS model cache on waterdeep so models are shared with the cluster:
+Mount the gravenhollow NFS share on waterdeep so models are shared with the cluster via the fast all-SSD NAS:
 ```bash
-# Mount candlekeep NFS share
-sudo mount -t nfs candlekeep.lab.daviestechlabs.io:/volume1/models \
+# Mount gravenhollow NFS share (all-SSD, dual 10GbE)
+sudo mount -t nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models \
   /Volumes/model-cache
 # Or add to /etc/fstab for persistence
-# candlekeep.lab.daviestechlabs.io:/volume1/models /Volumes/model-cache nfs rw 0 0
+# gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models /Volumes/model-cache nfs rw 0 0
 # Symlink to HuggingFace cache location
 ln -s /Volumes/model-cache ~/.cache/huggingface/hub
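The final `ln -s` step in the model-cache instructions has two sharp edges: it fails if `~/.cache/huggingface` does not exist yet, and if `~/.cache/huggingface/hub` is already a directory (for example, from downloads made before the mount existed) it quietly creates `hub/model-cache` inside it instead of redirecting the cache. A minimal sketch that guards against both, using only POSIX shell plus `mkdir` and `ln` (the `-n` flag exists in both the macOS/BSD and GNU versions of `ln`); `link_hf_cache` is a hypothetical helper, not part of the original setup.

```bash
#!/usr/bin/env bash
# Sketch: point the Hugging Face hub cache at the NFS mount without the
# "link created inside an existing directory" surprise of a bare `ln -s`.
# link_hf_cache is a hypothetical helper, not part of the original setup.
link_hf_cache() {
  local target="$1" hub="$2"
  # ln -s fails if the parent directory (e.g. ~/.cache/huggingface) is missing.
  mkdir -p "$(dirname "$hub")" || return 1
  if [ -L "$hub" ]; then
    # Already a symlink: repoint it; -n stops ln from following the old link.
    ln -sfn "$target" "$hub"
  elif [ -e "$hub" ]; then
    echo "refusing: $hub exists and is not a symlink; move it aside first" >&2
    return 1
  else
    ln -s "$target" "$hub"
  fi
}
```

On waterdeep the call would be `link_hf_cache /Volumes/model-cache "$HOME/.cache/huggingface/hub"`. If the share is mounted `ro`, as the security notes recommend, the linked cache is read-only as well, so new models would have to be downloaded by a writer on the cluster side.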
@@ -315,6 +315,7 @@ caffeinate -s ray start --address=... --block
 * Ray's GCS port (6379) will be exposed outside the cluster — restrict with firewall rules to waterdeep's IP only
 * The Ray worker has no RBAC — it executes whatever tasks the head assigns
 * Model weights on NFS are read-only from waterdeep (mount with `ro` option if possible)
+* NFS traffic to gravenhollow traverses the LAN — ensure dual 10GbE links are active
 * Consider Tailscale or WireGuard for encrypted transport if the Ray GCS traffic crosses untrusted network segments
 ## Future Considerations


@@ -31,8 +31,8 @@ flowchart TB
 end
 subgraph Infrastructure["🏰 Locations (Off-Cluster Infrastructure)"]
-Candlekeep["📚 candlekeep<br/>Synology NAS<br/>nfs-default<br/><i>Library Fortress</i>"]
-Neverwinter["❄️ neverwinter<br/>TrueNAS Scale (SSD)<br/>nfs-fast<br/><i>Jewel of the North</i>"]
+Candlekeep["📚 candlekeep<br/>QNAP NAS<br/>nfs-slow<br/><i>Library Fortress</i>"]
+Gravenhollow["🪨 gravenhollow<br/>TrueNAS Scale (SSD)<br/>nfs-fast · 12.2 TB<br/><i>Living Memory</i>"]
 Waterdeep["🏙️ waterdeep<br/>Mac Mini<br/>Dev Workstation<br/><i>City of Splendors</i>"]
 end
@@ -44,7 +44,7 @@ flowchart TB
 end
 ControlPlane -.->|"etcd"| ControlPlane
-Wizards -.->|"Fast Storage"| Neverwinter
+Wizards -.->|"Fast Storage"| Gravenhollow
 Wizards -.->|"Backups"| Candlekeep
 Rogues -.->|"NFS Mounts"| Candlekeep
 Fighters -.->|"NFS Mounts"| Candlekeep
@@ -60,5 +60,5 @@ flowchart TB
 class Khelben,Elminster,Drizzt,Danilo,Regis wizard
 class Durnan,Elaith,Jarlaxle,Mirt,Volo rogue
 class Wulfgar fighter
-class Candlekeep,Neverwinter,Waterdeep location
+class Candlekeep,Gravenhollow,Waterdeep location
 class AI,Edge,Compute,Storage workload


@@ -23,7 +23,7 @@ flowchart TB
 MinIO["MinIO<br/>On-premises S3"]
 end
 subgraph Secondary["Secondary: NFS"]
-NAS["Synology NAS<br/>Long-term retention"]
+NAS["QNAP NAS<br/>Long-term retention"]
 end
 end