updates to finish nfs-fast implementation.
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 6s
@@ -37,9 +37,10 @@ How do we provide tiered storage that balances performance, reliability, and cap
 
 Chosen option: **Option 1 - Longhorn + NFS dual-tier storage**
 
-Two storage tiers optimized for different use cases:
+Three storage tiers optimized for different use cases:
 - **`longhorn`** (default): Fast distributed block storage on NVMe/SSDs for databases and critical workloads
-- **`nfs-slow`**: High-capacity NFS storage on external NAS for media, datasets, and bulk storage
+- **`nfs-fast`**: High-performance NFS + S3 storage on gravenhollow (all-SSD TrueNAS Scale, dual 10GbE, 12.2 TB) for AI model cache, hot data, and S3-compatible object storage via RustFS
+- **`nfs-slow`**: High-capacity NFS storage on candlekeep (QNAP HDD NAS) for media, datasets, and bulk storage
 
 ### Positive Consequences
 
@@ -90,7 +91,7 @@ Two storage tiers optimized for different use cases:
 │ │
 │ ┌────────────────────────────────────────────────────────────────┐ │
 │ │ candlekeep.lab.daviestechlabs.io │ │
-│ │ (External NAS) │ │
+│ │ (QNAP NAS) │ │
 │ │ │ │
 │ │ /kubernetes │ │
 │ │ ├── jellyfin-media/ (1TB+ media library) │ │
@@ -113,6 +114,38 @@ Two storage tiers optimized for different use cases:
 │ │ PVC │ │ PVC │ │ PVC │ │ PVC │ │
 │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
 └────────────────────────────────────────────────────────────────────────────┘
+
+┌────────────────────────────────────────────────────────────────────────────┐
+│ TIER 3: NFS-FAST │
+│ (High-Performance SSD NFS + S3 Storage) │
+│ │
+│ ┌────────────────────────────────────────────────────────────────┐ │
+│ │ gravenhollow.lab.daviestechlabs.io │ │
+│ │ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │ │
+│ │ │ │
+│ │ NFS: /mnt/gravenhollow/kubernetes │ │
+│ │ ├── ray-model-cache/ (AI model weights - hot) │ │
+│ │ ├── mlflow-artifacts/ (ML experiment tracking) │ │
+│ │ └── training-data/ (datasets for fine-tuning) │ │
+│ │ │ │
+│ │ S3 (RustFS): http://gravenhollow.lab.daviestechlabs.io:30292 │ │
+│ │ ├── kubeflow-pipelines (pipeline artifacts) │ │
+│ │ ├── training-data (large dataset staging) │ │
+│ │ └── longhorn-backups (off-cluster backup target) │ │
+│ └────────────────────────────────────────────────────────────────┘ │
+│ │ │
+│ ▼ │
+│ ┌───────────────────────┐ │
+│ │ NFS CSI Driver │ │
+│ │ (csi-driver-nfs) │ │
+│ └───────────┬───────────┘ │
+│ ▼ │
+│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
+│ │Ray Model │ │ MLflow │ │ Training │ │
+│ │ Cache │ │ Artifact │ │ Data │ │
+│ │ PVC │ │ PVC │ │ PVC │ │
+│ └──────────┘ └──────────┘ └──────────┘ │
+└────────────────────────────────────────────────────────────────────────────┘
 ```
 
 ## Tier 1: Longhorn Configuration
@@ -179,19 +212,79 @@ The naming is intentional - it sets correct expectations:
 - **Throughput:** Adequate for streaming media, not for databases
 - **Benefit:** Massive capacity without consuming cluster disk space
 
+## Tier 3: NFS-Fast Configuration
+
+### Helm Values (second csi-driver-nfs installation)
+
+A second HelmRelease (`csi-driver-nfs-fast`) references the same OCI chart but only creates the StorageClass — the CSI driver pods are already running from the nfs-slow installation.
+
+```yaml
+controller:
+  replicas: 0
+node:
+  enabled: false
+storageClass:
+  create: true
+  name: nfs-fast
+  parameters:
+    server: gravenhollow.lab.daviestechlabs.io
+    share: /mnt/gravenhollow/kubernetes
+  mountOptions:
+    - nfsvers=4.2       # Server-side copy, fallocate, seekhole
+    - nconnect=16       # 16 TCP connections across bonded 10GbE
+    - rsize=1048576     # 1 MB read block size
+    - wsize=1048576     # 1 MB write block size
+    - hard              # Retry indefinitely on timeout
+    - noatime           # Skip access-time updates
+    - nodiratime        # Skip directory access-time updates
+    - nocto             # Disable close-to-open consistency (read-heavy workloads)
+    - actimeo=600       # Cache attributes for 10 min
+    - max_connect=16    # Allow up to 16 connections to the same server
+  reclaimPolicy: Delete
+  volumeBindingMode: Immediate
+```
+
+### Performance Tuning Rationale
+
+| Option | Why |
+|--------|-----|
+| `nfsvers=4.2` | Enables server-side copy, hole punch, and fallocate — TrueNAS Scale supports NFSv4.2 natively |
+| `nconnect=16` | Opens 16 parallel TCP connections per mount, spreading I/O across both 10GbE bond members |
+| `rsize/wsize=1048576` | 1 MB block sizes maximise throughput per operation — jumbo frames (MTU 9000) carry each 1 MB payload in fewer packets, reducing per-packet overhead |
+| `nocto` | Skips close-to-open consistency checks — safe because model weights and artifacts are write-once/read-many |
+| `actimeo=600` | Caches file and directory attributes for 10 minutes, reducing metadata round-trips for static content |
+| `nodiratime` | Avoids unnecessary directory timestamp writes alongside `noatime` |
+
+### Why "nfs-fast"?
+
+Gravenhollow addresses the performance gap between Longhorn (local) and candlekeep (HDD NAS):
+- **All-SSD:** No spinning disk latency — suitable for random I/O workloads like model loading
+- **Dual 10GbE:** 2× 10 Gbps network links via link aggregation
+- **12.2 TB capacity:** Enough for model cache, artifacts, and training data
+- **RustFS S3:** S3-compatible object storage endpoint for pipeline artifacts and backups
+- **Use case:** AI/ML model cache, MLflow artifacts, training data — workloads that need better than HDD but don't require local NVMe
+
+### S3 Endpoint (RustFS)
+
+Gravenhollow also provides S3-compatible object storage via RustFS:
+- **Endpoint:** `http://gravenhollow.lab.daviestechlabs.io:30292`
+- **Use cases:** Kubeflow pipeline artifacts, Longhorn off-cluster backups, training dataset staging
+- **Credentials:** Managed via Vault ExternalSecret (`/kv/data/gravenhollow` → `access_key`, `secret_key`)
+
 ## Storage Tier Selection Guide
 
 | Workload Type | Storage Class | Rationale |
 |---------------|---------------|-----------|
-| PostgreSQL (CNPG) | `longhorn` or `nfs-slow` | Depends on criticality |
+| PostgreSQL (CNPG) | `longhorn` | HA with replication, low latency |
 | Prometheus/ClickHouse | `longhorn` | High write IOPS required |
 | Vault | `longhorn` | Security-critical, needs HA |
+| AI/ML models (Ray) | `nfs-fast` | Large model weights, SSD speed |
+| MLflow artifacts | `nfs-fast` | Experiment tracking, frequent reads |
+| Training data | `nfs-fast` | Dataset staging for fine-tuning |
 | Media (Jellyfin, Kavita) | `nfs-slow` | Large files, sequential reads |
 | Photos (Immich) | `nfs-slow` | Bulk storage for photos |
 | User files (Nextcloud) | `nfs-slow` | Capacity over speed |
-| AI/ML models (Ray) | `nfs-slow` | Large model weights |
 | Build caches (Gitea runner) | `nfs-slow` | Ephemeral, large |
-| MLflow artifacts | `nfs-slow` | Model artifacts storage |
 
 ## Volume Usage by Tier
 
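Before relying on the StorageClass, the `nfs-fast` mount options can be tried by hand on a node. The following is a minimal sketch that assumes a Linux node with an NFS client installed. The helper `nfs_fast_mount_cmd` and the default mount point `/mnt/nfs-fast-test` are hypothetical placeholders, and the script only prints the command without running it. The options are copied from the `mountOptions` list in the Helm values. Older kernels may not recognise `max_connect`, so drop that entry if the test mount is rejected.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) a manual NFS mount mirroring the nfs-fast StorageClass.
# The helper name and default mount point are hypothetical.

NFS_FAST_SERVER="gravenhollow.lab.daviestechlabs.io"
NFS_FAST_SHARE="/mnt/gravenhollow/kubernetes"

# Same options, same order, as the StorageClass mountOptions.
NFS_FAST_OPTS=(
  nfsvers=4.2
  nconnect=16
  rsize=1048576
  wsize=1048576
  hard
  noatime
  nodiratime
  nocto
  actimeo=600
  max_connect=16
)

nfs_fast_mount_cmd() {
  local mountpoint="${1:-/mnt/nfs-fast-test}"
  local opts
  # Join the array with commas (first char of IFS) for `mount -o`.
  opts=$(IFS=,; echo "${NFS_FAST_OPTS[*]}")
  echo "mount -t nfs -o ${opts} ${NFS_FAST_SERVER}:${NFS_FAST_SHARE} ${mountpoint}"
}
```

Create the mount point, review the printed line, then run it with `sudo`. Comparing a large sequential read on this mount with and without `nconnect` is one way to check whether both bond members are actually carrying traffic.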
@@ -296,14 +389,15 @@ spec:
 
 ### When to Choose Each Tier
 
-| Requirement | Longhorn | NFS-Slow |
-|-------------|----------|----------|
-| Low latency | ✅ | ❌ |
-| High IOPS | ✅ | ❌ |
-| Large capacity | ❌ | ✅ |
-| ReadWriteMany (RWX) | Limited | ✅ |
-| Node failure survival | ✅ | ✅ (NAS HA) |
-| Kubernetes-native | ✅ | ✅ |
+| Requirement | Longhorn | NFS-Fast | NFS-Slow |
+|-------------|----------|----------|----------|
+| Low latency | ✅ | ⚡ | ❌ |
+| High IOPS | ✅ | ⚡ | ❌ |
+| Large capacity | ❌ | ✅ (12.2 TB) | ✅✅ |
+| ReadWriteMany (RWX) | Limited | ✅ | ✅ |
+| S3 compatible | ❌ | ✅ (RustFS) | ✅ (QuObjects) |
+| Node failure survival | ✅ | ✅ (NAS) | ✅ (NAS) |
+| Kubernetes-native | ✅ | ✅ | ✅ |
 
 ## Monitoring
 
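The `nconnect=16` row in the tuning table comes down to simple link arithmetic. LACP bonds normally hash traffic per flow, so one TCP connection stays on a single bond member and is capped near 10 Gbps. Multiple connections can be spread across both links. The sketch below computes the idealized line-rate time, in whole seconds, to read a file of a given size. The `transfer_seconds` helper and the 20 GB model size are hypothetical. The calculation ignores NFS, TCP, and disk overhead, so real reads will be slower.

```bash
#!/usr/bin/env bash
# Idealized transfer time at line rate (no protocol or disk overhead).
# Usage: transfer_seconds <size_in_GB> <link_rate_in_Gbps>
# Uses decimal units (1 GB = 8 Gbit); integer division truncates to whole seconds.
transfer_seconds() {
  local size_gb="$1" rate_gbps="$2"
  if (( rate_gbps <= 0 )); then
    echo "rate must be positive" >&2
    return 1
  fi
  echo $(( size_gb * 8 / rate_gbps ))
}

# Hypothetical 20 GB model: one flow pinned to one 10GbE member vs. both members.
echo "single flow (10 Gbps): $(transfer_seconds 20 10) s"
echo "both links  (20 Gbps): $(transfer_seconds 20 20) s"
```

For the 20 GB example this prints 16 s for one link and 8 s across both. That halving is the best case that spreading connections with `nconnect` can deliver.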
@@ -320,11 +414,13 @@ spec:
 
 ## Future Enhancements
 
-1. **NAS high availability** - Second NAS with replication
-2. **Dedicated storage network** - Separate VLAN for storage traffic
+1. ~~**NAS high availability** - Second NAS with replication~~ ✅ Done — gravenhollow adds a second NAS
+2. **Dedicated storage network** - Separate VLAN for storage traffic (gravenhollow's dual 10GbE makes this more impactful)
 3. **NVMe-oF** - Network NVMe for lower latency
 4. **Tiered Longhorn** - Hot (NVMe) and warm (SSD) within Longhorn
-5. **S3 tier** - MinIO for object storage workloads
+5. ~~**S3 tier** - MinIO for object storage workloads~~ ✅ Done — gravenhollow RustFS provides S3
+6. **Migrate AI/ML PVCs to nfs-fast** - Move ray-model-cache and mlflow-artifacts from nfs-slow to nfs-fast
+7. **Longhorn backups to gravenhollow S3** - Use RustFS as off-cluster backup target
 
 ## References
 
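Item 7 above would point Longhorn at the RustFS endpoint. Longhorn expresses S3 backup targets as `s3://<bucket>@<region>/<path>`. It reads a custom endpoint from the `AWS_ENDPOINTS` key of its credential secret, alongside `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. The following is a minimal sketch that only prints values. The `longhorn-backups` bucket name comes from the tier diagram. The helper names and the `us-east-1` region string are assumptions, so substitute whatever region RustFS is configured to report.

```bash
#!/usr/bin/env bash
# Sketch only: print values for pointing Longhorn at the RustFS S3 endpoint.
# Helper names are hypothetical; the default region is an assumption.

RUSTFS_ENDPOINT="http://gravenhollow.lab.daviestechlabs.io:30292"

# Longhorn S3 backup target format: s3://<bucket>@<region>/<path>
longhorn_backup_target() {
  local bucket="${1:-longhorn-backups}"
  local region="${2:-us-east-1}"   # assumption: region string RustFS accepts
  local prefix="${3:-}"            # optional path inside the bucket
  echo "s3://${bucket}@${region}/${prefix}"
}

# Manual smoke test, run with AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
# exported from the Vault-managed gravenhollow credentials.
rustfs_ls_cmd() {
  local bucket="${1:-longhorn-backups}"
  echo "aws s3 ls s3://${bucket} --endpoint-url ${RUSTFS_ENDPOINT}"
}

longhorn_backup_target
rustfs_ls_cmd
```

Running the printed `aws s3 ls` command first confirms that the endpoint and keys work before Longhorn is pointed at the target.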
@@ -82,8 +82,8 @@ Fighters are the workhorses, handling general compute without magical (GPU) abil
 
 | Node | Character/Location | Role | Notes |
 |------|-------------------|------|-------|
-| `candlekeep` | Candlekeep | Primary NAS (Synology) | Library fortress, knowledge storage |
-| `neverwinter` | Neverwinter | Fast NAS (TrueNAS Scale) | Jewel of the North, all-SSD, nfs-fast |
+| `candlekeep` | Candlekeep | Primary NAS (QNAP) | Library fortress, knowledge storage |
+| `gravenhollow` | Gravenhollow | Fast NAS (TrueNAS Scale) | Living memory of the Underdark, all-SSD, dual 10GbE, nfs-fast |
 | `waterdeep` | Waterdeep | Mac Mini dev workstation | City of Splendors, primary city |
 
 ### Future Expansion
@@ -139,11 +139,11 @@ Fighters are the workhorses, handling general compute without magical (GPU) abil
 ┌───────────────────────────────────────────────────────────────────────────────┐
 │ 🏰 Locations (Off-Cluster Infrastructure) │
 │ │
-│ 📚 candlekeep ❄️ neverwinter 🏙️ waterdeep │
-│ Synology NAS TrueNAS Scale (SSD) Mac Mini │
-│ nfs-default nfs-fast Dev workstation │
-│ High capacity High speed Primary dev box │
-│ "Library Fortress" "Jewel of the North" "City of Splendors" │
+│ 📚 candlekeep 🪨 gravenhollow 🏙️ waterdeep │
+│ QNAP NAS TrueNAS Scale (SSD) Mac Mini │
+│ nfs-slow nfs-fast Dev workstation │
+│ High capacity High speed, 12.2TB Primary dev box │
+│ "Library Fortress" "Living Memory" "City of Splendors" │
 └───────────────────────────────────────────────────────────────────────────────┘
 ```
 
@@ -152,7 +152,7 @@ Fighters are the workhorses, handling general compute without magical (GPU) abil
 | Location | Storage Class | Speed | Capacity | Use Case |
 |----------|--------------|-------|----------|----------|
 | Candlekeep | `nfs-default` | HDD | High | Backups, archives, media |
-| Neverwinter | `nfs-fast` | SSD | Medium | Database WAL, hot data |
+| Gravenhollow | `nfs-fast` | SSD (12.2 TB) | Medium-High | Database WAL, hot data, model cache |
 | Longhorn | `longhorn` | Local SSD | Distributed | Replicated app data |
 
 ## Node Labels
@@ -182,6 +182,6 @@ All nodes are resolvable via:
 * [Khelben Arunsun](https://forgottenrealms.fandom.com/wiki/Khelben_Arunsun)
 * [Elminster](https://forgottenrealms.fandom.com/wiki/Elminster_Aumar)
 * [Candlekeep](https://forgottenrealms.fandom.com/wiki/Candlekeep)
-* [Neverwinter](https://forgottenrealms.fandom.com/wiki/Neverwinter)
+* [Gravenhollow](https://forgottenrealms.fandom.com/wiki/Gravenhollow)
 * Related: [ADR-0035](0035-arm64-worker-strategy.md) - ARM64 Worker Strategy
 * Related: [ADR-0011](0011-kuberay-unified-serving.md) - KubeRay Unified Serving

@@ -59,7 +59,7 @@ Chosen option: **Option 1 — External Ray worker on macOS**, because Ray native
 * Network dependency — if waterdeep sleeps or disconnects, Ray tasks on it fail
 * MPS backend has limited operator coverage compared to CUDA/ROCm
 * Python environment must be maintained separately (not in a container image)
-* No Longhorn storage — model cache managed locally or via NFS mount
+* No Longhorn storage — model cache managed locally or via NFS mount from gravenhollow (nfs-fast)
 * Monitoring not automatically scraped by Prometheus (needs node-exporter or push gateway)
 
 ## Pros and Cons of the Options
@@ -125,7 +125,7 @@ Chosen option: **Option 1 — External Ray worker on macOS**, because Ray native
 │ │ └── Training: LoRA/QLoRA fine-tuning via Ray Train │ │
 │ └──────────────────────────────────────────────────────────────────┘ │
 │ │
-│ Model cache: ~/Library/Caches/huggingface + NFS mount │
+│ Model cache: ~/Library/Caches/huggingface + NFS mount (gravenhollow) │
 └──────────────────────────────────────────────────────────────────────────┘
 ```
 
@@ -233,15 +233,15 @@ launchctl load ~/Library/LaunchAgents/io.ray.worker.plist
 
 ### 5. Model Cache via NFS
 
-Mount the NAS model cache on waterdeep so models are shared with the cluster:
+Mount the gravenhollow NFS share on waterdeep so models are shared with the cluster via the fast all-SSD NAS:
 
 ```bash
-# Mount candlekeep NFS share
-sudo mount -t nfs candlekeep.lab.daviestechlabs.io:/volume1/models \
+# Mount gravenhollow NFS share (all-SSD, dual 10GbE)
+sudo mount -t nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models \
 /Volumes/model-cache
 
 # Or add to /etc/fstab for persistence
-# candlekeep.lab.daviestechlabs.io:/volume1/models /Volumes/model-cache nfs rw 0 0
+# gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models /Volumes/model-cache nfs rw 0 0
 
 # Symlink to HuggingFace cache location
 ln -s /Volumes/model-cache ~/.cache/huggingface/hub
@@ -315,6 +315,7 @@ caffeinate -s ray start --address=... --block
 * Ray's GCS port (6379) will be exposed outside the cluster — restrict with firewall rules to waterdeep's IP only
 * The Ray worker has no RBAC — it executes whatever tasks the head assigns
 * Model weights on NFS are read-only from waterdeep (mount with `ro` option if possible)
+* NFS traffic to gravenhollow traverses the LAN — ensure dual 10GbE links are active
 * Consider Tailscale or WireGuard for encrypted transport if the Ray GCS traffic crosses untrusted network segments
 
 ## Future Considerations

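There is a caveat for the symlink step in "5. Model Cache via NFS". If `~/.cache/huggingface/hub` already exists as a real directory, `ln -s` creates the link inside it (as `hub/model-cache`) instead of replacing it, and downloads keep landing on local disk. The following is a minimal sketch using a hypothetical `link_model_cache` helper, with that step's paths as defaults. It refuses to touch a real directory and otherwise creates or refreshes the link. As an alternative that avoids the symlink entirely, the Ray worker's environment can set `HF_HUB_CACHE=/Volumes/model-cache`.

```bash
#!/usr/bin/env bash
# Sketch: point the HuggingFace hub cache at the NFS mount via a symlink,
# without silently nesting the link inside an existing directory.
# Usage: link_model_cache [mount_dir] [cache_link]   (helper name is hypothetical)
link_model_cache() {
  local src="${1:-/Volumes/model-cache}"
  local dst="${2:-$HOME/.cache/huggingface/hub}"

  if [ -d "$dst" ] && [ ! -L "$dst" ]; then
    # A real directory: plain `ln -s` would create "$dst/$(basename "$src")".
    echo "refusing: $dst is a directory; move it aside first" >&2
    return 1
  fi

  mkdir -p "$(dirname "$dst")"
  # -n: treat an existing symlink-to-directory as the link itself, not its target.
  # -f: replace an existing link or file at $dst.
  ln -sfn "$src" "$dst"
}
```

Running the helper a second time with a different mount directory repoints the link. This makes it safe to call from the worker's startup script.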
@@ -31,8 +31,8 @@ flowchart TB
 end
 
 subgraph Infrastructure["🏰 Locations (Off-Cluster Infrastructure)"]
-Candlekeep["📚 candlekeep<br/>Synology NAS<br/>nfs-default<br/><i>Library Fortress</i>"]
-Neverwinter["❄️ neverwinter<br/>TrueNAS Scale (SSD)<br/>nfs-fast<br/><i>Jewel of the North</i>"]
+Candlekeep["📚 candlekeep<br/>QNAP NAS<br/>nfs-slow<br/><i>Library Fortress</i>"]
+Gravenhollow["🪨 gravenhollow<br/>TrueNAS Scale (SSD)<br/>nfs-fast · 12.2 TB<br/><i>Living Memory</i>"]
 Waterdeep["🏙️ waterdeep<br/>Mac Mini<br/>Dev Workstation<br/><i>City of Splendors</i>"]
 end
 
@@ -44,7 +44,7 @@ flowchart TB
 end
 
 ControlPlane -.->|"etcd"| ControlPlane
-Wizards -.->|"Fast Storage"| Neverwinter
+Wizards -.->|"Fast Storage"| Gravenhollow
 Wizards -.->|"Backups"| Candlekeep
 Rogues -.->|"NFS Mounts"| Candlekeep
 Fighters -.->|"NFS Mounts"| Candlekeep
@@ -60,5 +60,5 @@ flowchart TB
 class Khelben,Elminster,Drizzt,Danilo,Regis wizard
 class Durnan,Elaith,Jarlaxle,Mirt,Volo rogue
 class Wulfgar fighter
-class Candlekeep,Neverwinter,Waterdeep location
+class Candlekeep,Gravenhollow,Waterdeep location
 class AI,Edge,Compute,Storage workload

@@ -23,7 +23,7 @@ flowchart TB
 MinIO["MinIO<br/>On-premises S3"]
 end
 subgraph Secondary["Secondary: NFS"]
-NAS["Synology NAS<br/>Long-term retention"]
+NAS["QNAP NAS<br/>Long-term retention"]
 end
 end
 