ADR-0062: replace Cloudflare R2 with RustFS via Cloudflare Tunnel

gravenhollow RustFS is already S3-compatible — expose it through
the existing Cloudflare Tunnel with a dedicated HTTPRoute at
assets.daviestechlabs.io. Cloudflare CDN caches at edge PoPs.

Eliminates: R2 bucket, rclone sync CronJob, R2 API token, and
6-hour sync delay. Single source of truth on gravenhollow.
2026-02-21 16:17:20 -05:00
parent 9fe12e0cff
commit 654b7ae774

@@ -3,7 +3,7 @@
 * Status: proposed
 * Date: 2026-02-21
 * Deciders: Billy
-* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare R2 CDN
+* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare-cached RustFS
 ## Context and Problem Statement
@@ -23,8 +23,9 @@ How do we streamline custom 3D avatar creation for companions-frontend with AI a
 * Quobyte S3-compatible endpoint with the `kasm` bucket is the existing Kasm storage backend
 * VRM models must ultimately land in the companions-frontend `/assets/models/` path at build time or be served from an external URL
 * Final production models and animations should live on gravenhollow (all-SSD TrueNAS, dual 10GbE) for fast local serving via NFS
-* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-backed path for multi-MB VRM downloads
+* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-cached path for multi-MB VRM downloads
 * Models are write-once/read-many — ideal for aggressive caching
+* gravenhollow already runs RustFS (S3-compatible) — exposing it via Cloudflare Tunnel gives CDN caching without a separate storage tier
 ## Considered Options
@@ -45,7 +46,7 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
 * Existing infrastructure — Kasm agent, DinD, rclone volume plugin, Quobyte S3, gravenhollow NFS, and Cloudflare are all already deployed
 * No image rebuild for new models — VRM files live on gravenhollow NFS, mounted read-only into the pod; add a model and update the allowlist
 * LAN performance — all-SSD NFS with dual 10GbE delivers VRM files in <100ms
-* Remote performance — Cloudflare R2 CDN with zero egress fees and 300+ global PoPs for remote users via Cloudflare Tunnel
+* Remote performance — RustFS exposed through Cloudflare Tunnel with CDN caching at 300+ global PoPs; no separate storage tier needed
 * Poly Haven / Hyper3D integration — BlenderMCP supports downloading Poly Haven assets and generating models via Hyper3D Rodin, expanding the asset library
 * VRM ecosystem — Blender VRM add-on exports directly to VRM 0.x/1.0 format consumed by `@pixiv/three-vrm` in companions-frontend
 * Reproducible — Kasm workspace images are versioned; Blender + add-ons are pre-baked
@@ -57,8 +58,7 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
 * VRM export quality depends on correct rigging/weight painting — AI can scaffold but manual touch-up may still be needed
 * Kasm Blender image must be configured with both the BlenderMCP add-on and the VRM add-on pre-installed
 * Telemetry is on by default in BlenderMCP — must disable via `DISABLE_TELEMETRY=true` for privacy
-* Cloudflare R2 sync adds a CronJob and requires a Cloudflare R2 API token in Vault
-* Two-hop promotion path (Quobyte S3 → gravenhollow NFS → Cloudflare R2) adds operational steps
+* Cache misses from remote users hit gravenhollow via the tunnel — negligible with immutable files and long TTLs
 ## Architecture
@@ -128,23 +128,24 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
│ └── animations/ (shared animation clips) │ │ └── animations/ (shared animation clips) │
│ │ │ │
│ S3 (RustFS): avatar-models bucket │ │ S3 (RustFS): avatar-models bucket │
│ (mirror of NFS dir for Cloudflare R2 sync) │ (same data as NFS dir, served via S3 API for Cloudflare Tunnel)
└──────────┬─────────────────────────────────┬────────────────────────────┘ └──────────┬─────────────────────────────────┬────────────────────────────┘
│ │ │ │
NFS mount (nfs-fast) rclone sync (cron) NFS mount (nfs-fast) S3 API (RustFS :30292)
for pod volume via Cloudflare Tunnel
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────────────────┐ ┌──────────────────────────────────────────┐ ┌──────────────────────────┐ ┌──────────────────────────────────────────┐
│ companions-frontend │ │ Cloudflare R2 │ companions-frontend │ │ Cloudflare Tunnel + CDN
│ (Kubernetes pod) │ │ Bucket: avatar-models │ (Kubernetes pod) │ │
│ │ │ │ │ │ assets.daviestechlabs.io
│ /models/ volume mount │ │ Custom domain: │ /models/ volume mount │ │ → envoy-external
│ (nfs-fast PVC, RO) │ │ assets.daviestechlabs.io/models/ │ (nfs-fast PVC, RO) │ │ → avatar-assets-svc (in-cluster)
│ │ │ │ │ │ → gravenhollow RustFS :30292
│ Go FileServer: │ │ Cache-Control: public, max-age=31536000 │ Go FileServer: │ │
│ /assets/models/ → │ │ (immutable, versioned filenames) │ /assets/models/ → │ │ Cloudflare CDN caches at 300+ PoPs
│ serves from PVC │ │ │ serves from PVC │ │ Cache-Control: public, max-age=31536000
│ │ │ Free egress (no bandwidth charges) │ │ │ │ (immutable, versioned filenames)
└──────────┬───────────────┘ └──────────────────────┬───────────────────┘ └──────────┬───────────────┘ └──────────────────────┬───────────────────┘
│ │ │ │
LAN clients Remote clients LAN clients Remote clients
@@ -158,7 +159,7 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
│ AvatarManager.loadModel('/assets/models/Companion-A.vrm') │ │ AvatarManager.loadModel('/assets/models/Companion-A.vrm') │
│ │ │ │
│ LAN: fetch from companions-frontend pod (NFS-backed, ~10GbE) │ │ LAN: fetch from companions-frontend pod (NFS-backed, ~10GbE) │
│ Remote: fetch from Cloudflare R2 CDN (cache-hit, global PoPs) │ Remote: fetch from assets.daviestechlabs.io (Cloudflare CDN-cached)
└─────────────────────────────────────────────────────────────────────────┘ └─────────────────────────────────────────────────────────────────────────┘
``` ```
@@ -222,7 +223,7 @@ When the Kasm session is accessed remotely, set `BLENDER_HOST` to the Kasm works
 | **Promote** | `rclone copy quobyte:kasm/blender-avatars/exports/Model.vrm gravenhollow-nfs:/avatar-models/` (manual or CI) |
 | **Register** | Add model path to `AllowedAvatarModels` in Go and JS allowlists, commit to repo |
 | **Deploy** | Flux rolls out updated companions-frontend config; model already available on NFS PVC — no image rebuild needed |
-| **CDN sync** | CronJob `rclone sync` from gravenhollow RustFS `avatar-models` bucket → Cloudflare R2 `avatar-models` bucket |
+| **CDN** | Model immediately available via `assets.daviestechlabs.io` — Cloudflare Tunnel proxies to RustFS, CDN caches at edge |
 ### 5. Deployment and Storage Architecture
@@ -261,74 +262,75 @@ Benefits:
 - **ReadOnlyMany** — multiple replicas can share the same PVC
 - Source `.blend` files and textures remain on Quobyte S3 (Kasm bucket) for the creation workflow; only promoted VRM exports land on gravenhollow
-#### Remote Serving (Cloudflare R2 CDN)
+#### Remote Serving (Cloudflare-cached RustFS)
-Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Serving multi-MB VRM files through the tunnel works but adds latency and consumes tunnel bandwidth. Cloudflare R2 provides a better path:
+Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Rather than duplicating assets to a separate storage tier (e.g., Cloudflare R2), gravenhollow's RustFS S3 endpoint is exposed directly through the Cloudflare Tunnel with a dedicated hostname. Cloudflare's CDN automatically caches responses at edge PoPs — since VRM files are immutable with year-long TTLs, virtually all requests are served from cache.
 | | |
 |---|---|
-| **Bucket** | `avatar-models` on Cloudflare R2 |
-| **Custom domain** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
-| **Free egress** | R2 has zero egress fees — ideal for large binary assets |
-| **Cache** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
-| **Sync** | CronJob in cluster: `rclone sync gravenhollow-s3:avatar-models r2:avatar-models --checksum` |
-| **Auth** | Public read (models are not sensitive); write via R2 API token in Vault |
+| **Origin** | gravenhollow RustFS `avatar-models` bucket (`:30292`, same data as NFS dir) |
+| **Public hostname** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
+| **Tunnel routing** | Cloudflare Tunnel → `envoy-external` → `avatar-assets-svc` → gravenhollow RustFS |
+| **CDN caching** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
+| **Egress** | Cloudflare-proxied traffic has no bandwidth surcharge |
+| **Auth** | Public read (models are not sensitive); RustFS write credentials stay internal |
+| **No sync needed** | Single source of truth — NFS and RustFS serve the same data from gravenhollow |
-##### R2 Sync CronJob
+##### In-Cluster Proxy Service
+An ExternalName or Endpoints service proxies cluster traffic to gravenhollow's RustFS endpoint so the HTTPRoute can reference it:
 ```yaml
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  name: avatar-models-r2-sync
-  namespace: ai-ml
-spec:
-  schedule: "0 */6 * * *" # Every 6 hours
-  jobTemplate:
-    spec:
-      template:
-        spec:
-          containers:
-            - name: sync
-              image: rclone/rclone:1.68
-              command:
-                - rclone
-                - sync
-                - gravenhollow-s3:avatar-models
-                - r2:avatar-models
-                - --checksum
-                - --transfers=4
-                - -v
-              volumeMounts:
-                - name: rclone-config
-                  mountPath: /config/rclone
-                  readOnly: true
-          volumes:
-            - name: rclone-config
-              secret:
-                secretName: rclone-r2-config
-          restartPolicy: OnFailure
+# Service pointing to gravenhollow RustFS for avatar assets
+apiVersion: v1
+kind: Service
+metadata:
+  name: avatar-assets
+  namespace: ai-ml
+spec:
+  type: ExternalName
+  externalName: gravenhollow.lab.daviestechlabs.io
+  ports:
+    - port: 30292
+      protocol: TCP
 ```
-##### rclone Config (ExternalSecret from Vault)
+##### HTTPRoute (Cloudflare Tunnel → RustFS)
-```ini
-[gravenhollow-s3]
-type = s3
-provider = Other
-endpoint = https://gravenhollow.lab.daviestechlabs.io:30292
-access_key_id = <from-vault>
-secret_access_key = <from-vault>
-
-[r2]
-type = s3
-provider = Cloudflare
-endpoint = https://<account-id>.r2.cloudflarestorage.com
-access_key_id = <from-vault>
-secret_access_key = <from-vault>
-region = auto
+```yaml
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  name: avatar-assets
+  namespace: ai-ml
+  annotations:
+    external-dns.alpha.kubernetes.io/hostname: assets.daviestechlabs.io
+spec:
+  hostnames:
+    - assets.daviestechlabs.io
+  parentRefs:
+    - name: envoy-external
+      namespace: network
+  rules:
+    - matches:
+        - path:
+            type: PathPrefix
+            value: /avatar-models/
+      backendRefs:
+        - name: avatar-assets
+          port: 30292
+      filters:
+        - type: ResponseHeaderModifier
+          responseHeaderModifier:
+            set:
+              - name: Cache-Control
+                value: "public, max-age=31536000, immutable"
+              - name: Access-Control-Allow-Origin
+                value: "https://companions-chat.daviestechlabs.io"
 ```
+Cloudflare Tunnel picks up `assets.daviestechlabs.io` via the existing wildcard ingress rule (`*.daviestechlabs.io → envoy-external`). The CDN caches based on the `Cache-Control` header — after the first request per PoP, all subsequent loads are served from Cloudflare's edge.
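The caching behaviour described here hinges on the `Cache-Control` value that the HTTPRoute sets. As a rough illustration only (this is not part of the ADR, and `parseCacheControl` is a hypothetical helper), the header can be parsed to confirm that `max-age=31536000` works out to 365 days and that `immutable` is present:

```javascript
// Hypothetical helper, not from the ADR: parse a Cache-Control header value
// into a map of directive -> value (true for directives without a value).
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue; // skip empty segments such as a trailing comma
    const name = rawName.toLowerCase();
    directives[name] =
      rawValue === undefined ? true : rawValue.replace(/^"|"$/g, '');
  }
  return directives;
}

// The header value configured in the HTTPRoute's ResponseHeaderModifier.
const cc = parseCacheControl('public, max-age=31536000, immutable');
const maxAgeDays = Number(cc['max-age']) / 86400; // 86400 seconds per day
console.log(cc.public, cc.immutable, maxAgeDays); // true true 365
```

Whether a given response was actually served from the edge can be checked against the `CF-Cache-Status` response header. One point that may be worth verifying against the Cloudflare cache documentation: default edge caching is keyed on a list of file extensions, so a Cache Rule covering `.vrm` may be needed in addition to the header.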
 ##### Client-Side Routing
 The frontend detects whether the user is on LAN or remote and routes model fetches accordingly:
@@ -337,11 +339,11 @@ The frontend detects whether the user is on LAN or remote and routes model fetch
 // avatar.js — model URL resolution
 function resolveModelURL(path) {
   // LAN users: serve from the Go server (NFS-backed, same origin)
-  // Remote users: serve from Cloudflare R2 CDN
+  // Remote users: serve from Cloudflare-cached RustFS
   const isLAN = location.hostname.endsWith('.lab.daviestechlabs.io');
   if (isLAN) return path; // e.g. /assets/models/Companion-A.vrm
-  return `https://assets.daviestechlabs.io${path.replace('/assets', '')}`;
-  // → https://assets.daviestechlabs.io/models/Companion-A.vrm
+  return `https://assets.daviestechlabs.io/avatar-models/${path.split('/').pop()}`;
+  // → https://assets.daviestechlabs.io/avatar-models/Companion-A.vrm
 }
 ```
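One detail in the new mapping may deserve a second look: `path.split('/').pop()` keeps only the filename, so a path under a subdirectory (the architecture diagram shows an `animations/` directory) would resolve to the bucket root. The sketch below is one possible variant, assuming the RustFS bucket mirrors the NFS directory layout and that local paths always start with `/assets/models/`. The names `resolveModelURLFor`, `ASSET_ORIGIN` and `LOCAL_PREFIX`, and the `idle.vrma` filename, are illustrative and not from the repository; the hostname is passed in so the function can run outside a browser.

```javascript
// Sketch only: assumes bucket keys mirror the NFS layout under /avatar-models/.
const ASSET_ORIGIN = 'https://assets.daviestechlabs.io';
const LOCAL_PREFIX = '/assets/models/';

function resolveModelURLFor(hostname, path) {
  // LAN users keep the same-origin path served by the Go server.
  if (hostname.endsWith('.lab.daviestechlabs.io')) return path;
  // Reject anything outside the expected local prefix instead of guessing.
  if (!path.startsWith(LOCAL_PREFIX)) {
    throw new Error(`unexpected model path: ${path}`);
  }
  // Remote users: keep the sub-path and encode each segment for the URL.
  const key = path
    .slice(LOCAL_PREFIX.length)
    .split('/')
    .map(encodeURIComponent)
    .join('/');
  return `${ASSET_ORIGIN}/avatar-models/${key}`;
}

console.log(resolveModelURLFor(
  'companions-chat.daviestechlabs.io',
  '/assets/models/Companion-A.vrm'
)); // https://assets.daviestechlabs.io/avatar-models/Companion-A.vrm
console.log(resolveModelURLFor(
  'companions-chat.daviestechlabs.io',
  '/assets/models/animations/idle.vrma'
)); // https://assets.daviestechlabs.io/avatar-models/animations/idle.vrma
```

If the bucket is intentionally flat, the committed one-liner is sufficient and this variant is unnecessary.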
@@ -360,8 +362,7 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
 |----------|---------|------|--------|
 | Quobyte S3 (`kasm` bucket) | Working files: `.blend`, textures, WIP exports | Kasm rclone volume | Kasm sessions only |
 | gravenhollow NFS (`/avatar-models/`) | Production VRM models + animations | `nfs-fast` PVC (RO) | companions-frontend pod, LAN |
-| gravenhollow RustFS S3 (`avatar-models`) | Mirror of NFS dir for R2 sync source | S3 API | CronJob rclone |
-| Cloudflare R2 (`avatar-models`) | CDN-served copy for remote users | R2 public bucket | Global, zero egress fees |
+| gravenhollow RustFS S3 (`avatar-models`) | Same data as NFS, exposed to Cloudflare Tunnel for remote users | S3 API via HTTPRoute | Cloudflare CDN-cached, global |
 ## BlenderMCP Capabilities Used
@@ -382,26 +383,28 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
 * **Telemetry:** BlenderMCP collects anonymous telemetry by default. Disabled via `DISABLE_TELEMETRY=true` in the MCP server config.
 * **Network:** The TCP socket (port 9876) between the MCP server and Blender add-on is local to the session. If accessed remotely, ensure the connection is tunnelled or restricted.
 * **S3 credentials:** rclone volume plugin credentials are managed via Kasm storage mappings and the existing `kasm-agent` ExternalSecret — no new secrets required.
-* **R2 credentials:** Cloudflare R2 API token stored in Vault (`kv/data/cloudflare-r2`), accessed via ExternalSecret by the sync CronJob. Write-only scope — the CronJob uploads but cannot delete the bucket.
-* **Public R2 bucket:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. R2 bucket is read-only public via custom domain; write access requires the API token.
+* **RustFS exposure:** The `avatar-models` RustFS bucket is exposed read-only through Cloudflare Tunnel. RustFS write credentials remain internal. The HTTPRoute only routes GET requests to the bucket path — no write operations are reachable externally.
+* **Public assets:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. CORS restricts to `companions-chat.daviestechlabs.io` origin.
 * **Model allowlist:** Even though models are served from NFS/R2, the server-side and client-side allowlists in companions-frontend gate which models users can actually select. Uploading a VRM to gravenhollow does not make it available without a code change.
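The allowlist gate in the last bullet can be pictured roughly as follows. This is a minimal client-side sketch, not the project's actual code: `ALLOWED_AVATAR_MODELS` and `isAllowedModel` are hypothetical names standing in for the JS counterpart of `AllowedAvatarModels`, and the entries are placeholders.

```javascript
// Hypothetical client-side allowlist; the real entries live in the repo.
const ALLOWED_AVATAR_MODELS = new Set([
  '/assets/models/Companion-A.vrm',
]);

// Only exact, pre-registered paths pass. A file that merely exists on
// gravenhollow is rejected until it is added here and committed.
function isAllowedModel(path) {
  return ALLOWED_AVATAR_MODELS.has(path);
}

console.log(isAllowedModel('/assets/models/Companion-A.vrm')); // true
console.log(isAllowedModel('/assets/models/Uploaded-But-Unregistered.vrm')); // false
console.log(isAllowedModel('/assets/models/../secrets.vrm')); // false
```

A gate of this kind controls which models the application will select and load. Because the bucket is publicly readable, it presumably does not stop someone from fetching an unregistered object directly by URL, which is consistent with the ADR treating all promoted models as public assets.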
 ## Pros and Cons of the Options
-### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow NFS + Cloudflare R2
+### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow (NFS + RustFS via Cloudflare)
 * Good, because AI-assisted modelling reduces manual effort for avatar creation
 * Good, because assets persist in S3 across sessions and are accessible from CI
-* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, and Cloudflare are already deployed
+* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, Cloudflare Tunnel are all already deployed
 * Good, because VS Code MCP integration means one editor for code and 3D work
 * Good, because Kasm sandboxes Blender execution away from the cluster
 * Good, because NFS-fast serving decouples model assets from container images (no rebuild to add models)
-* Good, because Cloudflare R2 has zero egress fees and global CDN caching for remote users
+* Good, because RustFS through Cloudflare Tunnel provides CDN caching with zero additional storage tiers — no R2 bucket, no sync CronJob, no extra credentials
+* Good, because single source of truth — gravenhollow serves both LAN (NFS) and remote (RustFS → Cloudflare CDN) from the same data
 * Good, because immutable versioned filenames enable aggressive caching and trivial rollback
+* Good, because models are available to remote users immediately after promotion (no sync delay)
 * Bad, because BlenderMCP is a third-party tool with arbitrary code execution
 * Bad, because socket communication adds latency for remote Kasm sessions
 * Bad, because VRM rigging quality may require manual adjustment after AI scaffolding
-* Bad, because two-hop promotion path adds operational complexity
+* Bad, because cache misses hit gravenhollow via the tunnel (negligible with immutable files + long TTLs)
 ### Option 2 — Local Blender + BlenderMCP on developer laptop
@@ -440,5 +443,5 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
 * [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader used in companions-frontend)
 * [Poly Haven](https://polyhaven.com/) (free 3D assets, HDRIs, textures)
 * [Hyper3D Rodin](https://hyper3d.ai/) (AI 3D model generation)
-* [Cloudflare R2 Documentation](https://developers.cloudflare.com/r2/)
-* [Cloudflare R2 Custom Domains](https://developers.cloudflare.com/r2/buckets/public-buckets/#custom-domains)
+* [Cloudflare Tunnel Docs](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/)
+* [Cloudflare CDN Cache Rules](https://developers.cloudflare.com/cache/)