ADR-0062: replace Cloudflare R2 with RustFS via Cloudflare Tunnel

gravenhollow RustFS is already S3-compatible — expose it through
the existing Cloudflare Tunnel with a dedicated HTTPRoute at
assets.daviestechlabs.io. Cloudflare CDN caches at edge PoPs.

Eliminates: R2 bucket, rclone sync CronJob, R2 API token, and
6-hour sync delay. Single source of truth on gravenhollow.
2026-02-21 16:17:20 -05:00
parent 9fe12e0cff
commit 654b7ae774

@@ -3,7 +3,7 @@
* Status: proposed
* Date: 2026-02-21
* Deciders: Billy
* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare R2 CDN
* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare-cached RustFS
## Context and Problem Statement
@@ -23,8 +23,9 @@ How do we streamline custom 3D avatar creation for companions-frontend with AI a
* Quobyte S3-compatible endpoint with the `kasm` bucket is the existing Kasm storage backend
* VRM models must ultimately land in the companions-frontend `/assets/models/` path at build time or be served from an external URL
* Final production models and animations should live on gravenhollow (all-SSD TrueNAS, dual 10GbE) for fast local serving via NFS
* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-backed path for multi-MB VRM downloads
* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-cached path for multi-MB VRM downloads
* Models are write-once/read-many — ideal for aggressive caching
* gravenhollow already runs RustFS (S3-compatible) — exposing it via Cloudflare Tunnel gives CDN caching without a separate storage tier
## Considered Options
@@ -45,7 +46,7 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
* Existing infrastructure — Kasm agent, DinD, rclone volume plugin, Quobyte S3, gravenhollow NFS, and Cloudflare are all already deployed
* No image rebuild for new models — VRM files live on gravenhollow NFS, mounted read-only into the pod; add a model and update the allowlist
* LAN performance — all-SSD NFS with dual 10GbE delivers VRM files in <100ms
* Remote performance — Cloudflare R2 CDN with zero egress fees and 300+ global PoPs for remote users via Cloudflare Tunnel
* Remote performance — RustFS exposed through Cloudflare Tunnel with CDN caching at 300+ global PoPs; no separate storage tier needed
* Poly Haven / Hyper3D integration — BlenderMCP supports downloading Poly Haven assets and generating models via Hyper3D Rodin, expanding the asset library
* VRM ecosystem — Blender VRM add-on exports directly to VRM 0.x/1.0 format consumed by `@pixiv/three-vrm` in companions-frontend
* Reproducible — Kasm workspace images are versioned; Blender + add-ons are pre-baked
@@ -57,8 +58,7 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
* VRM export quality depends on correct rigging/weight painting — AI can scaffold but manual touch-up may still be needed
* Kasm Blender image must be configured with both the BlenderMCP add-on and the VRM add-on pre-installed
* Telemetry is on by default in BlenderMCP — must disable via `DISABLE_TELEMETRY=true` for privacy
* Cloudflare R2 sync adds a CronJob and requires a Cloudflare R2 API token in Vault
* Two-hop promotion path (Quobyte S3 → gravenhollow NFS → Cloudflare R2) adds operational steps
* Cache misses from remote users hit gravenhollow via the tunnel — negligible with immutable files and long TTLs
## Architecture
@@ -128,23 +128,24 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
│ └── animations/ (shared animation clips) │
│ │
│ S3 (RustFS): avatar-models bucket │
│ (mirror of NFS dir for Cloudflare R2 sync)
│ (same data as NFS dir, served via S3 API for Cloudflare Tunnel)
└──────────┬─────────────────────────────────┬────────────────────────────┘
│ │
NFS mount (nfs-fast) rclone sync (cron)
NFS mount (nfs-fast) S3 API (RustFS :30292)
for pod volume via Cloudflare Tunnel
│ │
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────────────────────┐
│ companions-frontend │ │ Cloudflare R2
│ (Kubernetes pod) │ │ Bucket: avatar-models
│ │ │
│ /models/ volume mount │ │ Custom domain:
│ (nfs-fast PVC, RO) │ │ assets.daviestechlabs.io/models/
│ │ │
│ Go FileServer: │ │ Cache-Control: public, max-age=31536000
│ /assets/models/ → │ │ (immutable, versioned filenames)
│ serves from PVC │ │
│ │ │ Free egress (no bandwidth charges) │
│ companions-frontend │ │ Cloudflare Tunnel + CDN
│ (Kubernetes pod) │ │
│ │ │ assets.daviestechlabs.io
│ /models/ volume mount │ │ → envoy-external
│ (nfs-fast PVC, RO) │ │ → avatar-assets-svc (in-cluster)
│ │ │ → gravenhollow RustFS :30292
│ Go FileServer: │ │
│ /assets/models/ → │ │ Cloudflare CDN caches at 300+ PoPs
│ serves from PVC │ │ Cache-Control: public, max-age=31536000
│ │ │ (immutable, versioned filenames)
└──────────┬───────────────┘ └──────────────────────┬───────────────────┘
│ │
LAN clients Remote clients
@@ -158,7 +159,7 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
│ AvatarManager.loadModel('/assets/models/Companion-A.vrm') │
│ │
│ LAN: fetch from companions-frontend pod (NFS-backed, ~10GbE) │
│ Remote: fetch from Cloudflare R2 CDN (cache-hit, global PoPs)
│ Remote: fetch from assets.daviestechlabs.io (Cloudflare CDN-cached)
└─────────────────────────────────────────────────────────────────────────┘
```
@@ -222,7 +223,7 @@ When the Kasm session is accessed remotely, set `BLENDER_HOST` to the Kasm works
| **Promote** | `rclone copy quobyte:kasm/blender-avatars/exports/Model.vrm gravenhollow-nfs:/avatar-models/` (manual or CI) |
| **Register** | Add model path to `AllowedAvatarModels` in Go and JS allowlists, commit to repo |
| **Deploy** | Flux rolls out updated companions-frontend config; model already available on NFS PVC — no image rebuild needed |
| **CDN sync** | CronJob `rclone sync` from gravenhollow RustFS `avatar-models` bucket → Cloudflare R2 `avatar-models` bucket |
| **CDN** | Model immediately available via `assets.daviestechlabs.io` — Cloudflare Tunnel proxies to RustFS, CDN caches at edge |
### 5. Deployment and Storage Architecture
@@ -261,74 +262,75 @@ Benefits:
- **ReadOnlyMany** — multiple replicas can share the same PVC
- Source `.blend` files and textures remain on Quobyte S3 (Kasm bucket) for the creation workflow; only promoted VRM exports land on gravenhollow
#### Remote Serving (Cloudflare R2 CDN)
#### Remote Serving (Cloudflare-cached RustFS)
Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Serving multi-MB VRM files through the tunnel works but adds latency and consumes tunnel bandwidth. Cloudflare R2 provides a better path:
Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Rather than duplicating assets to a separate storage tier (e.g., Cloudflare R2), gravenhollow's RustFS S3 endpoint is exposed directly through the Cloudflare Tunnel with a dedicated hostname. Cloudflare's CDN automatically caches responses at edge PoPs — since VRM files are immutable with year-long TTLs, virtually all requests are served from cache.
| | |
|---|---|
| **Bucket** | `avatar-models` on Cloudflare R2 |
| **Custom domain** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
| **Free egress** | R2 has zero egress fees — ideal for large binary assets |
| **Cache** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
| **Sync** | CronJob in cluster: `rclone sync gravenhollow-s3:avatar-models r2:avatar-models --checksum` |
| **Auth** | Public read (models are not sensitive); write via R2 API token in Vault |
| **Origin** | gravenhollow RustFS `avatar-models` bucket (`:30292`, same data as NFS dir) |
| **Public hostname** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
| **Tunnel routing** | Cloudflare Tunnel → `envoy-external` → `avatar-assets-svc` → gravenhollow RustFS |
| **CDN caching** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
| **Egress** | Cloudflare-proxied traffic has no bandwidth surcharge |
| **Auth** | Public read (models are not sensitive); RustFS write credentials stay internal |
| **No sync needed** | Single source of truth — NFS and RustFS serve the same data from gravenhollow |
##### R2 Sync CronJob
##### In-Cluster Proxy Service
An ExternalName or Endpoints service proxies cluster traffic to gravenhollow's RustFS endpoint so the HTTPRoute can reference it:
```yaml
apiVersion: batch/v1
kind: CronJob
# Service pointing to gravenhollow RustFS for avatar assets
apiVersion: v1
kind: Service
metadata:
name: avatar-models-r2-sync
name: avatar-assets
namespace: ai-ml
spec:
schedule: "0 */6 * * *" # Every 6 hours
jobTemplate:
spec:
template:
spec:
containers:
- name: sync
image: rclone/rclone:1.68
command:
- rclone
- sync
- gravenhollow-s3:avatar-models
- r2:avatar-models
- --checksum
- --transfers=4
- -v
volumeMounts:
- name: rclone-config
mountPath: /config/rclone
readOnly: true
volumes:
- name: rclone-config
secret:
secretName: rclone-r2-config
restartPolicy: OnFailure
type: ExternalName
externalName: gravenhollow.lab.daviestechlabs.io
ports:
- port: 30292
protocol: TCP
```
##### rclone Config (ExternalSecret from Vault)
##### HTTPRoute (Cloudflare Tunnel → RustFS)
```ini
[gravenhollow-s3]
type = s3
provider = Other
endpoint = https://gravenhollow.lab.daviestechlabs.io:30292
access_key_id = <from-vault>
secret_access_key = <from-vault>
[r2]
type = s3
provider = Cloudflare
endpoint = https://<account-id>.r2.cloudflarestorage.com
access_key_id = <from-vault>
secret_access_key = <from-vault>
region = auto
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: avatar-assets
namespace: ai-ml
annotations:
external-dns.alpha.kubernetes.io/hostname: assets.daviestechlabs.io
spec:
hostnames:
- assets.daviestechlabs.io
parentRefs:
- name: envoy-external
namespace: network
rules:
- matches:
- path:
type: PathPrefix
value: /avatar-models/
backendRefs:
- name: avatar-assets
port: 30292
filters:
- type: ResponseHeaderModifier
responseHeaderModifier:
set:
- name: Cache-Control
value: "public, max-age=31536000, immutable"
- name: Access-Control-Allow-Origin
value: "https://companions-chat.daviestechlabs.io"
```
Cloudflare Tunnel picks up `assets.daviestechlabs.io` via the existing wildcard ingress rule (`*.daviestechlabs.io → envoy-external`). The CDN caches based on the `Cache-Control` header — after the first request per PoP, all subsequent loads are served from Cloudflare's edge.
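Because edge caching here depends on the `Cache-Control` value set by the HTTPRoute filter, a small check of that header can catch a misconfigured route early. The following is a minimal sketch, not code from the repository: `parseCacheControl` and `isEdgeCacheable` are hypothetical helper names, and the one-year threshold simply mirrors the `max-age=31536000` value above. It validates the header string only; whether Cloudflare actually stores the response also depends on zone cache settings, which this commit does not show.

```javascript
// Hypothetical helpers (not part of companions-frontend): validate that a
// Cache-Control header matches the policy the HTTPRoute is meant to set.

// Split a Cache-Control value into a Map of directive -> argument (or true).
function parseCacheControl(value) {
  const directives = new Map();
  for (const part of value.split(',')) {
    const [name, arg] = part.trim().split('=');
    if (name) directives.set(name.toLowerCase(), arg === undefined ? true : arg);
  }
  return directives;
}

// True when the header is public, immutable, and max-age is at least
// minAgeSeconds (default: one year, as in the HTTPRoute filter).
function isEdgeCacheable(value, minAgeSeconds = 31536000) {
  const d = parseCacheControl(value);
  const maxAge = Number(d.get('max-age'));
  return (
    d.has('public') &&
    d.has('immutable') &&
    Number.isFinite(maxAge) &&
    maxAge >= minAgeSeconds
  );
}
```

A check like this could run against the header returned by a `HEAD` request to a promoted model URL as part of a smoke test after the route is deployed.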
##### Client-Side Routing
The frontend detects whether the user is on LAN or remote and routes model fetches accordingly:
@@ -337,11 +339,11 @@ The frontend detects whether the user is on LAN or remote and routes model fetch
// avatar.js — model URL resolution
function resolveModelURL(path) {
// LAN users: serve from the Go server (NFS-backed, same origin)
// Remote users: serve from Cloudflare R2 CDN
// Remote users: serve from Cloudflare-cached RustFS
const isLAN = location.hostname.endsWith('.lab.daviestechlabs.io');
if (isLAN) return path; // e.g. /assets/models/Companion-A.vrm
return `https://assets.daviestechlabs.io${path.replace('/assets', '')}`;
// → https://assets.daviestechlabs.io/models/Companion-A.vrm
return `https://assets.daviestechlabs.io/avatar-models/${path.split('/').pop()}`;
// → https://assets.daviestechlabs.io/avatar-models/Companion-A.vrm
}
```
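The routing rule above reads `location.hostname` directly, which makes it awkward to exercise outside a browser. A minimal sketch of the same rule with the hostname passed in as an argument follows; `resolveModelURLFor` and `ASSETS_ORIGIN` are illustrative names, not identifiers from the repository, and the `encodeURIComponent` call is an added precaution rather than something the commit includes.

```javascript
// Sketch only: same LAN/remote rule as resolveModelURL, with the hostname
// supplied by the caller so the function has no browser dependency.
// ASSETS_ORIGIN and resolveModelURLFor are hypothetical names.
const ASSETS_ORIGIN = 'https://assets.daviestechlabs.io';

function resolveModelURLFor(hostname, path) {
  // LAN users stay on the same origin (Go server, NFS-backed).
  const isLAN = hostname.endsWith('.lab.daviestechlabs.io');
  if (isLAN) return path;

  // Remote users go to the RustFS bucket path exposed by the HTTPRoute.
  // Only the final path segment is kept, matching the commit's behaviour.
  const filename = path.split('/').pop();
  return `${ASSETS_ORIGIN}/avatar-models/${encodeURIComponent(filename)}`;
}
```

Note that keeping only the final path segment flattens any subdirectory, so files under a nested directory such as `animations/` would need a different mapping if they are also served remotely.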
@@ -360,8 +362,7 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
|----------|---------|------|--------|
| Quobyte S3 (`kasm` bucket) | Working files: `.blend`, textures, WIP exports | Kasm rclone volume | Kasm sessions only |
| gravenhollow NFS (`/avatar-models/`) | Production VRM models + animations | `nfs-fast` PVC (RO) | companions-frontend pod, LAN |
| gravenhollow RustFS S3 (`avatar-models`) | Mirror of NFS dir for R2 sync source | S3 API | CronJob rclone |
| Cloudflare R2 (`avatar-models`) | CDN-served copy for remote users | R2 public bucket | Global, zero egress fees |
| gravenhollow RustFS S3 (`avatar-models`) | Same data as NFS, exposed to Cloudflare Tunnel for remote users | S3 API via HTTPRoute | Cloudflare CDN-cached, global |
## BlenderMCP Capabilities Used
@@ -382,26 +383,28 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
* **Telemetry:** BlenderMCP collects anonymous telemetry by default. Disabled via `DISABLE_TELEMETRY=true` in the MCP server config.
* **Network:** The TCP socket (port 9876) between the MCP server and Blender add-on is local to the session. If accessed remotely, ensure the connection is tunnelled or restricted.
* **S3 credentials:** rclone volume plugin credentials are managed via Kasm storage mappings and the existing `kasm-agent` ExternalSecret — no new secrets required.
* **R2 credentials:** Cloudflare R2 API token stored in Vault (`kv/data/cloudflare-r2`), accessed via ExternalSecret by the sync CronJob. Write-only scope — the CronJob uploads but cannot delete the bucket.
* **Public R2 bucket:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. R2 bucket is read-only public via custom domain; write access requires the API token.
* **RustFS exposure:** The `avatar-models` RustFS bucket is exposed through Cloudflare Tunnel for reads only. RustFS write credentials remain internal. The HTTPRoute matches only the `/avatar-models/` path prefix; because it defines no method match, blocking external writes relies on RustFS rejecting requests that lack those credentials.
* **Public assets:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. CORS restricts to `companions-chat.daviestechlabs.io` origin.
* **Model allowlist:** Even though models are served from NFS/R2, the server-side and client-side allowlists in companions-frontend gate which models users can actually select. Uploading a VRM to gravenhollow does not make it available without a code change.
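The allowlist gate described in the last bullet can be sketched on the client side as follows. This is an illustration under assumptions: `ALLOWED_AVATAR_MODELS` and `selectModel` are hypothetical names, and the single entry is the example path used elsewhere in this ADR, not the real list from companions-frontend.

```javascript
// Hypothetical client-side allowlist; the real entries live in the
// companions-frontend repository and are mirrored on the Go side.
const ALLOWED_AVATAR_MODELS = new Set([
  '/assets/models/Companion-A.vrm',
]);

// Return the path if it is allowlisted, otherwise refuse to load it.
function selectModel(path) {
  if (!ALLOWED_AVATAR_MODELS.has(path)) {
    throw new Error(`Model not in allowlist: ${path}`);
  }
  return path;
}
```

With a gate like this in front of model loading, a file that exists on gravenhollow but is absent from the list is never requested by the frontend.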
## Pros and Cons of the Options
### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow NFS + Cloudflare R2
### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow (NFS + RustFS via Cloudflare)
* Good, because AI-assisted modelling reduces manual effort for avatar creation
* Good, because assets persist in S3 across sessions and are accessible from CI
* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, and Cloudflare are already deployed
* Good, because no new infrastructure is needed: Kasm, rclone, Quobyte, gravenhollow, and Cloudflare Tunnel are all already deployed
* Good, because VS Code MCP integration means one editor for code and 3D work
* Good, because Kasm sandboxes Blender execution away from the cluster
* Good, because NFS-fast serving decouples model assets from container images (no rebuild to add models)
* Good, because Cloudflare R2 has zero egress fees and global CDN caching for remote users
* Good, because RustFS through Cloudflare Tunnel provides CDN caching with zero additional storage tiers — no R2 bucket, no sync CronJob, no extra credentials
* Good, because single source of truth — gravenhollow serves both LAN (NFS) and remote (RustFS → Cloudflare CDN) from the same data
* Good, because immutable versioned filenames enable aggressive caching and trivial rollback
* Good, because models are available to remote users immediately after promotion (no sync delay)
* Bad, because BlenderMCP is a third-party tool with arbitrary code execution
* Bad, because socket communication adds latency for remote Kasm sessions
* Bad, because VRM rigging quality may require manual adjustment after AI scaffolding
* Bad, because two-hop promotion path adds operational complexity
* Bad, because cache misses hit gravenhollow via the tunnel (negligible with immutable files + long TTLs)
### Option 2 — Local Blender + BlenderMCP on developer laptop
@@ -440,5 +443,5 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader used in companions-frontend)
* [Poly Haven](https://polyhaven.com/) (free 3D assets, HDRIs, textures)
* [Hyper3D Rodin](https://hyper3d.ai/) (AI 3D model generation)
* [Cloudflare R2 Documentation](https://developers.cloudflare.com/r2/)
* [Cloudflare R2 Custom Domains](https://developers.cloudflare.com/r2/buckets/public-buckets/#custom-domains)
* [Cloudflare Tunnel Docs](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/)
* [Cloudflare CDN Cache Rules](https://developers.cloudflare.com/cache/)