diff --git a/decisions/0062-blender-mcp-3d-avatar-workflow.md b/decisions/0062-blender-mcp-3d-avatar-workflow.md
index d8c4689..266c0ef 100644
--- a/decisions/0062-blender-mcp-3d-avatar-workflow.md
+++ b/decisions/0062-blender-mcp-3d-avatar-workflow.md
@@ -3,7 +3,7 @@
 * Status: proposed
 * Date: 2026-02-21
 * Deciders: Billy
-* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare R2 CDN
+* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare-cached RustFS
 
 ## Context and Problem Statement
 
@@ -23,8 +23,9 @@ How do we streamline custom 3D avatar creation for companions-frontend with AI a
 * Quobyte S3-compatible endpoint with the `kasm` bucket is the existing Kasm storage backend
 * VRM models must ultimately land in the companions-frontend `/assets/models/` path at build time or be served from an external URL
 * Final production models and animations should live on gravenhollow (all-SSD TrueNAS, dual 10GbE) for fast local serving via NFS
-* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-backed path for multi-MB VRM downloads
+* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-cached path for multi-MB VRM downloads
 * Models are write-once/read-many — ideal for aggressive caching
+* gravenhollow already runs RustFS (S3-compatible) — exposing it via Cloudflare Tunnel gives CDN caching without a separate storage tier
 
 ## Considered Options
 
@@ -45,7 +46,7 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
 * Existing infrastructure — Kasm agent, DinD, rclone volume plugin, Quobyte S3, gravenhollow NFS, and Cloudflare are all already deployed
 * No image rebuild for new models — VRM files live on gravenhollow NFS, mounted read-only into the pod; add a model and update the allowlist
 * LAN performance — all-SSD NFS with dual 10GbE delivers VRM files in <100ms
-* Remote performance — Cloudflare R2 CDN with zero egress fees and 300+ global PoPs for remote users via Cloudflare Tunnel
+* Remote performance — RustFS exposed through Cloudflare Tunnel with CDN caching at 300+ global PoPs; no separate storage tier needed
 * Poly Haven / Hyper3D integration — BlenderMCP supports downloading Poly Haven assets and generating models via Hyper3D Rodin, expanding the asset library
 * VRM ecosystem — Blender VRM add-on exports directly to VRM 0.x/1.0 format consumed by `@pixiv/three-vrm` in companions-frontend
 * Reproducible — Kasm workspace images are versioned; Blender + add-ons are pre-baked
@@ -57,8 +58,7 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
 * VRM export quality depends on correct rigging/weight painting — AI can scaffold but manual touch-up may still be needed
 * Kasm Blender image must be configured with both the BlenderMCP add-on and the VRM add-on pre-installed
 * Telemetry is on by default in BlenderMCP — must disable via `DISABLE_TELEMETRY=true` for privacy
-* Cloudflare R2 sync adds a CronJob and requires a Cloudflare R2 API token in Vault
-* Two-hop promotion path (Quobyte S3 → gravenhollow NFS → Cloudflare R2) adds operational steps
+* Cache misses from remote users hit gravenhollow via the tunnel — negligible with immutable files and long TTLs
 
 ## Architecture
 
@@ -128,23 +128,24 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
 │    └── animations/ (shared animation clips)                             │
 │                                                                         │
 │  S3 (RustFS): avatar-models bucket                                      │
-│    (mirror of NFS dir for Cloudflare R2 sync)                          │
+│    (same data as NFS dir, served via S3 API for Cloudflare Tunnel)      │
 └──────────┬─────────────────────────────────┬────────────────────────────┘
            │                                 │
-  NFS mount (nfs-fast)               rclone sync (cron)
+  NFS mount (nfs-fast)               S3 API (RustFS :30292)
+  for pod volume                     via Cloudflare Tunnel
            │                                 │
            ▼                                 ▼
 ┌──────────────────────────┐   ┌──────────────────────────────────────────┐
-│  companions-frontend     │   │  Cloudflare R2                           │
-│  (Kubernetes pod)        │   │  Bucket: avatar-models                   │
-│                          │   │                                          │
-│  /models/ volume mount   │   │  Custom domain:                          │
-│  (nfs-fast PVC, RO)      │   │  assets.daviestechlabs.io/models/        │
-│                          │   │                                          │
-│  Go FileServer:          │   │  Cache-Control: public, max-age=31536000 │
-│  /assets/models/ →       │   │  (immutable, versioned filenames)        │
-│  serves from PVC         │   │                                          │
-│                          │   │  Free egress (no bandwidth charges)      │
+│  companions-frontend     │   │  Cloudflare Tunnel + CDN                 │
+│  (Kubernetes pod)        │   │                                          │
+│                          │   │  assets.daviestechlabs.io                │
+│  /models/ volume mount   │   │  → envoy-external                        │
+│  (nfs-fast PVC, RO)      │   │  → avatar-assets-svc (in-cluster)        │
+│                          │   │  → gravenhollow RustFS :30292            │
+│  Go FileServer:          │   │                                          │
+│  /assets/models/ →       │   │  Cloudflare CDN caches at 300+ PoPs      │
+│  serves from PVC         │   │  Cache-Control: public, max-age=31536000 │
+│                          │   │  (immutable, versioned filenames)        │
 └──────────┬───────────────┘   └──────────────────────┬───────────────────┘
            │                                          │
       LAN clients                              Remote clients
@@ -158,7 +159,7 @@ Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code M
 │  AvatarManager.loadModel('/assets/models/Companion-A.vrm')              │
 │                                                                         │
 │  LAN: fetch from companions-frontend pod (NFS-backed, ~10GbE)           │
-│  Remote: fetch from Cloudflare R2 CDN (cache-hit, global PoPs)         │
+│  Remote: fetch from assets.daviestechlabs.io (Cloudflare CDN-cached)    │
 └─────────────────────────────────────────────────────────────────────────┘
 ```
 
@@ -222,7 +223,7 @@ When the Kasm session is accessed remotely, set `BLENDER_HOST` to the Kasm works
 | **Promote** | `rclone copy quobyte:kasm/blender-avatars/exports/Model.vrm gravenhollow-nfs:/avatar-models/` (manual or CI) |
 | **Register** | Add model path to `AllowedAvatarModels` in Go and JS allowlists, commit to repo |
 | **Deploy** | Flux rolls out updated companions-frontend config; model already available on NFS PVC — no image rebuild needed |
-| **CDN sync** | CronJob `rclone sync` from gravenhollow RustFS `avatar-models` bucket → Cloudflare R2 `avatar-models` bucket |
+| **CDN** | Model immediately available via `assets.daviestechlabs.io` — Cloudflare Tunnel proxies to RustFS, CDN caches at edge |
 
 ### 5. Deployment and Storage Architecture
 
@@ -261,74 +262,75 @@ Benefits:
 - **ReadOnlyMany** — multiple replicas can share the same PVC
 - Source `.blend` files and textures remain on Quobyte S3 (Kasm bucket) for the creation workflow; only promoted VRM exports land on gravenhollow
 
-#### Remote Serving (Cloudflare R2 CDN)
+#### Remote Serving (Cloudflare-cached RustFS)
 
-Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Serving multi-MB VRM files through the tunnel works but adds latency and consumes tunnel bandwidth. Cloudflare R2 provides a better path:
+Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Rather than duplicating assets to a separate storage tier (e.g., Cloudflare R2), gravenhollow's RustFS S3 endpoint is exposed directly through the Cloudflare Tunnel with a dedicated hostname. Cloudflare's CDN automatically caches responses at edge PoPs — since VRM files are immutable with year-long TTLs, virtually all requests are served from cache.
 
 | | |
 |---|---|
-| **Bucket** | `avatar-models` on Cloudflare R2 |
-| **Custom domain** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
-| **Free egress** | R2 has zero egress fees — ideal for large binary assets |
-| **Cache** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
-| **Sync** | CronJob in cluster: `rclone sync gravenhollow-s3:avatar-models r2:avatar-models --checksum` |
-| **Auth** | Public read (models are not sensitive); write via R2 API token in Vault |
+| **Origin** | gravenhollow RustFS `avatar-models` bucket (`:30292`, same data as NFS dir) |
+| **Public hostname** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
+| **Tunnel routing** | Cloudflare Tunnel → `envoy-external` → `avatar-assets-svc` → gravenhollow RustFS |
+| **CDN caching** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
+| **Egress** | Cloudflare-proxied traffic has no bandwidth surcharge |
+| **Auth** | Public read (models are not sensitive); RustFS write credentials stay internal |
+| **No sync needed** | Single source of truth — NFS and RustFS serve the same data from gravenhollow |
 
-##### R2 Sync CronJob
+##### In-Cluster Proxy Service
+
+An ExternalName or Endpoints service proxies cluster traffic to gravenhollow's RustFS endpoint so the HTTPRoute can reference it:
 
 ```yaml
-apiVersion: batch/v1
-kind: CronJob
+# Service pointing to gravenhollow RustFS for avatar assets
+apiVersion: v1
+kind: Service
 metadata:
-  name: avatar-models-r2-sync
+  name: avatar-assets
   namespace: ai-ml
 spec:
-  schedule: "0 */6 * * *"  # Every 6 hours
-  jobTemplate:
-    spec:
-      template:
-        spec:
-          containers:
-            - name: sync
-              image: rclone/rclone:1.68
-              command:
-                - rclone
-                - sync
-                - gravenhollow-s3:avatar-models
-                - r2:avatar-models
-                - --checksum
-                - --transfers=4
-                - -v
-              volumeMounts:
-                - name: rclone-config
-                  mountPath: /config/rclone
-                  readOnly: true
-          volumes:
-            - name: rclone-config
-              secret:
-                secretName: rclone-r2-config
-          restartPolicy: OnFailure
+  type: ExternalName
+  externalName: gravenhollow.lab.daviestechlabs.io
+  ports:
+    - port: 30292
+      protocol: TCP
 ```
 
-##### rclone Config (ExternalSecret from Vault)
+##### HTTPRoute (Cloudflare Tunnel → RustFS)
 
-```ini
-[gravenhollow-s3]
-type = s3
-provider = Other
-endpoint = https://gravenhollow.lab.daviestechlabs.io:30292
-access_key_id =
-secret_access_key =
-
-[r2]
-type = s3
-provider = Cloudflare
-endpoint = https://.r2.cloudflarestorage.com
-access_key_id =
-secret_access_key =
-region = auto
+```yaml
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  name: avatar-assets
+  namespace: ai-ml
+  annotations:
+    external-dns.alpha.kubernetes.io/hostname: assets.daviestechlabs.io
+spec:
+  hostnames:
+    - assets.daviestechlabs.io
+  parentRefs:
+    - name: envoy-external
+      namespace: network
+  rules:
+    - matches:
+        - path:
+            type: PathPrefix
+            value: /avatar-models/
+      backendRefs:
+        - name: avatar-assets
+          port: 30292
+      filters:
+        - type: ResponseHeaderModifier
+          responseHeaderModifier:
+            set:
+              - name: Cache-Control
+                value: "public, max-age=31536000, immutable"
+              - name: Access-Control-Allow-Origin
+                value: "https://companions-chat.daviestechlabs.io"
 ```
 
+Cloudflare Tunnel picks up `assets.daviestechlabs.io` via the existing wildcard ingress rule (`*.daviestechlabs.io → envoy-external`). The CDN caches based on the `Cache-Control` header — after the first request per PoP, all subsequent loads are served from Cloudflare's edge.
+
 ##### Client-Side Routing
 
 The frontend detects whether the user is on LAN or remote and routes model fetches accordingly:
@@ -337,11 +339,11 @@ The frontend detects whether the user is on LAN or remote and routes model fetch
 // avatar.js — model URL resolution
 function resolveModelURL(path) {
   // LAN users: serve from the Go server (NFS-backed, same origin)
-  // Remote users: serve from Cloudflare R2 CDN
+  // Remote users: serve from Cloudflare-cached RustFS
   const isLAN = location.hostname.endsWith('.lab.daviestechlabs.io');
   if (isLAN) return path; // e.g. /assets/models/Companion-A.vrm
-  return `https://assets.daviestechlabs.io${path.replace('/assets', '')}`;
-  // → https://assets.daviestechlabs.io/models/Companion-A.vrm
+  return `https://assets.daviestechlabs.io/avatar-models/${path.split('/').pop()}`;
+  // → https://assets.daviestechlabs.io/avatar-models/Companion-A.vrm
 }
 ```
 
@@ -360,8 +362,7 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
 |----------|---------|------|--------|
 | Quobyte S3 (`kasm` bucket) | Working files: `.blend`, textures, WIP exports | Kasm rclone volume | Kasm sessions only |
 | gravenhollow NFS (`/avatar-models/`) | Production VRM models + animations | `nfs-fast` PVC (RO) | companions-frontend pod, LAN |
-| gravenhollow RustFS S3 (`avatar-models`) | Mirror of NFS dir for R2 sync source | S3 API | CronJob rclone |
-| Cloudflare R2 (`avatar-models`) | CDN-served copy for remote users | R2 public bucket | Global, zero egress fees |
+| gravenhollow RustFS S3 (`avatar-models`) | Same data as NFS, exposed to Cloudflare Tunnel for remote users | S3 API via HTTPRoute | Cloudflare CDN-cached, global |
 
 ## BlenderMCP Capabilities Used
 
@@ -382,26 +383,28 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
 * **Telemetry:** BlenderMCP collects anonymous telemetry by default. Disabled via `DISABLE_TELEMETRY=true` in the MCP server config.
 * **Network:** The TCP socket (port 9876) between the MCP server and Blender add-on is local to the session. If accessed remotely, ensure the connection is tunnelled or restricted.
 * **S3 credentials:** rclone volume plugin credentials are managed via Kasm storage mappings and the existing `kasm-agent` ExternalSecret — no new secrets required.
-* **R2 credentials:** Cloudflare R2 API token stored in Vault (`kv/data/cloudflare-r2`), accessed via ExternalSecret by the sync CronJob. Write-only scope — the CronJob uploads but cannot delete the bucket.
-* **Public R2 bucket:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. R2 bucket is read-only public via custom domain; write access requires the API token.
+* **RustFS exposure:** The `avatar-models` RustFS bucket is exposed read-only through Cloudflare Tunnel. RustFS write credentials remain internal. The HTTPRoute only routes GET requests to the bucket path — no write operations are reachable externally.
+* **Public assets:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. CORS restricts to `companions-chat.daviestechlabs.io` origin.
 * **Model allowlist:** Even though models are served from NFS/R2, the server-side and client-side allowlists in companions-frontend gate which models users can actually select. Uploading a VRM to gravenhollow does not make it available without a code change.
 
 ## Pros and Cons of the Options
 
-### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow NFS + Cloudflare R2
+### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow (NFS + RustFS via Cloudflare)
 
 * Good, because AI-assisted modelling reduces manual effort for avatar creation
 * Good, because assets persist in S3 across sessions and are accessible from CI
-* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, and Cloudflare are already deployed
+* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, Cloudflare Tunnel are all already deployed
 * Good, because VS Code MCP integration means one editor for code and 3D work
 * Good, because Kasm sandboxes Blender execution away from the cluster
 * Good, because NFS-fast serving decouples model assets from container images (no rebuild to add models)
-* Good, because Cloudflare R2 has zero egress fees and global CDN caching for remote users
+* Good, because RustFS through Cloudflare Tunnel provides CDN caching with zero additional storage tiers — no R2 bucket, no sync CronJob, no extra credentials
+* Good, because single source of truth — gravenhollow serves both LAN (NFS) and remote (RustFS → Cloudflare CDN) from the same data
 * Good, because immutable versioned filenames enable aggressive caching and trivial rollback
+* Good, because models are available to remote users immediately after promotion (no sync delay)
 * Bad, because BlenderMCP is a third-party tool with arbitrary code execution
 * Bad, because socket communication adds latency for remote Kasm sessions
 * Bad, because VRM rigging quality may require manual adjustment after AI scaffolding
-* Bad, because two-hop promotion path adds operational complexity
+* Bad, because cache misses hit gravenhollow via the tunnel (negligible with immutable files + long TTLs)
 
 ### Option 2 — Local Blender + BlenderMCP on developer laptop
 
@@ -440,5 +443,5 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
 * [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader used in companions-frontend)
 * [Poly Haven](https://polyhaven.com/) (free 3D assets, HDRIs, textures)
 * [Hyper3D Rodin](https://hyper3d.ai/) (AI 3D model generation)
-* [Cloudflare R2 Documentation](https://developers.cloudflare.com/r2/)
-* [Cloudflare R2 Custom Domains](https://developers.cloudflare.com/r2/buckets/public-buckets/#custom-domains)
+* [Cloudflare Tunnel Docs](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/)
+* [Cloudflare CDN Cache Rules](https://developers.cloudflare.com/cache/)
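
Review sketch for the `avatar.js` hunk: the patched `resolveModelURL` keeps only the last path segment (`path.split('/').pop()`), so a file under the `animations/` directory shown in the architecture diagram would resolve to the bucket root. The JavaScript below is a minimal sketch of a variant that preserves the sub-path. It assumes the RustFS `avatar-models` bucket uses the same relative layout as `/assets/models/`; the ADR states that NFS and RustFS hold the same data, but the exact key layout is not confirmed by the diff. The `hostname` parameter and the three constant names are hypothetical, introduced here so the function can run outside a browser.

```javascript
// Hypothetical constant names; the hostnames and paths come from the diff.
const ASSET_ORIGIN = 'https://assets.daviestechlabs.io';
const LAN_SUFFIX = '.lab.daviestechlabs.io';
const LOCAL_PREFIX = '/assets/models/';

// hostname is passed in (an assumption of this sketch) instead of reading
// location.hostname, so the resolver can be exercised in Node as well.
function resolveModelURL(path, hostname) {
  // LAN users: same-origin path served by the Go server (NFS-backed).
  if (hostname.endsWith(LAN_SUFFIX)) return path;

  // Remote users: keep everything after /assets/models/ so sub-directories
  // such as animations/ survive; otherwise fall back to the bare filename,
  // which matches the behaviour of the patched version.
  const key = path.startsWith(LOCAL_PREFIX)
    ? path.slice(LOCAL_PREFIX.length)
    : path.split('/').pop();
  return `${ASSET_ORIGIN}/avatar-models/${key}`;
}
```

In the browser this would be called as `resolveModelURL('/assets/models/Companion-A.vrm', location.hostname)`. For top-level models the result is identical to the patched version; the two only differ for nested paths.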