# ADR-0062: BlenderMCP 3D Avatar Workflow with Kasm, gravenhollow NFS, and Cloudflare R2
* Status: proposed
* Date: 2026-02-21
* Deciders: Billy
* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare R2 CDN
## Context and Problem Statement
The companions-frontend serves VRM avatar models for its Three.js-based 3D character rendering (see [ADR-0046](0046-companions-frontend-architecture.md)). Today the avatar library is limited to three models (`Seed-san.vrm`, `Aka.vrm`, `Midori.vrm`) — only one of which actually ships in the repo — and every model must be sourced or hand-sculpted externally.
Creating custom VRM avatars is a manual, time-intensive process: open Blender, sculpt/rig a character, export to VRM, iterate. There is no integration between the AI coding workflow (VS Code / Copilot) and Blender, so context switching between the editor and the 3D tool is constant.
How do we streamline custom 3D avatar creation for companions-frontend with AI assistance, while keeping assets durable and accessible across workstations?
## Decision Drivers
* The existing avatar pipeline is manual and disconnected from the development workflow
* BlenderMCP (v1.5.5, 17k+ GitHub stars) bridges AI assistants to Blender via the Model Context Protocol — enabling prompt-driven 3D modelling, material control, scene manipulation, and code execution inside Blender
* Kasm Workspaces already run in the cluster (`productivity` namespace) and support Docker-in-Docker with volume plugins for persistent storage
* VS Code supports MCP servers natively (GitHub Copilot agent mode), meaning the same editor used for code can drive Blender scene creation
* Custom volume mounts in Kasm map `/s3` to S3-compatible storage via the rclone Docker volume plugin — providing durable, off-node persistence
* Quobyte S3-compatible endpoint with the `kasm` bucket is the existing Kasm storage backend
* VRM models must ultimately land in the companions-frontend `/assets/models/` path at build time or be served from an external URL
* Final production models and animations should live on gravenhollow (all-SSD TrueNAS, dual 10GbE) for fast local serving via NFS
* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-backed path for multi-MB VRM downloads
* Models are write-once/read-many — ideal for aggressive caching
## Considered Options
1. **BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3 (`kasm` bucket)**
2. **Local Blender + BlenderMCP on a developer laptop**
3. **Hyper3D / Rodin cloud generation only (no Blender)**
4. **Manual Blender workflow (status quo)**
## Decision Outcome
Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3**, because it integrates AI-assisted modelling directly into the existing Kasm + VS Code workflow, stores assets durably in S3, and requires no additional infrastructure beyond what is already deployed.
### Positive Consequences
* AI-assisted 3D modelling — prompt-driven creation, material application, and scene manipulation inside Blender via MCP
* Zero context switching — VS Code agent mode drives Blender commands through the same editor used for code
* Persistent storage — VRM exports written to `/s3` survive session teardown and are available from any Kasm session or CI pipeline
* Existing infrastructure — Kasm agent, DinD, rclone volume plugin, Quobyte S3, gravenhollow NFS, and Cloudflare are all already deployed
* No image rebuild for new models — VRM files live on gravenhollow NFS, mounted read-only into the pod; add a model and update the allowlist
* LAN performance — all-SSD NFS with dual 10GbE delivers VRM files in <100ms
* Remote performance — Cloudflare R2 CDN with zero egress fees and 300+ global PoPs for remote users via Cloudflare Tunnel
* Poly Haven / Hyper3D integration — BlenderMCP supports downloading Poly Haven assets and generating models via Hyper3D Rodin, expanding the asset library
* VRM ecosystem — Blender VRM add-on exports directly to VRM 0.x/1.0 format consumed by `@pixiv/three-vrm` in companions-frontend
* Reproducible — Kasm workspace images are versioned; Blender + add-ons are pre-baked
### Negative Consequences
* BlenderMCP `execute_blender_code` tool runs arbitrary Python in Blender — must trust AI-generated code or review before execution
* Socket-based communication (TCP 9876) between the MCP server and Blender add-on adds a failure mode
* VRM export quality depends on correct rigging/weight painting — AI can scaffold but manual touch-up may still be needed
* Kasm Blender image must be configured with both the BlenderMCP add-on and the VRM add-on pre-installed
* Telemetry is on by default in BlenderMCP — must disable via `DISABLE_TELEMETRY=true` for privacy
* Cloudflare R2 sync adds a CronJob and requires a Cloudflare R2 API token in Vault
* Two-hop promotion path (Quobyte S3 → gravenhollow NFS → Cloudflare R2) adds operational steps
## Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Developer Workstation │
│ │
│ ┌──────────────────────────────────┐ │
│ │ VS Code (local) │ │
│ │ │ │
│ │ GitHub Copilot (agent mode) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ BlenderMCP Server (MCP) │ │
│ │ (uvx blender-mcp) │ │
│ │ │ │ │
│ └─────────┼────────────────────────┘ │
│ │ TCP :9876 (JSON over socket) │
└────────────┼────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Kasm Blender Workstation (browser session) │
│ kasm.daviestechlabs.io │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Blender 4.x │ │
│ │ │ │
│ │ Add-ons: │ │
│ │ • BlenderMCP (addon.py) — socket server :9876 │ │
│ │ • VRM Add-on for Blender — import/export VRM │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────┐ │ │
│ │ │ /s3/blender-avatars/ │ │ │
│ │ │ ├── projects/ (.blend source files) │ │ │
│ │ │ ├── exports/ (.vrm exported models) │ │ │
│ │ │ └── textures/ (shared texture lib) │ │ │
│ │ └────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ rclone volume │
│ plugin (S3) │
└──────────────────────────┼──────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Quobyte S3 Endpoint │
│ Bucket: kasm │
│ │
│ kasm/blender-avatars/projects/Companion-A.blend │
│ kasm/blender-avatars/exports/Companion-A.vrm │
│ kasm/blender-avatars/textures/skin-tone-01.png │
└──────────────────────────┬──────────────────────────────────────────────┘
rclone sync (promotion)
┌─────────────────────────────────────────────────────────────────────────┐
│ gravenhollow.lab.daviestechlabs.io │
│ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │
│ │
│ NFS: /mnt/gravenhollow/kubernetes/avatar-models/ │
│ ├── Seed-san.vrm (default model) │
│ ├── Aka.vrm (Legend tier) │
│ ├── Midori.vrm (Legend tier) │
│ ├── Companion-A.vrm (custom, promoted from Kasm S3) │
│ └── animations/ (shared animation clips) │
│ │
│ S3 (RustFS): avatar-models bucket │
│ (mirror of NFS dir for Cloudflare R2 sync) │
└──────────┬─────────────────────────────────┬────────────────────────────┘
│ │
NFS mount (nfs-fast) rclone sync (cron)
│ │
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────────────────────┐
│ companions-frontend │ │ Cloudflare R2 │
│ (Kubernetes pod) │ │ Bucket: avatar-models │
│ │ │ │
│ /models/ volume mount │ │ Custom domain: │
│ (nfs-fast PVC, RO) │ │ assets.daviestechlabs.io/models/ │
│ │ │ │
│ Go FileServer: │ │ Cache-Control: public, max-age=31536000 │
│ /assets/models/ → │ │ (immutable, versioned filenames) │
│ serves from PVC │ │ │
│ │ │ Free egress (no bandwidth charges) │
└──────────┬───────────────┘ └──────────────────────┬───────────────────┘
│ │
LAN clients Remote clients
companions-chat.lab... companions-chat via
(envoy-internal, direct) Cloudflare Tunnel
│ │
└──────────────────┬───────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Browser (Three.js) │
│ AvatarManager.loadModel('/assets/models/Companion-A.vrm') │
│ │
│ LAN: fetch from companions-frontend pod (NFS-backed, ~10GbE) │
│ Remote: fetch from Cloudflare R2 CDN (cache-hit, global PoPs) │
└─────────────────────────────────────────────────────────────────────────┘
```
## Workflow
### 1. Kasm Workspace Setup
The Kasm Blender workspace image is configured with:
| Component | Version | Purpose |
|-----------|---------|---------|
| Blender | 4.x | 3D modelling and sculpting |
| BlenderMCP add-on (`addon.py`) | 1.5.5 | Socket server for MCP commands |
| VRM Add-on for Blender | latest | Import/export VRM format |
| Python | 3.10+ | Blender scripting runtime |
The Kasm storage mapping mounts `/s3` via the rclone Docker volume plugin to the Quobyte S3 endpoint (`kasm` bucket). The sub-path `blender-avatars/` is used for all 3D asset work.
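What that mapping resolves to at the Docker level can be sketched with the rclone volume plugin directly. The remote name `quobyte` and the volume name are illustrative, and in practice the mapping is configured through the Kasm admin UI rather than by hand:
```shell
# One-time: install the rclone Docker volume plugin on the Kasm agent host
docker plugin install rclone/docker-volume-rclone:amd64 \
  args="-v" --alias rclone --grant-all-permissions

# Volume equivalent of the /s3 storage mapping (Quobyte S3 "kasm" bucket)
docker volume create kasm-s3 -d rclone \
  -o remote=quobyte:kasm \
  -o vfs-cache-mode=writes
```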
### 2. VS Code MCP Configuration
Add BlenderMCP as an MCP server in VS Code (`.vscode/mcp.json` or user settings):
```json
{
"servers": {
"blender": {
"command": "uvx",
"args": ["blender-mcp"],
"env": {
"BLENDER_HOST": "localhost",
"BLENDER_PORT": "9876",
"DISABLE_TELEMETRY": "true"
}
}
}
}
```
When the Kasm session is accessed remotely, set `BLENDER_HOST` to the Kasm workstation's reachable address.
### 3. Avatar Creation Workflow
1. **Launch** the Kasm Blender workspace via `kasm.daviestechlabs.io`
2. **Enable** the BlenderMCP add-on in Blender → 3D View sidebar → "BlenderMCP" tab → "Connect to Claude"
3. **Open VS Code** with Copilot agent mode and the BlenderMCP MCP server running
4. **Prompt** the AI to create or modify avatars:
- _"Create a humanoid character with anime-style proportions, blue hair, and a fantasy outfit"_
- _"Apply a metallic gold material to the armor pieces"_
- _"Set up the lighting for a character showcase render"_
- _"Rig this character for VRM export with standard humanoid bones"_
5. **Export** the finished model to VRM via the VRM add-on (or via BlenderMCP `execute_blender_code` calling the VRM export operator)
6. **Save** the `.vrm` to `/s3/blender-avatars/exports/` and the `.blend` source to `/s3/blender-avatars/projects/`
7. **Import** the VRM into companions-frontend — copy to `assets/models/`, update the allowlists in `internal/database/database.go` and `static/js/avatar.js`
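Step 5 can also be driven from VS Code by sending a short script through `execute_blender_code`. A sketch that runs inside Blender, assuming the VRM add-on exposes its exporter as `bpy.ops.export_scene.vrm` (the operator name may differ between add-on versions) and using hypothetical file paths:
```python
# Runs inside Blender via BlenderMCP's execute_blender_code tool -- not standalone.
import bpy

# Checkpoint the source first; execute_blender_code runs arbitrary Python,
# so always save before AI-driven operations (see Security Considerations).
bpy.ops.wm.save_as_mainfile(
    filepath="/s3/blender-avatars/projects/Companion-A.blend")

# Export the scene to VRM via the VRM add-on's operator (name assumed)
bpy.ops.export_scene.vrm(
    filepath="/s3/blender-avatars/exports/Companion-A.vrm")
```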
### 4. Asset Pipeline (Kasm S3 → gravenhollow → production)
| Stage | Action |
|-------|--------|
| **Create** | AI-assisted modelling + VRM export in Kasm Blender → `/s3/blender-avatars/exports/*.vrm` |
| **Store** | Writes to `/s3` land directly in the Quobyte S3 `kasm` bucket via the rclone volume mount |
| **Promote** | `rclone copy quobyte:kasm/blender-avatars/exports/Model.vrm gravenhollow-nfs:/avatar-models/` (manual or CI) |
| **Register** | Add model path to `AllowedAvatarModels` in Go and JS allowlists, commit to repo |
| **Deploy** | Flux rolls out updated companions-frontend config; model already available on NFS PVC — no image rebuild needed |
| **CDN sync** | CronJob `rclone sync` from gravenhollow RustFS `avatar-models` bucket → Cloudflare R2 `avatar-models` bucket |
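The promote and mirror hops in the table can be run by hand as below. The rclone remote names are placeholders for whatever remotes are actually configured, and `gravenhollow-nfs` stands in for a local path on a host that has the NFS share mounted (rclone has no native NFS backend):
```shell
# Promote a finished export from the Kasm working bucket to gravenhollow
rclone copy quobyte:kasm/blender-avatars/exports/Companion-A.vrm \
  gravenhollow-nfs:/avatar-models/

# Refresh the RustFS mirror that feeds the R2 CronJob (assumed step --
# the ADR does not specify how the NFS dir is mirrored into RustFS)
rclone sync gravenhollow-nfs:/avatar-models gravenhollow-s3:avatar-models --checksum
```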
### 5. Deployment and Storage Architecture
#### Local Serving (LAN users)
Companions-frontend currently serves VRM models via `http.FileServer(http.Dir("assets"))` from the container filesystem. This bakes models into the image and requires a rebuild to add new avatars.
The new approach mounts avatar models from gravenhollow via an `nfs-fast` PVC:
```yaml
# PersistentVolumeClaim for avatar models
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: avatar-models
namespace: ai-ml
spec:
storageClassName: nfs-fast
accessModes: [ReadOnlyMany]
resources:
requests:
storage: 10Gi
```
The pod mounts this PVC at `/models` and the Go server serves it at `/assets/models/`:
```go
// Replace embedded assets with NFS-backed volume
mux.Handle("/assets/models/", http.StripPrefix("/assets/models/",
http.FileServer(http.Dir("/models"))))
```
Benefits:
- **No image rebuild** to add/update models — write to gravenhollow NFS, pod sees it immediately (with `actimeo=600` cache, within 10 minutes)
- **All-SSD + dual 10GbE** — VRM files (typically 5–30 MB) load in <100ms on LAN
- **ReadOnlyMany** — multiple replicas can share the same PVC
- Source `.blend` files and textures remain on Quobyte S3 (Kasm bucket) for the creation workflow; only promoted VRM exports land on gravenhollow
#### Remote Serving (Cloudflare R2 CDN)
Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Serving multi-MB VRM files through the tunnel works but adds latency and consumes tunnel bandwidth. Cloudflare R2 provides a better path:
| | |
|---|---|
| **Bucket** | `avatar-models` on Cloudflare R2 |
| **Custom domain** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
| **Free egress** | R2 has zero egress fees — ideal for large binary assets |
| **Cache** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
| **Sync** | CronJob in cluster: `rclone sync gravenhollow-s3:avatar-models r2:avatar-models --checksum` |
| **Auth** | Public read (models are not sensitive); write via R2 API token in Vault |
##### R2 Sync CronJob
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: avatar-models-r2-sync
namespace: ai-ml
spec:
schedule: "0 */6 * * *" # Every 6 hours
jobTemplate:
spec:
template:
spec:
containers:
- name: sync
image: rclone/rclone:1.68
command:
- rclone
- sync
- gravenhollow-s3:avatar-models
- r2:avatar-models
- --checksum
- --transfers=4
- -v
volumeMounts:
- name: rclone-config
mountPath: /config/rclone
readOnly: true
volumes:
- name: rclone-config
secret:
secretName: rclone-r2-config
restartPolicy: OnFailure
```
##### rclone Config (ExternalSecret from Vault)
```ini
[gravenhollow-s3]
type = s3
provider = Other
endpoint = https://gravenhollow.lab.daviestechlabs.io:30292
access_key_id = <from-vault>
secret_access_key = <from-vault>
[r2]
type = s3
provider = Cloudflare
endpoint = https://<account-id>.r2.cloudflarestorage.com
access_key_id = <from-vault>
secret_access_key = <from-vault>
region = auto
```
##### Client-Side Routing
The frontend detects whether the user is on LAN or remote and routes model fetches accordingly:
```javascript
// avatar.js — model URL resolution
function resolveModelURL(path) {
// LAN users: serve from the Go server (NFS-backed, same origin)
// Remote users: serve from Cloudflare R2 CDN
const isLAN = location.hostname.endsWith('.lab.daviestechlabs.io');
if (isLAN) return path; // e.g. /assets/models/Companion-A.vrm
return `https://assets.daviestechlabs.io${path.replace('/assets', '')}`;
// → https://assets.daviestechlabs.io/models/Companion-A.vrm
}
```
Alternatively, the Go server can set the model base URL via a template variable based on the `Host` header, keeping the logic server-side.
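That server-side variant can be sketched as a small helper keyed off the `Host` header. The function name and suffix check here are illustrative, not the actual companions-frontend code:

```go
package main

import (
	"fmt"
	"strings"
)

// modelBaseURL picks the model base URL from the request's Host header.
// LAN hosts (*.lab.daviestechlabs.io) are served same-origin from the
// NFS-backed path; everything else is pointed at the R2 custom domain.
func modelBaseURL(host string) string {
	// Strip an optional :port before matching
	if i := strings.LastIndex(host, ":"); i != -1 {
		host = host[:i]
	}
	if strings.HasSuffix(host, ".lab.daviestechlabs.io") {
		return "/assets/models"
	}
	return "https://assets.daviestechlabs.io/models"
}

func main() {
	fmt.Println(modelBaseURL("companions-chat.lab.daviestechlabs.io"))
	fmt.Println(modelBaseURL("companions-chat.daviestechlabs.io"))
}
```

The resolved base URL would then be injected into the page template, so the client never needs its own hostname check.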
#### Versioning Strategy
VRM files are immutable once promoted — updated models get a new filename (e.g., `Companion-A-v2.vrm`) rather than overwriting. This ensures:
- Cloudflare CDN cache never serves stale content
- Rollback is trivial — point the allowlist back to the previous version
- Browser `Cache-Control: immutable` works correctly
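The naming convention can be illustrated with a hypothetical helper that derives the next immutable filename; the function and regex are illustrative, not part of companions-frontend:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Matches an existing "-vN.vrm" version suffix
var versionSuffix = regexp.MustCompile(`-v(\d+)\.vrm$`)

// nextVersionedName derives the filename for a re-promoted model:
// an unversioned name gets "-v2", an existing "-vN" is bumped to "-vN+1".
func nextVersionedName(name string) string {
	if m := versionSuffix.FindStringSubmatch(name); m != nil {
		var n int
		fmt.Sscanf(m[1], "%d", &n)
		return versionSuffix.ReplaceAllString(name, fmt.Sprintf("-v%d.vrm", n+1))
	}
	return strings.TrimSuffix(name, ".vrm") + "-v2.vrm"
}

func main() {
	fmt.Println(nextVersionedName("Companion-A.vrm"))    // Companion-A-v2.vrm
	fmt.Println(nextVersionedName("Companion-A-v2.vrm")) // Companion-A-v3.vrm
}
```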
#### Storage Tier Summary
| Location | Purpose | Tier | Access |
|----------|---------|------|--------|
| Quobyte S3 (`kasm` bucket) | Working files: `.blend`, textures, WIP exports | Kasm rclone volume | Kasm sessions only |
| gravenhollow NFS (`/avatar-models/`) | Production VRM models + animations | `nfs-fast` PVC (RO) | companions-frontend pod, LAN |
| gravenhollow RustFS S3 (`avatar-models`) | Mirror of NFS dir for R2 sync source | S3 API | CronJob rclone |
| Cloudflare R2 (`avatar-models`) | CDN-served copy for remote users | R2 public bucket | Global, zero egress fees |
## BlenderMCP Capabilities Used
| MCP Tool | Avatar Workflow Use |
|----------|-------------------|
| `get_scene_info` | Inspect current scene before modifications |
| `create_object` | Scaffold base meshes for characters |
| `modify_object` | Adjust proportions, positions, bone placement |
| `set_material` | Apply skin, hair, clothing materials |
| `execute_blender_code` | Run VRM export scripts, batch operations, custom rigging |
| `get_screenshot` | AI reviews viewport to understand current state |
| `poly_haven_download` | Fetch HDRIs, textures for environment/materials |
| `hyper3d_generate` | Generate base 3D models from text prompts via Hyper3D Rodin |
## Security Considerations
* **Code execution:** BlenderMCP's `execute_blender_code` runs arbitrary Python in Blender. The Kasm session is sandboxed (DinD container with no cluster access), limiting blast radius. Always save before executing AI-generated code.
* **Telemetry:** BlenderMCP collects anonymous telemetry by default. Disabled via `DISABLE_TELEMETRY=true` in the MCP server config.
* **Network:** The TCP socket (port 9876) between the MCP server and Blender add-on is local to the session. If accessed remotely, ensure the connection is tunnelled or restricted.
* **S3 credentials:** rclone volume plugin credentials are managed via Kasm storage mappings and the existing `kasm-agent` ExternalSecret — no new secrets required.
* **R2 credentials:** Cloudflare R2 API token stored in Vault (`kv/data/cloudflare-r2`), accessed via ExternalSecret by the sync CronJob. Write-only scope — the CronJob uploads but cannot delete the bucket.
* **Public R2 bucket:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. R2 bucket is read-only public via custom domain; write access requires the API token.
* **Model allowlist:** Even though models are served from NFS/R2, the server-side and client-side allowlists in companions-frontend gate which models users can actually select. Uploading a VRM to gravenhollow does not make it available without a code change.
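The allowlist gate described above can be sketched as a set keyed by model path. The variable name `AllowedAvatarModels` comes from the ADR, but this shape is illustrative rather than the actual companions-frontend code:

```go
package main

import "fmt"

// Server-side allowlist: only models registered here (by a committed
// code change) are selectable, regardless of what exists on NFS or R2.
var AllowedAvatarModels = map[string]bool{
	"/assets/models/Seed-san.vrm":    true,
	"/assets/models/Aka.vrm":         true,
	"/assets/models/Midori.vrm":      true,
	"/assets/models/Companion-A.vrm": true,
}

func isAllowedModel(path string) bool {
	return AllowedAvatarModels[path]
}

func main() {
	fmt.Println(isAllowedModel("/assets/models/Companion-A.vrm"))  // promoted and registered
	fmt.Println(isAllowedModel("/assets/models/Unregistered.vrm")) // on NFS but not in code
}
```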
## Pros and Cons of the Options
### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow NFS + Cloudflare R2
* Good, because AI-assisted modelling reduces manual effort for avatar creation
* Good, because assets persist in S3 across sessions and are accessible from CI
* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, and Cloudflare are already deployed
* Good, because VS Code MCP integration means one editor for code and 3D work
* Good, because Kasm sandboxes Blender execution away from the cluster
* Good, because NFS-fast serving decouples model assets from container images (no rebuild to add models)
* Good, because Cloudflare R2 has zero egress fees and global CDN caching for remote users
* Good, because immutable versioned filenames enable aggressive caching and trivial rollback
* Bad, because BlenderMCP is a third-party tool with arbitrary code execution
* Bad, because socket communication adds latency for remote Kasm sessions
* Bad, because VRM rigging quality may require manual adjustment after AI scaffolding
* Bad, because two-hop promotion path adds operational complexity
### Option 2 — Local Blender + BlenderMCP on developer laptop
* Good, because lowest latency (everything local)
* Good, because no Kasm dependency
* Bad, because assets are local — no durable S3 storage without manual sync
* Bad, because Blender + add-ons must be installed on every dev machine
* Bad, because not reproducible across machines
### Option 3 — Hyper3D / Rodin cloud generation only
* Good, because no Blender installation needed
* Good, because fully prompt-driven model generation
* Bad, because limited control over output — no fine-tuning materials, rigging, or proportions
* Bad, because Hyper3D free tier has daily generation limits
* Bad, because generated models require post-processing for VRM compliance (humanoid rig, expressions, visemes)
* Bad, because vendor dependency for a core asset pipeline
### Option 4 — Manual Blender workflow (status quo)
* Good, because full manual control
* Good, because no new tooling
* Bad, because slow — no AI assistance for repetitive modelling tasks
* Bad, because no integration with the development workflow
* Bad, because assets stored ad-hoc with no structured pipeline to companions-frontend
## Links
* Related to [ADR-0046](0046-companions-frontend-architecture.md) (companions-frontend architecture — Three.js + VRM avatars)
* Related to [ADR-0026](0026-storage-strategy.md) (storage strategy — gravenhollow NFS-fast, Quobyte S3, rclone)
* Related to [ADR-0044](0044-dns-and-external-access.md) (DNS and external access — Cloudflare Tunnel, split-horizon)
* Related to [ADR-0049](0049-self-hosted-productivity-suite.md) (Kasm Workspaces)
* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp)
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
* [VRM Specification](https://vrm.dev/en/)
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader used in companions-frontend)
* [Poly Haven](https://polyhaven.com/) (free 3D assets, HDRIs, textures)
* [Hyper3D Rodin](https://hyper3d.ai/) (AI 3D model generation)
* [Cloudflare R2 Documentation](https://developers.cloudflare.com/r2/)
* [Cloudflare R2 Custom Domains](https://developers.cloudflare.com/r2/buckets/public-buckets/#custom-domains)