diff --git a/decisions/0062-blender-mcp-3d-avatar-workflow.md b/decisions/0062-blender-mcp-3d-avatar-workflow.md new file mode 100644 index 0000000..d8c4689 --- /dev/null +++ b/decisions/0062-blender-mcp-3d-avatar-workflow.md @@ -0,0 +1,444 @@ +# BlenderMCP for 3D Avatar Creation via Kasm Workstation + +* Status: proposed +* Date: 2026-02-21 +* Deciders: Billy +* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare R2 CDN + +## Context and Problem Statement + +The companions-frontend serves VRM avatar models for its Three.js-based 3D character rendering (see [ADR-0046](0046-companions-frontend-architecture.md)). Today the avatar library is limited to three models (`Seed-san.vrm`, `Aka.vrm`, `Midori.vrm`) — only one of which actually ships in the repo — and every model must be sourced or hand-sculpted externally. + +Creating custom VRM avatars is a manual, time-intensive process: open Blender, sculpt/rig a character, export to VRM, iterate. There is no integration between the AI coding workflow (VS Code / Copilot) and Blender, so context switching between the editor and the 3D tool is constant. + +How do we streamline custom 3D avatar creation for companions-frontend with AI assistance, while keeping assets durable and accessible across workstations? 
+ +## Decision Drivers + +* The existing avatar pipeline is manual and disconnected from the development workflow +* BlenderMCP (v1.5.5, 17k+ GitHub stars) bridges AI assistants to Blender via the Model Context Protocol — enabling prompt-driven 3D modelling, material control, scene manipulation, and code execution inside Blender +* Kasm Workspaces already run in the cluster (`productivity` namespace) and support Docker-in-Docker with volume plugins for persistent storage +* VS Code supports MCP servers natively (GitHub Copilot agent mode), meaning the same editor used for code can drive Blender scene creation +* Custom volume mounts in Kasm map `/s3` to S3-compatible storage via the rclone Docker volume plugin — providing durable, off-node persistence +* Quobyte S3-compatible endpoint with the `kasm` bucket is the existing Kasm storage backend +* VRM models must ultimately land in the companions-frontend `/assets/models/` path at build time or be served from an external URL +* Final production models and animations should live on gravenhollow (all-SSD TrueNAS, dual 10GbE) for fast local serving via NFS +* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-backed path for multi-MB VRM downloads +* Models are write-once/read-many — ideal for aggressive caching + +## Considered Options + +1. **BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3 (`kasm` bucket)** +2. **Local Blender + BlenderMCP on a developer laptop** +3. **Hyper3D / Rodin cloud generation only (no Blender)** +4. **Manual Blender workflow (status quo)** + +## Decision Outcome + +Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3**, because it integrates AI-assisted modelling directly into the existing Kasm + VS Code workflow, stores assets durably in S3, and requires no additional infrastructure beyond what is already deployed. 
+ +### Positive Consequences + +* AI-assisted 3D modelling — prompt-driven creation, material application, and scene manipulation inside Blender via MCP +* Zero context switching — VS Code agent mode drives Blender commands through the same editor used for code +* Persistent storage — VRM exports written to `/s3` survive session teardown and are available from any Kasm session or CI pipeline +* Existing infrastructure — Kasm agent, DinD, rclone volume plugin, Quobyte S3, gravenhollow NFS, and Cloudflare are all already deployed +* No image rebuild for new models — VRM files live on gravenhollow NFS, mounted read-only into the pod; add a model and update the allowlist +* LAN performance — all-SSD NFS with dual 10GbE delivers VRM files in <100ms +* Remote performance — Cloudflare R2 CDN with zero egress fees and 300+ global PoPs for remote users via Cloudflare Tunnel +* Poly Haven / Hyper3D integration — BlenderMCP supports downloading Poly Haven assets and generating models via Hyper3D Rodin, expanding the asset library +* VRM ecosystem — Blender VRM add-on exports directly to VRM 0.x/1.0 format consumed by `@pixiv/three-vrm` in companions-frontend +* Reproducible — Kasm workspace images are versioned; Blender + add-ons are pre-baked + +### Negative Consequences + +* BlenderMCP `execute_blender_code` tool runs arbitrary Python in Blender — must trust AI-generated code or review before execution +* Socket-based communication (TCP 9876) between the MCP server and Blender add-on adds a failure mode +* VRM export quality depends on correct rigging/weight painting — AI can scaffold but manual touch-up may still be needed +* Kasm Blender image must be configured with both the BlenderMCP add-on and the VRM add-on pre-installed +* Telemetry is on by default in BlenderMCP — must disable via `DISABLE_TELEMETRY=true` for privacy +* Cloudflare R2 sync adds a CronJob and requires a Cloudflare R2 API token in Vault +* Two-hop promotion path (Quobyte S3 → gravenhollow NFS → 
Cloudflare R2) adds operational steps + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ Developer Workstation │ +│ │ +│ ┌──────────────────────────────────┐ │ +│ │ VS Code (local) │ │ +│ │ │ │ +│ │ GitHub Copilot (agent mode) │ │ +│ │ │ │ │ +│ │ ▼ │ │ +│ │ BlenderMCP Server (MCP) │ │ +│ │ (uvx blender-mcp) │ │ +│ │ │ │ │ +│ └─────────┼────────────────────────┘ │ +│ │ TCP :9876 (JSON over socket) │ +└────────────┼────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ Kasm Blender Workstation (browser session) │ +│ kasm.daviestechlabs.io │ +│ │ +│ ┌──────────────────────────────────────────────────────┐ │ +│ │ Blender 4.x │ │ +│ │ │ │ +│ │ Add-ons: │ │ +│ │ • BlenderMCP (addon.py) — socket server :9876 │ │ +│ │ • VRM Add-on for Blender — import/export VRM │ │ +│ │ │ │ +│ │ ┌────────────────────────────────────────────────┐ │ │ +│ │ │ /s3/blender-avatars/ │ │ │ +│ │ │ ├── projects/ (.blend source files) │ │ │ +│ │ │ ├── exports/ (.vrm exported models) │ │ │ +│ │ │ └── textures/ (shared texture lib) │ │ │ +│ │ └────────────────────────────────────────────────┘ │ │ +│ └──────────────────────────────────────────────────────┘ │ +│ │ │ +│ rclone volume │ +│ plugin (S3) │ +└──────────────────────────┼──────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ Quobyte S3 Endpoint │ +│ Bucket: kasm │ +│ │ +│ kasm/blender-avatars/projects/Companion-A.blend │ +│ kasm/blender-avatars/exports/Companion-A.vrm │ +│ kasm/blender-avatars/textures/skin-tone-01.png │ +└──────────────────────────┬──────────────────────────────────────────────┘ + │ + rclone sync (promotion) + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ gravenhollow.lab.daviestechlabs.io │ +│ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │ +│ │ +│ NFS: 
/mnt/gravenhollow/kubernetes/avatar-models/ │ +│ ├── Seed-san.vrm (default model) │ +│ ├── Aka.vrm (Legend tier) │ +│ ├── Midori.vrm (Legend tier) │ +│ ├── Companion-A.vrm (custom, promoted from Kasm S3) │ +│ └── animations/ (shared animation clips) │ +│ │ +│ S3 (RustFS): avatar-models bucket │ +│ (mirror of NFS dir for Cloudflare R2 sync) │ +└──────────┬─────────────────────────────────┬────────────────────────────┘ + │ │ + NFS mount (nfs-fast) rclone sync (cron) + │ │ + ▼ ▼ +┌──────────────────────────┐ ┌──────────────────────────────────────────┐ +│ companions-frontend │ │ Cloudflare R2 │ +│ (Kubernetes pod) │ │ Bucket: avatar-models │ +│ │ │ │ +│ /models/ volume mount │ │ Custom domain: │ +│ (nfs-fast PVC, RO) │ │ assets.daviestechlabs.io/models/ │ +│ │ │ │ +│ Go FileServer: │ │ Cache-Control: public, max-age=31536000 │ +│ /assets/models/ → │ │ (immutable, versioned filenames) │ +│ serves from PVC │ │ │ +│ │ │ Free egress (no bandwidth charges) │ +└──────────┬───────────────┘ └──────────────────────┬───────────────────┘ + │ │ + LAN clients Remote clients + companions-chat.lab... companions-chat via + (envoy-internal, direct) Cloudflare Tunnel + │ │ + └──────────────────┬───────────────────────┘ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ Browser (Three.js) │ +│ AvatarManager.loadModel('/assets/models/Companion-A.vrm') │ +│ │ +│ LAN: fetch from companions-frontend pod (NFS-backed, ~10GbE) │ +│ Remote: fetch from Cloudflare R2 CDN (cache-hit, global PoPs) │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +## Workflow + +### 1. 
Kasm Workspace Setup + +The Kasm Blender workspace image is configured with: + +| Component | Version | Purpose | +|-----------|---------|---------| +| Blender | 4.x | 3D modelling and sculpting | +| BlenderMCP add-on (`addon.py`) | 1.5.5 | Socket server for MCP commands | +| VRM Add-on for Blender | latest | Import/export VRM format | +| Python | 3.10+ | Blender scripting runtime | + +The Kasm storage mapping mounts `/s3` via the rclone Docker volume plugin to the Quobyte S3 endpoint (`kasm` bucket). The sub-path `blender-avatars/` is used for all 3D asset work. + +### 2. VS Code MCP Configuration + +Add BlenderMCP as an MCP server in VS Code (`.vscode/mcp.json` or user settings): + +```json +{ + "servers": { + "blender": { + "command": "uvx", + "args": ["blender-mcp"], + "env": { + "BLENDER_HOST": "localhost", + "BLENDER_PORT": "9876", + "DISABLE_TELEMETRY": "true" + } + } + } +} +``` + +When the Kasm session is accessed remotely, set `BLENDER_HOST` to the Kasm workstation's reachable address. + +### 3. Avatar Creation Workflow + +1. **Launch** the Kasm Blender workspace via `kasm.daviestechlabs.io` +2. **Enable** the BlenderMCP add-on in Blender → 3D View sidebar → "BlenderMCP" tab → "Connect to Claude" +3. **Open VS Code** with Copilot agent mode and the BlenderMCP MCP server running +4. **Prompt** the AI to create or modify avatars: + - _"Create a humanoid character with anime-style proportions, blue hair, and a fantasy outfit"_ + - _"Apply a metallic gold material to the armor pieces"_ + - _"Set up the lighting for a character showcase render"_ + - _"Rig this character for VRM export with standard humanoid bones"_ +5. **Export** the finished model to VRM via the VRM add-on (or via BlenderMCP `execute_blender_code` calling the VRM export operator) +6. **Save** the `.vrm` to `/s3/blender-avatars/exports/` and the `.blend` source to `/s3/blender-avatars/projects/` +7. 
**Import** the VRM into companions-frontend: make it available under `/assets/models/` (today by copying it into `assets/models/`; with the NFS-backed serving described in section 5, by promoting it to gravenhollow) and update the allowlists in `internal/database/database.go` and `static/js/avatar.js` + +### 4. Asset Pipeline (Kasm S3 → gravenhollow → production) + +| Stage | Action | +|-------|--------| +| **Create** | AI-assisted modelling + VRM export in Kasm Blender → `/s3/blender-avatars/exports/*.vrm` | +| **Store** | Files written to `/s3` land in the Quobyte S3 `kasm` bucket through the rclone volume mount; no separate sync step is needed | +| **Promote** | `rclone copy quobyte:kasm/blender-avatars/exports/Model.vrm gravenhollow-nfs:/avatar-models/` (manual or CI) | +| **Register** | Add model path to `AllowedAvatarModels` in Go and JS allowlists, commit to repo | +| **Deploy** | Flux rolls out updated companions-frontend config; the model is already available on the NFS PVC, so no image rebuild is needed | +| **CDN sync** | CronJob `rclone sync` from gravenhollow RustFS `avatar-models` bucket → Cloudflare R2 `avatar-models` bucket | + +### 5. Deployment and Storage Architecture + +#### Local Serving (LAN users) + +Companions-frontend currently serves VRM models via `http.FileServer(http.Dir("assets"))` from the container filesystem. This bakes models into the image and requires a rebuild to add new avatars. 
+ +The new approach mounts avatar models from gravenhollow via an `nfs-fast` PVC: + +```yaml +# PersistentVolumeClaim for avatar models +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: avatar-models + namespace: ai-ml +spec: + storageClassName: nfs-fast + accessModes: [ReadOnlyMany] + resources: + requests: + storage: 10Gi +``` + +The pod mounts this PVC at `/models` and the Go server serves it at `/assets/models/`: + +```go +// Replace embedded assets with NFS-backed volume +mux.Handle("/assets/models/", http.StripPrefix("/assets/models/", + http.FileServer(http.Dir("/models")))) +``` + +Benefits: +- **No image rebuild** to add/update models — write to gravenhollow NFS, pod sees it immediately (with `actimeo=600` cache, within 10 minutes) +- **All-SSD + dual 10GbE** — VRM files (typically 5–30 MB) load in <100ms on LAN +- **ReadOnlyMany** — multiple replicas can share the same PVC +- Source `.blend` files and textures remain on Quobyte S3 (Kasm bucket) for the creation workflow; only promoted VRM exports land on gravenhollow + +#### Remote Serving (Cloudflare R2 CDN) + +Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Serving multi-MB VRM files through the tunnel works but adds latency and consumes tunnel bandwidth. 
Cloudflare R2 provides a better path: + +| | | +|---|---| +| **Bucket** | `avatar-models` on Cloudflare R2 | +| **Custom domain** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) | +| **Free egress** | R2 has zero egress fees — ideal for large binary assets | +| **Cache** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` | +| **Sync** | CronJob in cluster: `rclone sync gravenhollow-s3:avatar-models r2:avatar-models --checksum` | +| **Auth** | Public read (models are not sensitive); write via R2 API token in Vault | + +##### R2 Sync CronJob + +```yaml +apiVersion: batch/v1 +kind: CronJob +metadata: + name: avatar-models-r2-sync + namespace: ai-ml +spec: + schedule: "0 */6 * * *" # Every 6 hours + jobTemplate: + spec: + template: + spec: + containers: + - name: sync + image: rclone/rclone:1.68 + command: + - rclone + - sync + - gravenhollow-s3:avatar-models + - r2:avatar-models + - --checksum + - --transfers=4 + - -v + volumeMounts: + - name: rclone-config + mountPath: /config/rclone + readOnly: true + volumes: + - name: rclone-config + secret: + secretName: rclone-r2-config + restartPolicy: OnFailure +``` + +##### rclone Config (ExternalSecret from Vault) + +```ini +[gravenhollow-s3] +type = s3 +provider = Other +endpoint = https://gravenhollow.lab.daviestechlabs.io:30292 +access_key_id = +secret_access_key = + +[r2] +type = s3 +provider = Cloudflare +endpoint = https://.r2.cloudflarestorage.com +access_key_id = +secret_access_key = +region = auto +``` + +##### Client-Side Routing + +The frontend detects whether the user is on LAN or remote and routes model fetches accordingly: + +```javascript +// avatar.js — model URL resolution +function resolveModelURL(path) { + // LAN users: serve from the Go server (NFS-backed, same origin) + // Remote users: serve from Cloudflare R2 CDN + const isLAN = location.hostname.endsWith('.lab.daviestechlabs.io'); + if (isLAN) return path; // e.g. 
/assets/models/Companion-A.vrm + return `https://assets.daviestechlabs.io${path.replace('/assets', '')}`; + // → https://assets.daviestechlabs.io/models/Companion-A.vrm +} +``` + +Alternatively, the Go server can set the model base URL via a template variable based on the `Host` header, keeping the logic server-side. + +#### Versioning Strategy + +VRM files are immutable once promoted — updated models get a new filename (e.g., `Companion-A-v2.vrm`) rather than overwriting. This ensures: +- Cloudflare CDN cache never serves stale content +- Rollback is trivial — point the allowlist back to the previous version +- Browser `Cache-Control: immutable` works correctly + +#### Storage Tier Summary + +| Location | Purpose | Tier | Access | +|----------|---------|------|--------| +| Quobyte S3 (`kasm` bucket) | Working files: `.blend`, textures, WIP exports | Kasm rclone volume | Kasm sessions only | +| gravenhollow NFS (`/avatar-models/`) | Production VRM models + animations | `nfs-fast` PVC (RO) | companions-frontend pod, LAN | +| gravenhollow RustFS S3 (`avatar-models`) | Mirror of NFS dir for R2 sync source | S3 API | CronJob rclone | +| Cloudflare R2 (`avatar-models`) | CDN-served copy for remote users | R2 public bucket | Global, zero egress fees | + +## BlenderMCP Capabilities Used + +| MCP Tool | Avatar Workflow Use | +|----------|-------------------| +| `get_scene_info` | Inspect current scene before modifications | +| `create_object` | Scaffold base meshes for characters | +| `modify_object` | Adjust proportions, positions, bone placement | +| `set_material` | Apply skin, hair, clothing materials | +| `execute_blender_code` | Run VRM export scripts, batch operations, custom rigging | +| `get_screenshot` | AI reviews viewport to understand current state | +| `poly_haven_download` | Fetch HDRIs, textures for environment/materials | +| `hyper3d_generate` | Generate base 3D models from text prompts via Hyper3D Rodin | + +## Security Considerations + +* **Code 
execution:** BlenderMCP's `execute_blender_code` runs arbitrary Python in Blender. The Kasm session is sandboxed (DinD container with no cluster access), limiting blast radius. Always save before executing AI-generated code. +* **Telemetry:** BlenderMCP collects anonymous telemetry by default. Disabled via `DISABLE_TELEMETRY=true` in the MCP server config. +* **Network:** The TCP socket (port 9876) between the MCP server and the Blender add-on is local only when both run inside the same session. When the MCP server runs on the developer workstation, as in the architecture diagram, the connection crosses the network and must be tunnelled or restricted to trusted hosts. +* **S3 credentials:** rclone volume plugin credentials are managed via Kasm storage mappings and the existing `kasm-agent` ExternalSecret; no new secrets are required. +* **R2 credentials:** Cloudflare R2 API token stored in Vault (`kv/data/cloudflare-r2`), accessed via ExternalSecret by the sync CronJob. The token should be scoped to object read and write on the `avatar-models` bucket only: `rclone sync` has to list, upload, and delete objects, so a write-only token is not sufficient, but the token does not need permission to create or delete buckets. +* **Public R2 bucket:** Avatar models are treated as public assets. Anyone who knows the URL can download them from the custom domain, not only authenticated companions-chat users, so VRM files must not contain sensitive data. The R2 bucket is read-only public via the custom domain; write access requires the API token. +* **Model allowlist:** Even though models are served from NFS/R2, the server-side and client-side allowlists in companions-frontend gate which models users can actually select. Uploading a VRM to gravenhollow makes the file downloadable by URL but does not make it selectable without a code change. 
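
The server-side half of the model allowlist described in the last bullet could look roughly like the sketch below. It is an illustration under assumptions, not the companions-frontend implementation: only the name `AllowedAvatarModels` appears in this ADR, while its representation as a map, the `isAllowedModel` helper, and the example entries are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// AllowedAvatarModels is named in this ADR; storing it as a set of URL
// paths is an assumption made for this sketch. The entries are examples.
var AllowedAvatarModels = map[string]bool{
	"/assets/models/Seed-san.vrm":       true,
	"/assets/models/Companion-A-v2.vrm": true,
}

// isAllowedModel is a hypothetical helper. It reports whether a requested
// model path may be selected by a user.
func isAllowedModel(requested string) bool {
	// Normalise the path so "." and ".." segments and duplicate slashes
	// are resolved before the lookup.
	clean := path.Clean("/" + requested)

	// Only VRM files are ever valid avatar models.
	if !strings.HasSuffix(clean, ".vrm") {
		return false
	}

	// Exact membership check: a file that exists on NFS or R2 but is not
	// listed here is rejected.
	return AllowedAvatarModels[clean]
}

func main() {
	candidates := []string{
		"/assets/models/Seed-san.vrm",
		"/assets/models/Unlisted.vrm",
		"/assets/models/../../etc/passwd",
	}
	for _, c := range candidates {
		fmt.Printf("%-40s allowed=%v\n", c, isAllowedModel(c))
	}
}
```

Keeping the check as an exact lookup means that promoting a new file such as `Companion-A-v3.vrm` to gravenhollow has no effect on what users can select until the allowlist entry is committed, which matches the versioned-filename rollback story above.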
+ +## Pros and Cons of the Options + +### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow NFS + Cloudflare R2 + +* Good, because AI-assisted modelling reduces manual effort for avatar creation +* Good, because assets persist in S3 across sessions and are accessible from CI +* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, and Cloudflare are already deployed +* Good, because VS Code MCP integration means one editor for code and 3D work +* Good, because Kasm sandboxes Blender execution away from the cluster +* Good, because NFS-fast serving decouples model assets from container images (no rebuild to add models) +* Good, because Cloudflare R2 has zero egress fees and global CDN caching for remote users +* Good, because immutable versioned filenames enable aggressive caching and trivial rollback +* Bad, because BlenderMCP is a third-party tool with arbitrary code execution +* Bad, because socket communication adds latency for remote Kasm sessions +* Bad, because VRM rigging quality may require manual adjustment after AI scaffolding +* Bad, because two-hop promotion path adds operational complexity + +### Option 2 — Local Blender + BlenderMCP on developer laptop + +* Good, because lowest latency (everything local) +* Good, because no Kasm dependency +* Bad, because assets are local — no durable S3 storage without manual sync +* Bad, because Blender + add-ons must be installed on every dev machine +* Bad, because not reproducible across machines + +### Option 3 — Hyper3D / Rodin cloud generation only + +* Good, because no Blender installation needed +* Good, because fully prompt-driven model generation +* Bad, because limited control over output — no fine-tuning materials, rigging, or proportions +* Bad, because Hyper3D free tier has daily generation limits +* Bad, because generated models require post-processing for VRM compliance (humanoid rig, expressions, visemes) +* Bad, because vendor dependency for a core asset 
pipeline + +### Option 4 — Manual Blender workflow (status quo) + +* Good, because full manual control +* Good, because no new tooling +* Bad, because slow — no AI assistance for repetitive modelling tasks +* Bad, because no integration with the development workflow +* Bad, because assets stored ad-hoc with no structured pipeline to companions-frontend + +## Links + +* Related to [ADR-0046](0046-companions-frontend-architecture.md) (companions-frontend architecture — Three.js + VRM avatars) +* Related to [ADR-0026](0026-storage-strategy.md) (storage strategy — gravenhollow NFS-fast, Quobyte S3, rclone) +* Related to [ADR-0044](0044-dns-and-external-access.md) (DNS and external access — Cloudflare Tunnel, split-horizon) +* Related to [ADR-0049](0049-self-hosted-productivity-suite.md) (Kasm Workspaces) +* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp) +* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/) +* [VRM Specification](https://vrm.dev/en/) +* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader used in companions-frontend) +* [Poly Haven](https://polyhaven.com/) (free 3D assets, HDRIs, textures) +* [Hyper3D Rodin](https://hyper3d.ai/) (AI 3D model generation) +* [Cloudflare R2 Documentation](https://developers.cloudflare.com/r2/) +* [Cloudflare R2 Custom Domains](https://developers.cloudflare.com/r2/buckets/public-buckets/#custom-domains)