# BlenderMCP for 3D Avatar Creation via Kasm Workstation
* Status: proposed
* Date: 2026-02-21
* Deciders: Billy
* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare R2 CDN
## Context and Problem Statement
The companions-frontend serves VRM avatar models for its Three.js-based 3D character rendering (see [ADR-0046](0046-companions-frontend-architecture.md)). Today the avatar library is limited to three models (`Seed-san.vrm`, `Aka.vrm`, `Midori.vrm`) — only one of which actually ships in the repo — and every model must be sourced or hand-sculpted externally.
Creating custom VRM avatars is a manual, time-intensive process: open Blender, sculpt/rig a character, export to VRM, iterate. There is no integration between the AI coding workflow (VS Code / Copilot) and Blender, so context switching between the editor and the 3D tool is constant.
How do we streamline custom 3D avatar creation for companions-frontend with AI assistance, while keeping assets durable and accessible across workstations?
## Decision Drivers
* The existing avatar pipeline is manual and disconnected from the development workflow
* BlenderMCP (v1.5.5, 17k+ GitHub stars) bridges AI assistants to Blender via the Model Context Protocol — enabling prompt-driven 3D modelling, material control, scene manipulation, and code execution inside Blender
* Kasm Workspaces already run in the cluster (`productivity` namespace) and support Docker-in-Docker with volume plugins for persistent storage
* VS Code supports MCP servers natively (GitHub Copilot agent mode), meaning the same editor used for code can drive Blender scene creation
* Custom volume mounts in Kasm map `/s3` to S3-compatible storage via the rclone Docker volume plugin — providing durable, off-node persistence
* Quobyte S3-compatible endpoint with the `kasm` bucket is the existing Kasm storage backend
* VRM models must ultimately land in the companions-frontend `/assets/models/` path at build time or be served from an external URL
* Final production models and animations should live on gravenhollow (all-SSD TrueNAS, dual 10GbE) for fast local serving via NFS
* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-backed path for multi-MB VRM downloads
* Models are write-once/read-many — ideal for aggressive caching
## Considered Options
1. **BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3 (`kasm` bucket)**
2. **Local Blender + BlenderMCP on a developer laptop**
3. **Hyper3D / Rodin cloud generation only (no Blender)**
4. **Manual Blender workflow (status quo)**
## Decision Outcome
Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3**, because it integrates AI-assisted modelling directly into the existing Kasm + VS Code workflow, stores assets durably in S3, and requires no additional infrastructure beyond what is already deployed.
### Positive Consequences
* AI-assisted 3D modelling — prompt-driven creation, material application, and scene manipulation inside Blender via MCP
* Zero context switching — VS Code agent mode drives Blender commands through the same editor used for code
* Persistent storage — VRM exports written to `/s3` survive session teardown and are available from any Kasm session or CI pipeline
* Existing infrastructure — Kasm agent, DinD, rclone volume plugin, Quobyte S3, gravenhollow NFS, and Cloudflare are all already deployed
* No image rebuild for new models — VRM files live on gravenhollow NFS, mounted read-only into the pod; add a model and update the allowlist
* LAN performance — all-SSD NFS with dual 10GbE delivers VRM files in <100ms
* Remote performance — Cloudflare R2 CDN with zero egress fees and 300+ global PoPs for remote users via Cloudflare Tunnel
* Poly Haven / Hyper3D integration — BlenderMCP supports downloading Poly Haven assets and generating models via Hyper3D Rodin, expanding the asset library
* VRM ecosystem — Blender VRM add-on exports directly to VRM 0.x/1.0 format consumed by `@pixiv/three-vrm` in companions-frontend
* Reproducible — Kasm workspace images are versioned; Blender + add-ons are pre-baked
### Negative Consequences
* BlenderMCP `execute_blender_code` tool runs arbitrary Python in Blender — must trust AI-generated code or review before execution
* Socket-based communication (TCP 9876) between the MCP server and Blender add-on adds a failure mode
* VRM export quality depends on correct rigging/weight painting — AI can scaffold but manual touch-up may still be needed
* Kasm Blender image must be configured with both the BlenderMCP add-on and the VRM add-on pre-installed
* Telemetry is on by default in BlenderMCP — must disable via `DISABLE_TELEMETRY=true` for privacy
* Cloudflare R2 sync adds a CronJob and requires a Cloudflare R2 API token in Vault
* Two-hop promotion path (Quobyte S3 → gravenhollow NFS → Cloudflare R2) adds operational steps
## Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│                          Developer Workstation                          │
│                                                                         │
│   ┌──────────────────────────────────┐                                  │
│   │ VS Code (local)                  │                                  │
│   │                                  │                                  │
│   │  GitHub Copilot (agent mode)     │                                  │
│   │            │                     │                                  │
│   │            ▼                     │                                  │
│   │  BlenderMCP Server (MCP)         │                                  │
│   │  (uvx blender-mcp)               │                                  │
│   │            │                     │                                  │
│   └────────────┼─────────────────────┘                                  │
│                │ TCP :9876 (JSON over socket)                           │
└────────────────┼────────────────────────────────────────────────────────┘
                 │
┌────────────────┼────────────────────────────────────────────────────────┐
│                │     Kasm Blender Workstation (browser session)         │
│                │     kasm.daviestechlabs.io                             │
│                ▼                                                        │
│   ┌──────────────────────────────────────────────────────┐              │
│   │ Blender 4.x                                          │              │
│   │                                                      │              │
│   │  Add-ons:                                            │              │
│   │   • BlenderMCP (addon.py) — socket server :9876      │              │
│   │   • VRM Add-on for Blender — import/export VRM       │              │
│   │                                                      │              │
│   │  ┌────────────────────────────────────────────────┐  │              │
│   │  │ /s3/blender-avatars/                           │  │              │
│   │  │  ├── projects/  (.blend source files)          │  │              │
│   │  │  ├── exports/   (.vrm exported models)         │  │              │
│   │  │  └── textures/  (shared texture lib)           │  │              │
│   │  └────────────────────────────────────────────────┘  │              │
│   └──────────────────────────────────────────────────────┘              │
│                          │                                              │
│                          │ rclone volume                                │
│                          │ plugin (S3)                                  │
└──────────────────────────┼──────────────────────────────────────────────┘
                           │
┌──────────────────────────┼──────────────────────────────────────────────┐
│                          ▼      Quobyte S3 Endpoint                     │
│                                 Bucket: kasm                            │
│                                                                         │
│   kasm/blender-avatars/projects/Companion-A.blend                       │
│   kasm/blender-avatars/exports/Companion-A.vrm                          │
│   kasm/blender-avatars/textures/skin-tone-01.png                        │
└──────────────────────────┬──────────────────────────────────────────────┘
                           │ rclone sync (promotion)
                           ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                  gravenhollow.lab.daviestechlabs.io                     │
│            (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB)             │
│                                                                         │
│   NFS: /mnt/gravenhollow/kubernetes/avatar-models/                      │
│    ├── Seed-san.vrm      (default model)                                │
│    ├── Aka.vrm           (Legend tier)                                  │
│    ├── Midori.vrm        (Legend tier)                                  │
│    ├── Companion-A.vrm   (custom, promoted from Kasm S3)                │
│    └── animations/       (shared animation clips)                       │
│                                                                         │
│   S3 (RustFS): avatar-models bucket                                     │
│   (mirror of NFS dir for Cloudflare R2 sync)                            │
└──────────┬─────────────────────────────────┬────────────────────────────┘
           │                                 │
   NFS mount (nfs-fast)              rclone sync (cron)
           │                                 │
           ▼                                 ▼
┌──────────────────────────┐  ┌──────────────────────────────────────────┐
│  companions-frontend     │  │  Cloudflare R2                           │
│  (Kubernetes pod)        │  │  Bucket: avatar-models                   │
│                          │  │                                          │
│  /models/ volume mount   │  │  Custom domain:                          │
│  (nfs-fast PVC, RO)      │  │  assets.daviestechlabs.io/models/        │
│                          │  │                                          │
│  Go FileServer:          │  │  Cache-Control: public, max-age=31536000 │
│  /assets/models/ →       │  │  (immutable, versioned filenames)        │
│  serves from PVC         │  │                                          │
│                          │  │  Free egress (no bandwidth charges)      │
└──────────┬───────────────┘  └──────────────────────┬───────────────────┘
           │                                         │
      LAN clients                             Remote clients
      companions-chat.lab...                  companions-chat via
      (envoy-internal, direct)                Cloudflare Tunnel
           │                                         │
           └──────────────────┬──────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         Browser (Three.js)                              │
│    AvatarManager.loadModel('/assets/models/Companion-A.vrm')            │
│                                                                         │
│    LAN:    fetch from companions-frontend pod (NFS-backed, ~10GbE)      │
│    Remote: fetch from Cloudflare R2 CDN (cache-hit, global PoPs)        │
└─────────────────────────────────────────────────────────────────────────┘
```
## Workflow
### 1. Kasm Workspace Setup
The Kasm Blender workspace image is configured with:
| Component | Version | Purpose |
|-----------|---------|---------|
| Blender | 4.x | 3D modelling and sculpting |
| BlenderMCP add-on (`addon.py`) | 1.5.5 | Socket server for MCP commands |
| VRM Add-on for Blender | latest | Import/export VRM format |
| Python | 3.10+ | Blender scripting runtime |
The Kasm storage mapping mounts `/s3` via the rclone Docker volume plugin to the Quobyte S3 endpoint (`kasm` bucket). The sub-path `blender-avatars/` is used for all 3D asset work.
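The storage mapping presupposes an rclone remote pointing at the Quobyte endpoint. A minimal sketch of that remote's `rclone.conf` section — the remote name, endpoint, and credential placeholders here are illustrative; the real values come from the Kasm storage-mapping configuration:

```ini
[quobyte]
type = s3
provider = Other
endpoint = https://<quobyte-s3-endpoint>
access_key_id = <from-kasm-storage-mapping>
secret_access_key = <from-kasm-storage-mapping>
```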
### 2. VS Code MCP Configuration
Add BlenderMCP as an MCP server in VS Code (`.vscode/mcp.json` or user settings):
```json
{
  "servers": {
    "blender": {
      "command": "uvx",
      "args": ["blender-mcp"],
      "env": {
        "BLENDER_HOST": "localhost",
        "BLENDER_PORT": "9876",
        "DISABLE_TELEMETRY": "true"
      }
    }
  }
}
```
When the Kasm session is accessed remotely, set `BLENDER_HOST` to the Kasm workstation's reachable address.
### 3. Avatar Creation Workflow
1. **Launch** the Kasm Blender workspace via `kasm.daviestechlabs.io`
2. **Enable** the BlenderMCP add-on in Blender → 3D View sidebar → "BlenderMCP" tab → "Connect to Claude"
3. **Open VS Code** with Copilot agent mode and the BlenderMCP MCP server running
4. **Prompt** the AI to create or modify avatars:
- _"Create a humanoid character with anime-style proportions, blue hair, and a fantasy outfit"_
- _"Apply a metallic gold material to the armor pieces"_
- _"Set up the lighting for a character showcase render"_
- _"Rig this character for VRM export with standard humanoid bones"_
5. **Export** the finished model to VRM via the VRM add-on (or via BlenderMCP `execute_blender_code` calling the VRM export operator)
6. **Save** the `.vrm` to `/s3/blender-avatars/exports/` and the `.blend` source to `/s3/blender-avatars/projects/`
7. **Import** the VRM into companions-frontend — copy to `assets/models/`, update the allowlists in `internal/database/database.go` and `static/js/avatar.js`
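Step 5 can also be driven from the editor: a sketch of the Python that `execute_blender_code` would run inside Blender. This assumes the VRM add-on registers the `export_scene.vrm` operator (verify the operator name against the installed add-on version); the paths follow this ADR's `/s3` layout.

```python
import bpy  # available only inside Blender's embedded Python

# Save the editable source first, then export the scene to VRM.
# Both paths land on the rclone-backed /s3 mount, so the artifacts
# survive Kasm session teardown.
bpy.ops.wm.save_as_mainfile(
    filepath="/s3/blender-avatars/projects/Companion-A.blend")
bpy.ops.export_scene.vrm(
    filepath="/s3/blender-avatars/exports/Companion-A.vrm")
```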
### 4. Asset Pipeline (Kasm S3 → gravenhollow → production)
| Stage | Action |
|-------|--------|
| **Create** | AI-assisted modelling + VRM export in Kasm Blender → `/s3/blender-avatars/exports/*.vrm` |
| **Store** | rclone syncs `/s3` to Quobyte S3 `kasm` bucket automatically |
| **Promote** | `rclone copy quobyte:kasm/blender-avatars/exports/Model.vrm gravenhollow-nfs:/avatar-models/` (manual or CI) |
| **Register** | Add model path to `AllowedAvatarModels` in Go and JS allowlists, commit to repo |
| **Deploy** | Flux rolls out updated companions-frontend config; model already available on NFS PVC — no image rebuild needed |
| **CDN sync** | CronJob `rclone sync` from gravenhollow RustFS `avatar-models` bucket → Cloudflare R2 `avatar-models` bucket |
### 5. Deployment and Storage Architecture
#### Local Serving (LAN users)
Companions-frontend currently serves VRM models via `http.FileServer(http.Dir("assets"))` from the container filesystem. This bakes models into the image and requires a rebuild to add new avatars.
The new approach mounts avatar models from gravenhollow via an `nfs-fast` PVC:
```yaml
# PersistentVolumeClaim for avatar models
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: avatar-models
  namespace: ai-ml
spec:
  storageClassName: nfs-fast
  accessModes: [ReadOnlyMany]
  resources:
    requests:
      storage: 10Gi
```
The pod mounts this PVC at `/models` and the Go server serves it at `/assets/models/`:
```go
// Replace embedded assets with NFS-backed volume
mux.Handle("/assets/models/", http.StripPrefix("/assets/models/",
	http.FileServer(http.Dir("/models"))))
```
Benefits:
- **No image rebuild** to add/update models — write to gravenhollow NFS, pod sees it immediately (with `actimeo=600` cache, within 10 minutes)
- **All-SSD + dual 10GbE** — VRM files (typically 5–30 MB) load in <100ms on LAN
- **ReadOnlyMany** — multiple replicas can share the same PVC
- Source `.blend` files and textures remain on Quobyte S3 (Kasm bucket) for the creation workflow; only promoted VRM exports land on gravenhollow
#### Remote Serving (Cloudflare R2 CDN)
Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Serving multi-MB VRM files through the tunnel works but adds latency and consumes tunnel bandwidth. Cloudflare R2 provides a better path:
| Aspect | Detail |
|--------|--------|
| **Bucket** | `avatar-models` on Cloudflare R2 |
| **Custom domain** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
| **Free egress** | R2 has zero egress fees — ideal for large binary assets |
| **Cache** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
| **Sync** | CronJob in cluster: `rclone sync gravenhollow-s3:avatar-models r2:avatar-models --checksum` |
| **Auth** | Public read (models are not sensitive); write via R2 API token in Vault |
##### R2 Sync CronJob
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: avatar-models-r2-sync
  namespace: ai-ml
spec:
  schedule: "0 */6 * * *" # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: sync
              image: rclone/rclone:1.68
              command:
                - rclone
                - sync
                - gravenhollow-s3:avatar-models
                - r2:avatar-models
                - --checksum
                - --transfers=4
                - -v
              volumeMounts:
                - name: rclone-config
                  mountPath: /config/rclone
                  readOnly: true
          volumes:
            - name: rclone-config
              secret:
                secretName: rclone-r2-config
          restartPolicy: OnFailure
```
##### rclone Config (ExternalSecret from Vault)
```ini
[gravenhollow-s3]
type = s3
provider = Other
endpoint = https://gravenhollow.lab.daviestechlabs.io:30292
access_key_id = <from-vault>
secret_access_key = <from-vault>

[r2]
type = s3
provider = Cloudflare
endpoint = https://<account-id>.r2.cloudflarestorage.com
access_key_id = <from-vault>
secret_access_key = <from-vault>
region = auto
```
##### Client-Side Routing
The frontend detects whether the user is on LAN or remote and routes model fetches accordingly:
```javascript
// avatar.js — model URL resolution
function resolveModelURL(path) {
  // LAN users: serve from the Go server (NFS-backed, same origin)
  // Remote users: serve from Cloudflare R2 CDN
  const isLAN = location.hostname.endsWith('.lab.daviestechlabs.io');
  if (isLAN) return path; // e.g. /assets/models/Companion-A.vrm
  return `https://assets.daviestechlabs.io${path.replace('/assets', '')}`;
  // → https://assets.daviestechlabs.io/models/Companion-A.vrm
}
```
Alternatively, the Go server can set the model base URL via a template variable based on the `Host` header, keeping the logic server-side.
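A sketch of that server-side variant — the hostnames mirror this ADR's split-horizon setup, but the function name `modelBaseURL` is illustrative, not from the companions-frontend codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// modelBaseURL picks the base URL templates should use for model fetches:
// same-origin for split-horizon LAN hostnames, the R2 custom domain otherwise.
func modelBaseURL(host string) string {
	hostname := strings.Split(host, ":")[0] // drop any port before matching
	if strings.HasSuffix(hostname, ".lab.daviestechlabs.io") {
		return "/assets/models"
	}
	return "https://assets.daviestechlabs.io/models"
}

func main() {
	fmt.Println(modelBaseURL("companions-chat.lab.daviestechlabs.io")) // LAN → same-origin
	fmt.Println(modelBaseURL("companions-chat.daviestechlabs.io:443")) // remote → R2 domain
}
```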
#### Versioning Strategy
VRM files are immutable once promoted — updated models get a new filename (e.g., `Companion-A-v2.vrm`) rather than overwriting. This ensures:
- Cloudflare CDN cache never serves stale content
- Rollback is trivial — point the allowlist back to the previous version
- Browser `Cache-Control: immutable` works correctly
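Version suffixes like `-v2` can be managed by hand, but a short content hash gives the same immutability guarantee automatically — a sketch (this helper is hypothetical, not part of the current pipeline):

```python
import hashlib
from pathlib import Path

def versioned_filename(name: str, data: bytes) -> str:
    """Derive an immutable filename from the export's bytes.

    A changed model always yields a new name, which is exactly what
    Cache-Control: immutable and the CDN cache rely on.
    """
    digest = hashlib.sha256(data).hexdigest()[:8]
    p = Path(name)
    return f"{p.stem}-{digest}{p.suffix}"

# e.g. versioned_filename("Companion-A.vrm", vrm_bytes)
# yields something like "Companion-A-<8-hex-chars>.vrm"
```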
#### Storage Tier Summary
| Location | Purpose | Tier | Access |
|----------|---------|------|--------|
| Quobyte S3 (`kasm` bucket) | Working files: `.blend`, textures, WIP exports | Kasm rclone volume | Kasm sessions only |
| gravenhollow NFS (`/avatar-models/`) | Production VRM models + animations | `nfs-fast` PVC (RO) | companions-frontend pod, LAN |
| gravenhollow RustFS S3 (`avatar-models`) | Mirror of NFS dir for R2 sync source | S3 API | CronJob rclone |
| Cloudflare R2 (`avatar-models`) | CDN-served copy for remote users | R2 public bucket | Global, zero egress fees |
## BlenderMCP Capabilities Used
| MCP Tool | Avatar Workflow Use |
|----------|-------------------|
| `get_scene_info` | Inspect current scene before modifications |
| `create_object` | Scaffold base meshes for characters |
| `modify_object` | Adjust proportions, positions, bone placement |
| `set_material` | Apply skin, hair, clothing materials |
| `execute_blender_code` | Run VRM export scripts, batch operations, custom rigging |
| `get_screenshot` | AI reviews viewport to understand current state |
| `poly_haven_download` | Fetch HDRIs, textures for environment/materials |
| `hyper3d_generate` | Generate base 3D models from text prompts via Hyper3D Rodin |
## Security Considerations
* **Code execution:** BlenderMCP's `execute_blender_code` runs arbitrary Python in Blender. The Kasm session is sandboxed (DinD container with no cluster access), limiting blast radius. Always save before executing AI-generated code.
* **Telemetry:** BlenderMCP collects anonymous telemetry by default. Disabled via `DISABLE_TELEMETRY=true` in the MCP server config.
* **Network:** The TCP socket (port 9876) between the MCP server and Blender add-on is local to the session. If accessed remotely, ensure the connection is tunnelled or restricted.
* **S3 credentials:** rclone volume plugin credentials are managed via Kasm storage mappings and the existing `kasm-agent` ExternalSecret — no new secrets required.
* **R2 credentials:** Cloudflare R2 API token stored in Vault (`kv/data/cloudflare-r2`), accessed via ExternalSecret by the sync CronJob. Write-only scope — the CronJob uploads but cannot delete the bucket.
* **Public R2 bucket:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. R2 bucket is read-only public via custom domain; write access requires the API token.
* **Model allowlist:** Even though models are served from NFS/R2, the server-side and client-side allowlists in companions-frontend gate which models users can actually select. Uploading a VRM to gravenhollow does not make it available without a code change.
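The allowlist gate above can be sketched as follows — a minimal illustration; the actual `AllowedAvatarModels` structure in `internal/database/database.go` may differ:

```go
package main

import "fmt"

// AllowedAvatarModels is the server-side allowlist: serving a file from
// NFS/R2 is not enough — a model must also be registered here to be
// selectable, which is why adding a model always requires a code change.
var AllowedAvatarModels = map[string]bool{
	"/assets/models/Seed-san.vrm":    true,
	"/assets/models/Aka.vrm":         true,
	"/assets/models/Midori.vrm":      true,
	"/assets/models/Companion-A.vrm": true,
}

func isAllowedModel(path string) bool {
	return AllowedAvatarModels[path]
}

func main() {
	fmt.Println(isAllowedModel("/assets/models/Companion-A.vrm")) // registered
	fmt.Println(isAllowedModel("/assets/models/unknown.vrm"))     // not registered
}
```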
## Pros and Cons of the Options
### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow NFS + Cloudflare R2
* Good, because AI-assisted modelling reduces manual effort for avatar creation
* Good, because assets persist in S3 across sessions and are accessible from CI
* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, and Cloudflare are already deployed
* Good, because VS Code MCP integration means one editor for code and 3D work
* Good, because Kasm sandboxes Blender execution away from the cluster
* Good, because NFS-fast serving decouples model assets from container images (no rebuild to add models)
* Good, because Cloudflare R2 has zero egress fees and global CDN caching for remote users
* Good, because immutable versioned filenames enable aggressive caching and trivial rollback
* Bad, because BlenderMCP is a third-party tool with arbitrary code execution
* Bad, because socket communication adds latency for remote Kasm sessions
* Bad, because VRM rigging quality may require manual adjustment after AI scaffolding
* Bad, because two-hop promotion path adds operational complexity
### Option 2 — Local Blender + BlenderMCP on developer laptop
* Good, because lowest latency (everything local)
* Good, because no Kasm dependency
* Bad, because assets are local — no durable S3 storage without manual sync
* Bad, because Blender + add-ons must be installed on every dev machine
* Bad, because not reproducible across machines
### Option 3 — Hyper3D / Rodin cloud generation only
* Good, because no Blender installation needed
* Good, because fully prompt-driven model generation
* Bad, because limited control over output — no fine-tuning materials, rigging, or proportions
* Bad, because Hyper3D free tier has daily generation limits
* Bad, because generated models require post-processing for VRM compliance (humanoid rig, expressions, visemes)
* Bad, because vendor dependency for a core asset pipeline
### Option 4 — Manual Blender workflow (status quo)
* Good, because full manual control
* Good, because no new tooling
* Bad, because slow — no AI assistance for repetitive modelling tasks
* Bad, because no integration with the development workflow
* Bad, because assets stored ad-hoc with no structured pipeline to companions-frontend
## Links
* Related to [ADR-0046](0046-companions-frontend-architecture.md) (companions-frontend architecture — Three.js + VRM avatars)
* Related to [ADR-0026](0026-storage-strategy.md) (storage strategy — gravenhollow NFS-fast, Quobyte S3, rclone)
* Related to [ADR-0044](0044-dns-and-external-access.md) (DNS and external access — Cloudflare Tunnel, split-horizon)
* Related to [ADR-0049](0049-self-hosted-productivity-suite.md) (Kasm Workspaces)
* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp)
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
* [VRM Specification](https://vrm.dev/en/)
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader used in companions-frontend)
* [Poly Haven](https://polyhaven.com/) (free 3D assets, HDRIs, textures)
* [Hyper3D Rodin](https://hyper3d.ai/) (AI 3D model generation)
* [Cloudflare R2 Documentation](https://developers.cloudflare.com/r2/)
* [Cloudflare R2 Custom Domains](https://developers.cloudflare.com/r2/buckets/public-buckets/#custom-domains)