homelab-design/decisions/0059-mac-mini-ray-worker.md
Billy D. 4affddf9b4
replacing mcp blender with reproducible flow.
2026-02-24 05:34:29 -05:00


Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation

  • Status: accepted
  • Date: 2026-02-16
  • Updated: 2026-02-23
  • Deciders: Billy
  • Technical Story: Use waterdeep as a dedicated local AI workstation for BlenderMCP-driven 3D avatar creation, replacing the previously proposed Ray worker role

Context and Problem Statement

waterdeep is a Mac Mini M4 Pro with 48 GB of unified memory that currently serves as a development workstation (see ADR-0037). The original proposal was to add it to the Ray cluster as an external inference/training worker, but:

  • All Ray inference slots are already allocated and stable — adding a 5th GPU class (MPS) increases complexity without filling a gap
  • vLLM's MPS backend remains experimental — not production-ready for serving
  • The real unmet need is 3D avatar creation for companions-frontend (ADR-0063)

ADR-0063 describes an automated ComfyUI + TRELLIS + UniRig pipeline for image-to-VRM avatar generation, running on a personal desktop as an on-demand Ray worker. This supersedes the manual BlenderMCP Kasm workflow from ADR-0062. waterdeep retains its role as an interactive Blender workstation for manual refinement of auto-generated models.

waterdeep's M4 Pro has a 16-core GPU with hardware-accelerated Metal rendering and 48 GB of unified memory shared between CPU and GPU. Running Blender natively on waterdeep with BlenderMCP gives a dramatically better 3D creation experience than Kasm.

How should we use waterdeep to maximise the 3D avatar creation pipeline for companions-frontend?

Decision Drivers

  • Blender on Kasm is CPU-rendered inside DinD — no Metal/Vulkan/CUDA GPU access, poor viewport performance
  • waterdeep has a 16-core Apple GPU with Metal support — Blender's Metal backend enables real-time viewport rendering, Cycles GPU rendering, and smooth sculpting
  • 48 GB unified memory means Blender, VS Code, and the MCP server can all run simultaneously without swapping
  • VS Code with Copilot agent mode and the BlenderMCP server are installed on waterdeep, so VS Code drives Blender over a loopback socket (localhost:9876) with no network hop and sub-millisecond latency
  • Exported VRM models must reach gravenhollow for production serving (ADR-0063)
  • rclone chosen for asset promotion to gravenhollow's RustFS S3 endpoint — simpler than NFS mounts on macOS, consistent with existing Kasm rclone patterns, and avoids autofs/NFS fstab complexity
  • The automated ComfyUI pipeline from ADR-0063 handles most avatar generation; waterdeep serves as the manual refinement station
  • Ray cluster GPU fleet is fully allocated and stable, so adding MPS complexity is not justified

Considered Options

  1. Local AI agent on waterdeep — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via rclone (S3)
  2. External Ray worker on macOS (original proposal) — join the Ray cluster for inference and training
  3. Keep Kasm-only workflow — rely entirely on the browser-based Kasm Blender workstation from ADR-0062

Decision Outcome

Chosen option: Option 1 — Local AI agent on waterdeep, because the Mac Mini's Metal GPU makes it dramatically better for 3D work than CPU-rendered Kasm, the Ray cluster doesn't need another worker, and the local workflow eliminates network latency between VS Code, the MCP server, and Blender.

Positive Consequences

  • Metal GPU acceleration — real-time Eevee viewport, GPU-accelerated Cycles rendering, smooth 60fps sculpting
  • Low-latency MCP: the BlenderMCP socket (localhost:9876) has no network hop, so commands reach Blender with sub-millisecond transport latency
  • 48 GB unified memory — large Blender scenes, multiple VRM models open simultaneously, no swap pressure
  • VS Code + Copilot agent mode + BlenderMCP server installed natively — single editor drives both code and Blender commands
  • rclone for asset promotion — consistent with Kasm rclone patterns, avoids macOS NFS/autofs complexity
  • Remains a dev workstation: avatar creation is a creative dev workflow, not a server workload
  • Kasm Blender remains available as a browser-based fallback for remote/mobile access
  • Simpler than the Ray worker approach — no cluster integration, no GCS port exposure, no experimental MPS backend

Negative Consequences

  • Blender, VS Code, and add-ons must be installed and maintained locally on waterdeep via Homebrew
  • Assets created locally need explicit rclone copy to promote to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
  • waterdeep is a single machine — no redundancy for the 3D creation workflow
  • Not managed by Kubernetes or GitOps — relies on Homebrew-managed tooling

Pros and Cons of the Options

Option 1: Local AI agent on waterdeep

  • Good, because Metal GPU acceleration makes Blender usable for real 3D work (sculpting, rendering, material preview)
  • Good, because localhost MCP socket eliminates all network latency
  • Good, because 48 GB unified memory supports complex scenes without swapping
  • Good, because no experimental backends (MPS/vLLM) — using Blender's mature Metal renderer
  • Good, because waterdeep stays a dev workstation, aligning with its named role
  • Bad, because local-only — no browser-based remote access (use Kasm for that)
  • Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP, VS Code)
  • Bad, because asset promotion to gravenhollow requires explicit rclone command

Option 2: External Ray worker on macOS (original proposal)

  • Good, because adds GPU compute to the Ray cluster
  • Good, because training jobs gain MPS acceleration
  • Bad, because vLLM MPS backend is experimental — not production-ready
  • Bad, because adds a 5th GPU class (MPS) to an already complex fleet
  • Bad, because Ray GCS port exposure adds security surface
  • Bad, because doesn't address the actual unmet need (3D avatar creation)
  • Bad, because waterdeep becomes a server, degrading its dev workstation role

Option 3: Kasm-only workflow

  • Good, because browser-based — usable from any device
  • Good, because no local installation required
  • Bad, because CPU-rendered Blender inside DinD — poor viewport performance
  • Bad, because network latency between VS Code and Blender socket
  • Bad, because limited memory inside Kasm container
  • Bad, because no GPU acceleration for rendering or sculpting

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│  waterdeep (Mac Mini M4 Pro · 48 GB unified · Metal GPU)                │
│                                                                         │
│  ┌──────────────────────────────────────────────────────┐              │
│  │         VS Code + GitHub Copilot (agent mode)        │              │
│  │                                                      │              │
│  │  BlenderMCP Server (uvx blender-mcp)                 │              │
│  │  DISABLE_TELEMETRY=true                              │              │
│  │         │                                            │              │
│  │         │ TCP localhost:9876 (zero latency)           │              │
│  │         ▼                                            │              │
│  └─────────┬────────────────────────────────────────────┘              │
│            │                                                            │
│  ┌─────────▼────────────────────────────────────────────┐              │
│  │              Blender 4.x (native macOS)              │              │
│  │                                                      │              │
│  │  Renderer: Metal (Eevee real-time + Cycles GPU)      │              │
│  │  Add-ons:                                            │              │
│  │   • BlenderMCP (addon.py) — socket server :9876      │              │
│  │   • VRM Add-on for Blender — import/export VRM       │              │
│  │                                                      │              │
│  │  Working files: ~/blender-avatars/                    │              │
│  │  ├── projects/          (.blend source files)        │              │
│  │  ├── exports/           (.vrm exported models)       │              │
│  │  └── textures/          (shared texture library)     │              │
│  └──────────────────────────────────────────────────────┘              │
│                          │                                              │
│                    rclone (S3 asset promotion)                           │
│                    gravenhollow RustFS :30292                            │
└──────────────────────────┼──────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────────────┐
│            gravenhollow.lab.daviestechlabs.io                            │
│            (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB)            │
│                                                                         │
│  NFS: /mnt/gravenhollow/kubernetes/avatar-models/                       │
│  ├── Seed-san.vrm          (default model)                              │
│  ├── Companion-A.vrm       (promoted from waterdeep)                    │
│  └── animations/           (shared animation clips)                     │
│                                                                         │
│  S3 (RustFS): avatar-models bucket                                      │
│  (same data, served via Cloudflare Tunnel for remote users)             │
└──────────────────────────┬──────────────────────────────────────────────┘
                           │
              ┌────────────┴───────────────┐
              │                            │
        NFS (nfs-fast PVC)          Cloudflare Tunnel
              │                     (assets.daviestechlabs.io)
              ▼                            │
┌──────────────────────────┐               ▼
│  companions-frontend     │   ┌──────────────────────────┐
│  (Kubernetes pod)        │   │  Remote users (CDN-cached │
│  LAN users               │   │  via Cloudflare edge)     │
└──────────────────────────┘   └──────────────────────────┘

Implementation Plan

1. Install Blender and Add-ons

# Install Blender via Homebrew
brew install --cask blender

# Download BlenderMCP add-on
curl -LO https://raw.githubusercontent.com/ahujasid/blender-mcp/main/addon.py

# Install in Blender:
# Edit > Preferences > Add-ons > Install... > select addon.py
# Enable "Interface: Blender MCP"

# Install VRM Add-on for Blender:
# Download from https://vrm-addon-for-blender.info/en/
# Edit > Preferences > Add-ons > Install... > select VRM add-on zip
# Enable "Import-Export: VRM"

2. VS Code MCP Configuration

// .vscode/mcp.json (in companions-frontend or global settings)
{
  "servers": {
    "blender": {
      "command": "uvx",
      "args": ["blender-mcp"],
      "env": {
        "BLENDER_HOST": "localhost",
        "BLENDER_PORT": "9876",
        "DISABLE_TELEMETRY": "true"
      }
    }
  }
}

3. Python Environment for BlenderMCP

# Install uv (per ADR-0012)
curl -LsSf https://astral.sh/uv/install.sh | sh

# uvx handles the BlenderMCP server environment automatically
# Verify it works:
uvx blender-mcp --help
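
Before wiring VS Code to Blender, it can help to confirm that the command-line tools this plan relies on are actually on PATH. The following is a minimal sketch, not part of the original plan: `require_tools` is a hypothetical helper name, and the tool names passed to it are whichever ones you expect to have installed by this step.

```shell
# require_tools TOOL...
# Hypothetical helper: reports every named tool that is missing from PATH.
# Returns 0 when all tools are found, 1 if at least one is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (tool names taken from the steps in this ADR):
#   require_tools uv uvx || echo "install the missing tools first"
```

The function only checks that a command resolves; it does not verify versions or that the BlenderMCP server can reach Blender.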

4. rclone for Asset Promotion

Use rclone to promote finished VRM exports to gravenhollow's RustFS S3 endpoint. This is consistent with the promotion pattern from ADR-0063 and avoids macOS NFS/autofs complexity.

# Install rclone
brew install rclone

# Configure gravenhollow RustFS endpoint
rclone config create gravenhollow s3 \
    provider=Other \
    endpoint=https://gravenhollow.lab.daviestechlabs.io:30292 \
    access_key_id=<key> \
    secret_access_key=<secret>

# Promote a finished VRM
rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models/

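# Copy all VRM exports (idempotent; unlike "rclone sync", copy never deletes
# objects that exist only in the bucket, such as Seed-san.vrm or animations/)
rclone copy ~/blender-avatars/exports/ gravenhollow:avatar-models/ --include "*.vrm"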

Why rclone over NFS? macOS autofs/NFS mounts are fragile across reboots and network changes. rclone is a single binary, works over HTTPS, and matches the promotion pattern already used in Kasm workflows. The explicit rclone copy command also serves as a deliberate promotion gate — only intentionally promoted models reach production.
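
The promotion gate described above could be wrapped in a small function so that only `.vrm` files can be promoted and a dry run is always available. This is a sketch under stated assumptions: `promote_vrm`, `RCLONE_BIN`, and `PROMOTE_DEST` are hypothetical names introduced here for illustration; only `rclone copy` and its `--dry-run` flag are real rclone features, and the remote and bucket come from the configuration above.

```shell
# RCLONE_BIN and PROMOTE_DEST are hypothetical override hooks (not rclone features).
RCLONE_BIN="${RCLONE_BIN:-rclone}"
PROMOTE_DEST="${PROMOTE_DEST:-gravenhollow:avatar-models/}"

# promote_vrm FILE [--dry-run]
# Copies one finished .vrm export to the bucket and refuses anything else.
# Returns 2 for a non-VRM file name, 1 for a missing file.
promote_vrm() {
    file="$1"
    case "$file" in
        *.vrm) ;;
        *) echo "refusing to promote non-VRM file: $file" >&2; return 2 ;;
    esac
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    if [ "${2:-}" = "--dry-run" ]; then
        # --dry-run makes rclone report what it would transfer without copying
        "$RCLONE_BIN" copy --dry-run "$file" "$PROMOTE_DEST"
    else
        "$RCLONE_BIN" copy "$file" "$PROMOTE_DEST"
    fi
}

# Example:
#   promote_vrm ~/blender-avatars/exports/Companion-A.vrm --dry-run
```

Keeping the extension check in the wrapper means a stray `.blend` source file cannot be promoted by accident, which matches the intent that only finished exports reach gravenhollow.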

5. Avatar Creation Workflow (waterdeep)

  1. Open Blender on waterdeep (native Metal-accelerated)
  2. Enable BlenderMCP → 3D View sidebar → "BlenderMCP" tab → click "Connect"
  3. Open VS Code with Copilot agent mode — BlenderMCP server starts automatically
  4. Create avatars using AI-assisted prompts:
    • "Create an anime-style character with silver hair and a mage outfit"
    • "Apply metallic blue material to the staff"
    • "Rig this character for VRM export with standard humanoid bones"
    • "Export as VRM to ~/blender-avatars/exports/Silver-Mage.vrm"
  5. Preview in real-time — Metal GPU renders Eevee viewport at 60fps
  6. Promote the finished VRM to gravenhollow via rclone:
    rclone copy ~/blender-avatars/exports/Silver-Mage.vrm gravenhollow:avatar-models/
    
  7. Register in companions-frontend — update AllowedAvatarModels in Go and JS allowlists, commit
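
Between steps 6 and 7 it may be worth confirming that the model really landed in the bucket before adding it to the allowlists. The sketch below assumes the remote and bucket names used earlier in this ADR; `is_promoted` and `RCLONE_BIN` are hypothetical names introduced here, while `rclone lsf` is the real rclone subcommand that lists objects one per line.

```shell
# RCLONE_BIN is a hypothetical override hook, useful for testing with a stub.
RCLONE_BIN="${RCLONE_BIN:-rclone}"

# is_promoted NAME
# Exit 0 if an object called exactly NAME is listed at the top level of the
# avatar-models bucket, non-zero otherwise (including when the listing fails).
is_promoted() {
    "$RCLONE_BIN" lsf gravenhollow:avatar-models/ | grep -Fxq -- "$1"
}

# Example:
#   is_promoted Silver-Mage.vrm && echo "safe to register in the allowlists"
```

`grep -Fx` matches the whole line as a fixed string, so a partial name such as `Silver` does not count as promoted.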

6. Workflow Comparison: waterdeep vs Kasm

Aspect          | waterdeep (local)                              | Kasm (browser)
----------------+------------------------------------------------+-----------------------------------------------------------
GPU rendering   | Metal 16-core GPU: Eevee real-time, Cycles GPU | CPU-only software rendering
Viewport FPS    | 60fps (Metal)                                  | 5–15fps (CPU rasterisation)
MCP latency     | localhost socket, sub-millisecond              | Network hop to Kasm container
Memory          | 48 GB unified, shared with GPU                 | Limited by Kasm container allocation
Sculpting       | Smooth, hardware-accelerated                   | Laggy, CPU-bound
Asset promotion | rclone to gravenhollow RustFS S3               | Auto rclone to Quobyte S3 → manual promote to gravenhollow
Access          | Local only (waterdeep physical/VNC)            | Any browser, anywhere
Setup           | Homebrew + manual add-on install               | Pre-baked in Kasm image
Use when        | Primary creation workflow                      | Remote access, quick edits, mobile

Security Considerations

  • BlenderMCP's execute_blender_code runs arbitrary Python in Blender — review AI-generated code before execution, especially file I/O operations
  • Telemetry disabled via DISABLE_TELEMETRY=true in MCP server config
  • BlenderMCP socket (port 9876) bound to localhost — not exposed to the network
  • rclone traffic to gravenhollow's S3 endpoint traverses the LAN over HTTPS; VRM files contain no sensitive data
  • waterdeep has no cluster access — compromise doesn't impact Kubernetes workloads
  • .blend source files stay local on waterdeep; only finished VRM exports are promoted to gravenhollow
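
The claim that the BlenderMCP socket is bound to localhost can be spot-checked rather than assumed. The sketch below filters the output of `lsof -nP -iTCP:9876 -sTCP:LISTEN` and prints any listener whose bind address is not loopback. `non_loopback_listeners` is a hypothetical helper name, and the parsing assumes the usual lsof layout in which the address is the field immediately before `(LISTEN)`.

```shell
# non_loopback_listeners
# Reads lsof listener output on stdin and prints the bind address of every
# listener that is NOT on a loopback address. Prints nothing if all are local.
non_loopback_listeners() {
    awk '$NF == "(LISTEN)" {
        addr = $(NF - 1)
        if (addr !~ /^(127\.0\.0\.1|\[::1\]|localhost):/) print addr
    }'
}

# Example (run while Blender and the add-on are active):
#   lsof -nP -iTCP:9876 -sTCP:LISTEN | non_loopback_listeners
# Any output, such as "*:9876", means the socket is reachable from the network.
```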

Future Considerations

  • DGX Spark (ADR-0058): When acquired, DGX Spark handles training; waterdeep remains the 3D creation workstation
  • Blender + MLX: Apple's MLX framework could power local AI-generated textures or mesh deformation directly in Blender — worth evaluating as Blender add-ons mature
  • Automated promotion: A file watcher (fswatch/launchd) could auto-run rclone copy when a new VRM appears in ~/blender-avatars/exports/ (copy rather than sync, because sync would delete bucket objects that have no local counterpart)
  • VRM validation: Add a pre-promotion check script that validates VRM humanoid rig completeness, expression morphs, and viseme shapes before copying to gravenhollow
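
As a first step toward the VRM validation idea, a pre-promotion script could at least reject files that are not glTF binary containers. VRM files use the GLB container, whose header begins with the 4-byte ASCII magic `glTF`. The sketch below checks only that magic; it does not validate the humanoid rig, expression morphs, or viseme shapes, which would need a real VRM parser. `looks_like_glb` is a hypothetical helper name.

```shell
# looks_like_glb FILE
# Exit 0 if FILE exists and starts with the GLB magic bytes "glTF".
# This is a container sanity check only, not a full VRM validation.
looks_like_glb() {
    [ -f "$1" ] || return 1
    magic="$(head -c 4 "$1" 2>/dev/null)"
    [ "$magic" = "glTF" ]
}

# Example:
#   looks_like_glb ~/blender-avatars/exports/Silver-Mage.vrm || echo "not a GLB/VRM file"
```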