Update ADRs to reflect the Go refactor.

This commit is contained in:
2026-02-23 06:14:23 -05:00
parent f19fa3e969
commit 100ba21eba
7 changed files with 181 additions and 129 deletions


@@ -1,8 +1,8 @@
# Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation
-* Status: proposed
+* Status: accepted
* Date: 2026-02-16
-* Updated: 2026-02-21
+* Updated: 2026-02-23
* Deciders: Billy
* Technical Story: Use waterdeep as a dedicated local AI workstation for BlenderMCP-driven 3D avatar creation, replacing the previously proposed Ray worker role
@@ -25,14 +25,15 @@ How should we use waterdeep to maximise the 3D avatar creation pipeline for comp
* Blender on Kasm is CPU-rendered inside DinD — no Metal/Vulkan/CUDA GPU access, poor viewport performance
* waterdeep has a 16-core Apple GPU with Metal support — Blender's Metal backend enables real-time viewport rendering, Cycles GPU rendering, and smooth sculpting
* 48 GB unified memory means Blender, VS Code, and the MCP server can all run simultaneously without swapping
-* VS Code with Copilot agent mode can drive BlenderMCP locally with zero-latency socket communication (localhost:9876)
+* VS Code with Copilot agent mode and BlenderMCP server are installed on waterdeep — VS Code drives Blender via localhost:9876 with zero-latency socket communication
* Exported VRM models must reach gravenhollow for production serving ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md))
* **rclone** chosen for asset promotion to gravenhollow's RustFS S3 endpoint — simpler than NFS mounts on macOS, consistent with existing Kasm rclone patterns, and avoids autofs/NFS fstab complexity
* The Kasm Blender workflow from ADR-0062 remains available as a fallback (browser-based, no local install required)
* ray cluster GPU fleet is fully allocated and stable — adding MPS complexity is not justified
## Considered Options
-1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via NFS/rclone
+1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via rclone (S3)
2. **External Ray worker on macOS** (original proposal) — join the Ray cluster for inference and training
3. **Keep Kasm-only workflow** — rely entirely on the browser-based Kasm Blender workstation from ADR-0062
@@ -45,17 +46,18 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min
* Metal GPU acceleration — real-time Eevee viewport, GPU-accelerated Cycles rendering, smooth 60fps sculpting
* Zero-latency MCP — BlenderMCP socket (localhost:9876) has no network hop, instant command execution
* 48 GB unified memory — large Blender scenes, multiple VRM models open simultaneously, no swap pressure
-* VS Code + Copilot agent mode runs natively with full local context for both code and Blender commands
+* VS Code + Copilot agent mode + BlenderMCP server installed natively — single editor drives both code and Blender commands
* rclone for asset promotion — consistent with Kasm rclone patterns, avoids macOS NFS/autofs complexity
* Remaining a dev workstation — avatar creation is a creative dev workflow, not a server workload
* Kasm Blender remains available as a browser-based fallback for remote/mobile access
* Simpler than the Ray worker approach — no cluster integration, no GCS port exposure, no experimental MPS backend
### Negative Consequences
-* Blender + add-ons must be installed and maintained locally on waterdeep
-* Assets created locally need explicit promotion to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
+* Blender, VS Code, and add-ons must be installed and maintained locally on waterdeep via Homebrew
+* Assets created locally need explicit `rclone copy` to promote to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
* waterdeep is a single machine — no redundancy for the 3D creation workflow
-* Not managed by Kubernetes or GitOps — relies on manual or Homebrew-managed tooling
+* Not managed by Kubernetes or GitOps — relies on Homebrew-managed tooling
## Pros and Cons of the Options
@@ -67,8 +69,8 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min
* Good, because no experimental backends (MPS/vLLM) — using Blender's mature Metal renderer
* Good, because waterdeep stays a dev workstation, aligning with its named role
* Bad, because local-only — no browser-based remote access (use Kasm for that)
-* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP)
-* Bad, because asset promotion to gravenhollow requires explicit action
+* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP, VS Code)
+* Bad, because asset promotion to gravenhollow requires explicit rclone command
### Option 2: External Ray worker on macOS (original proposal)
@@ -119,8 +121,8 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min
│ │ └── textures/ (shared texture library) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
-NFS mount or rclone
-(asset promotion)
+rclone (S3 asset promotion)
gravenhollow RustFS :30292
└──────────────────────────┼──────────────────────────────────────────────┘
@@ -200,24 +202,9 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
uvx blender-mcp --help
```
-### 4. NFS Mount for Asset Promotion
+### 4. rclone for Asset Promotion
-Mount gravenhollow's avatar-models directory for direct promotion of finished VRM exports:
-```bash
-# Create mount point
-sudo mkdir -p /Volumes/avatar-models
-# Mount gravenhollow NFS (all-SSD, dual 10GbE)
-sudo mount -t nfs \
-  gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models \
-  /Volumes/avatar-models
-# Add to /etc/auto_master for persistent mount (macOS autofs)
-# /Volumes/avatar-models -fstype=nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models
-```
-Alternatively, use rclone for S3-based promotion:
+Use rclone to promote finished VRM exports to gravenhollow's RustFS S3 endpoint. This is consistent with the Kasm rclone volume plugin pattern from [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) and avoids macOS NFS/autofs complexity.
```bash
# Install rclone
@@ -232,8 +219,13 @@ rclone config create gravenhollow s3 \
# Promote a finished VRM
rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models/
+# Sync all exports (idempotent)
+rclone sync ~/blender-avatars/exports/ gravenhollow:avatar-models/ --exclude "*.blend"
```
+> **Why rclone over NFS?** macOS autofs/NFS mounts are fragile across reboots and network changes. rclone is a single binary, works over HTTPS, and matches the promotion pattern already used in Kasm workflows. The explicit `rclone copy` command also serves as a deliberate promotion gate — only intentionally promoted models reach production.
### 5. Avatar Creation Workflow (waterdeep)
1. **Open Blender** on waterdeep (native Metal-accelerated)
@@ -245,9 +237,9 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models
- _"Rig this character for VRM export with standard humanoid bones"_
- _"Export as VRM to ~/blender-avatars/exports/Silver-Mage.vrm"_
5. **Preview** in real-time — Metal GPU renders Eevee viewport at 60fps
-6. **Promote** the finished VRM to gravenhollow:
+6. **Promote** the finished VRM to gravenhollow via rclone:
```bash
-cp ~/blender-avatars/exports/Silver-Mage-v1.vrm /Volumes/avatar-models/
+rclone copy ~/blender-avatars/exports/Silver-Mage-v1.vrm gravenhollow:avatar-models/
```
7. **Register** in companions-frontend — update `AllowedAvatarModels` in Go and JS allowlists, commit
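Step 7's Go-side registration can be sketched as follows. The `AllowedAvatarModels` name comes from the ADR; the map structure, entries, and helper function are assumptions about how such an allowlist might look:

```go
package main

import "fmt"

// AllowedAvatarModels mirrors the Go allowlist the ADR says must be
// updated when a new VRM is promoted. Entries here are illustrative.
var AllowedAvatarModels = map[string]bool{
	"Companion-A.vrm":    true,
	"Silver-Mage-v1.vrm": true,
}

// IsAllowedAvatar reports whether a requested model file may be served.
func IsAllowedAvatar(name string) bool {
	return AllowedAvatarModels[name]
}

func main() {
	fmt.Println(IsAllowedAvatar("Silver-Mage-v1.vrm")) // true: promoted and registered
	fmt.Println(IsAllowedAvatar("Unknown.vrm"))        // false: not in the allowlist
}
```

The JS allowlist mentioned in the same step would need the matching filename added in the same commit.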
@@ -260,7 +252,7 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models
| **MCP latency** | localhost socket — sub-millisecond | Network hop to Kasm container |
| **Memory** | 48 GB unified, shared with GPU | Limited by Kasm container allocation |
| **Sculpting** | Smooth, hardware-accelerated | Laggy, CPU-bound |
-| **Asset promotion** | NFS mount or rclone to gravenhollow | Auto rclone to Quobyte S3 → manual promote to gravenhollow |
+| **Asset promotion** | rclone to gravenhollow RustFS S3 | Auto rclone to Quobyte S3 → manual promote to gravenhollow |
| **Access** | Local only (waterdeep physical/VNC) | Any browser, anywhere |
| **Setup** | Homebrew + manual add-on install | Pre-baked in Kasm image |
| **Use when** | Primary creation workflow | Remote access, quick edits, mobile |
@@ -278,7 +270,7 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models
* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, DGX Spark handles training; waterdeep remains the 3D creation workstation
* **Blender + MLX**: Apple's MLX framework could power local AI-generated textures or mesh deformation directly in Blender — worth evaluating as Blender add-ons mature
-* **Automated promotion**: A file watcher (fswatch/launchd) could auto-promote VRM exports from `~/blender-avatars/exports/` to gravenhollow when a new file appears
+* **Automated promotion**: A file watcher (fswatch/launchd) could auto-run `rclone sync` when a new VRM appears in `~/blender-avatars/exports/`
* **VRM validation**: Add a pre-promotion check script that validates VRM humanoid rig completeness, expression morphs, and viseme shapes before copying to gravenhollow
## Links