diff --git a/decisions/0059-mac-mini-ray-worker.md b/decisions/0059-mac-mini-ray-worker.md index c3e188d..a7d8b83 100644 --- a/decisions/0059-mac-mini-ray-worker.md +++ b/decisions/0059-mac-mini-ray-worker.md @@ -1,338 +1,294 @@ -# Add Mac Mini M4 Pro (waterdeep) to Ray Cluster as External Worker +# Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation * Status: proposed * Date: 2026-02-16 +* Updated: 2026-02-21 * Deciders: Billy -* Technical Story: Expand Ray cluster with Apple Silicon compute for inference and training +* Technical Story: Use waterdeep as a dedicated local AI workstation for BlenderMCP-driven 3D avatar creation, replacing the previously proposed Ray worker role ## Context and Problem Statement -The homelab Ray cluster currently runs entirely within Kubernetes, with GPU workers pinned to specific nodes: +**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory that currently serves as a development workstation (see [ADR-0037](0037-node-naming-conventions.md)). The original proposal was to add it to the Ray cluster as an external inference/training worker, but: -| Node | GPU | Memory | Workload | -|------|-----|--------|----------| -| khelben | Strix Halo (ROCm) | 128 GB unified | vLLM 70B (0.95 GPU) | -| elminster | RTX 2070 (CUDA) | 8 GB VRAM | Whisper (0.5) + TTS (0.5) | -| drizzt | Radeon 680M (ROCm) | 12 GB VRAM | Embeddings (0.8) | -| danilo | Intel Arc (i915) | ~6 GB shared | Reranker (0.8) | +- All Ray inference slots are already allocated and stable — adding a 5th GPU class (MPS) increases complexity without filling a gap +- vLLM's MPS backend remains experimental — not production-ready for serving +- The real unmet need is **3D avatar creation** for companions-frontend ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md)) -All GPUs are fully allocated to inference (see [ADR-0005](0005-multi-gpu-strategy.md), [ADR-0011](0011-kuberay-unified-gpu-backend.md)). 
Training is currently CPU-only and distributed across cluster nodes via Ray Train ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)). +[ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) describes using BlenderMCP in a Kasm Blender workstation for AI-assisted avatar creation. While Kasm works, it runs Blender inside a DinD container with **no GPU acceleration** — rendering and viewport interaction are CPU-only, which is painfully slow for sculpting, material preview, and VRM export iteration. -**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory that currently serves as a development workstation (see [ADR-0037](0037-node-naming-conventions.md)). Its Apple Silicon GPU (MPS backend) and unified memory architecture make it a strong candidate for both inference and training workloads — but macOS cannot run Talos Linux or easily join the Kubernetes cluster as a native node. +waterdeep's M4 Pro has a 16-core GPU with hardware-accelerated Metal rendering and 48 GB of unified memory shared between CPU and GPU. Running Blender natively on waterdeep with BlenderMCP gives a dramatically better 3D creation experience than Kasm. -How do we integrate waterdeep's compute into the Ray cluster without disrupting the existing Kubernetes-managed infrastructure? +How should we use waterdeep to get the most out of the 3D avatar creation pipeline for companions-frontend?
## Decision Drivers -* 48 GB unified memory is sufficient for medium-large models (e.g., 7B–30B at Q4/Q8 quantisation) -* Apple Silicon MPS backend is supported by PyTorch and vLLM (experimental) -* macOS cannot run Talos Linux — must integrate without Kubernetes -* Ray natively supports heterogeneous clusters with external workers -* Must not impact existing inference serving stability -* Training workloads ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)) would benefit from a GPU-accelerated worker -* ARM64 architecture requires compatible Python packages and model formats +* Blender on Kasm is CPU-rendered inside DinD — no Metal/Vulkan/CUDA GPU access, poor viewport performance +* waterdeep has a 16-core Apple GPU with Metal support — Blender's Metal backend enables real-time viewport rendering, Cycles GPU rendering, and smooth sculpting +* 48 GB unified memory means Blender, VS Code, and the MCP server can all run simultaneously without swapping +* VS Code with Copilot agent mode can drive BlenderMCP locally with zero-latency socket communication (localhost:9876) +* Exported VRM models must reach gravenhollow for production serving ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md)) +* The Kasm Blender workflow from ADR-0062 remains available as a fallback (browser-based, no local install required) +* Ray cluster GPU fleet is fully allocated and stable — adding MPS complexity is not justified ## Considered Options -1. **External Ray worker on macOS** — run a Ray worker process natively on waterdeep that connects to the cluster Ray head over the network -2. **Linux VM on Mac** — run UTM/Parallels VM with Linux, join as a Kubernetes node -3. **K3s agent on macOS** — run K3s directly on macOS via Docker Desktop +1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via NFS/rclone +2. **External Ray worker on macOS** (original proposal) — join the Ray cluster for inference and training +3. 
**Keep Kasm-only workflow** — rely entirely on the browser-based Kasm Blender workstation from ADR-0062 ## Decision Outcome -Chosen option: **Option 1 — External Ray worker on macOS**, because Ray natively supports heterogeneous workers joining over the network. This avoids the complexity of running Kubernetes on macOS, lets waterdeep remain a development workstation, and leverages Apple Silicon MPS acceleration transparently through PyTorch. +Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Mini's Metal GPU makes it dramatically better for 3D work than CPU-rendered Kasm, the Ray cluster doesn't need another worker, and the local workflow eliminates network latency between VS Code, the MCP server, and Blender. ### Positive Consequences -* Zero Kubernetes overhead on waterdeep — remains a usable dev workstation -* 48 GB unified memory available for models (vs split VRAM/RAM on discrete GPUs) -* MPS GPU acceleration for both inference and training -* Adds a 5th GPU class to the Ray fleet (Apple MPS alongside ROCm, CUDA, Intel, RDNA2) -* Training jobs ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)) gain a GPU-accelerated worker -* Can run a secondary LLM instance for overflow or A/B testing -* Quick to set up — single `ray start` command -* Worker can be stopped/started without affecting the cluster +* Metal GPU acceleration — real-time Eevee viewport, GPU-accelerated Cycles rendering, smooth 60fps sculpting +* Zero-latency MCP — BlenderMCP socket (localhost:9876) has no network hop, instant command execution +* 48 GB unified memory — large Blender scenes, multiple VRM models open simultaneously, no swap pressure +* VS Code + Copilot agent mode runs natively with full local context for both code and Blender commands +* waterdeep remains a dev workstation — avatar creation is a creative dev workflow, not a server workload +* Kasm Blender remains available as a browser-based fallback for remote/mobile access +* Simpler than the Ray worker 
approach — no cluster integration, no GCS port exposure, no experimental MPS backend ### Negative Consequences -* Not managed by KubeRay or Flux — requires manual or launchd-based lifecycle management -* Network dependency — if waterdeep sleeps or disconnects, Ray tasks on it fail -* MPS backend has limited operator coverage compared to CUDA/ROCm -* Python environment must be maintained separately (not in a container image) -* No Longhorn storage — model cache managed locally or via NFS mount from gravenhollow (nfs-fast) -* Monitoring not automatically scraped by Prometheus (needs node-exporter or push gateway) +* Blender + add-ons must be installed and maintained locally on waterdeep +* Assets created locally need explicit promotion to gravenhollow (vs Kasm's automatic rclone to Quobyte S3) +* waterdeep is a single machine — no redundancy for the 3D creation workflow +* Not managed by Kubernetes or GitOps — relies on manual or Homebrew-managed tooling ## Pros and Cons of the Options -### Option 1: External Ray worker on macOS +### Option 1: Local AI agent on waterdeep -* Good, because Ray is designed for heterogeneous multi-node clusters -* Good, because no VM overhead — full access to Metal/MPS and unified memory -* Good, because waterdeep remains a functional dev workstation -* Good, because trivial to start/stop (single process) -* Bad, because not managed by Kubernetes or GitOps -* Bad, because requires manual Python environment management -* Bad, because MPS support in vLLM is experimental +* Good, because Metal GPU acceleration makes Blender usable for real 3D work (sculpting, rendering, material preview) +* Good, because localhost MCP socket eliminates all network latency +* Good, because 48 GB unified memory supports complex scenes without swapping +* Good, because no experimental backends (MPS/vLLM) — using Blender's mature Metal renderer +* Good, because waterdeep stays a dev workstation, aligning with its named role +* Bad, because local-only — no 
browser-based remote access (use Kasm for that) +* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP) +* Bad, because asset promotion to gravenhollow requires explicit action -### Option 2: Linux VM on Mac +### Option 2: External Ray worker on macOS (original proposal) -* Good, because would be a standard Kubernetes node -* Good, because managed by KubeRay like other workers -* Bad, because VM overhead reduces available memory (hypervisor, guest OS) -* Bad, because no MPS/Metal GPU passthrough to Linux VMs on Apple Silicon -* Bad, because complex to maintain (VM lifecycle, networking, storage) -* Bad, because wastes the primary advantage (Apple Silicon GPU) +* Good, because adds GPU compute to the Ray cluster +* Good, because training jobs gain MPS acceleration +* Bad, because vLLM MPS backend is experimental — not production-ready +* Bad, because adds a 5th GPU class (MPS) to an already complex fleet +* Bad, because Ray GCS port exposure adds security surface +* Bad, because doesn't address the actual unmet need (3D avatar creation) +* Bad, because waterdeep becomes a server, degrading its dev workstation role -### Option 3: K3s agent on macOS +### Option 3: Kasm-only workflow -* Good, because Kubernetes-native, managed by Flux -* Bad, because K3s on macOS requires Docker Desktop (resource overhead) -* Bad, because container networking on macOS is fragile -* Bad, because MPS device access from within Docker containers is unreliable -* Bad, because not a supported K3s configuration +* Good, because browser-based — usable from any device +* Good, because no local installation required +* Bad, because CPU-rendered Blender inside DinD — poor viewport performance +* Bad, because network latency between VS Code and Blender socket +* Bad, because limited memory inside Kasm container +* Bad, because no GPU acceleration for rendering or sculpting ## Architecture ``` -┌──────────────────────────────────────────────────────────────────────────┐ -│ 
Kubernetes Cluster (Talos) │ -│ │ -│ ┌──────────────────────────────────────────────────────────────────┐ │ -│ │ RayService (ai-inference) — KubeRay managed │ │ -│ │ │ │ -│ │ Head: wulfgar │ │ -│ │ Workers: khelben (ROCm), elminster (CUDA), │ │ -│ │ drizzt (RDNA2), danilo (Intel) │ │ -│ └──────────────────────┬───────────────────────────────────────────┘ │ -│ │ Ray GCS (port 6379) │ -│ │ │ -└─────────────────────────┼────────────────────────────────────────────────┘ - │ Home network (LAN) - │ -┌─────────────────────────┼────────────────────────────────────────────────┐ -│ waterdeep (Mac Mini M4 Pro) │ -│ │ │ -│ ┌──────────────────────▼───────────────────────────────────────────┐ │ -│ │ External Ray Worker (ray start --address=...) │ │ -│ │ │ │ -│ │ • 12-core CPU (8P + 4E) + 16-core Neural Engine │ │ -│ │ • 48 GB unified memory (shared CPU/GPU) │ │ -│ │ • MPS (Metal) GPU backend via PyTorch │ │ -│ │ • Custom resource: gpu_apple_mps: 1 │ │ -│ │ │ │ -│ │ Workloads: │ │ -│ │ ├── Inference: secondary LLM (7B–30B), overflow serving │ │ -│ │ └── Training: LoRA/QLoRA fine-tuning via Ray Train │ │ -│ └──────────────────────────────────────────────────────────────────┘ │ -│ │ -│ Model cache: ~/Library/Caches/huggingface + NFS mount (gravenhollow) │ -└──────────────────────────────────────────────────────────────────────────┘ +┌─────────────────────────────────────────────────────────────────────────┐ +│ waterdeep (Mac Mini M4 Pro · 48 GB unified · Metal GPU) │ +│ │ +│ ┌──────────────────────────────────────────────────────┐ │ +│ │ VS Code + GitHub Copilot (agent mode) │ │ +│ │ │ │ +│ │ BlenderMCP Server (uvx blender-mcp) │ │ +│ │ DISABLE_TELEMETRY=true │ │ +│ │ │ │ │ +│ │ │ TCP localhost:9876 (zero latency) │ │ +│ │ ▼ │ │ +│ └─────────┬────────────────────────────────────────────┘ │ +│ │ │ +│ ┌─────────▼────────────────────────────────────────────┐ │ +│ │ Blender 4.x (native macOS) │ │ +│ │ │ │ +│ │ Renderer: Metal (Eevee real-time + Cycles GPU) │ │ +│ │ Add-ons: │ │ +│ │ • 
BlenderMCP (addon.py) — socket server :9876 │ │ +│ │ • VRM Add-on for Blender — import/export VRM │ │ +│ │ │ │ +│ │ Working files: ~/blender-avatars/ │ │ +│ │ ├── projects/ (.blend source files) │ │ +│ │ ├── exports/ (.vrm exported models) │ │ +│ │ └── textures/ (shared texture library) │ │ +│ └──────────────────────────────────────────────────────┘ │ +│ │ │ +│ NFS mount or rclone │ +│ (asset promotion) │ +└──────────────────────────┼──────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ gravenhollow.lab.daviestechlabs.io │ +│ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │ +│ │ +│ NFS: /mnt/gravenhollow/kubernetes/avatar-models/ │ +│ ├── Seed-san.vrm (default model) │ +│ ├── Companion-A.vrm (promoted from waterdeep) │ +│ └── animations/ (shared animation clips) │ +│ │ +│ S3 (RustFS): avatar-models bucket │ +│ (same data, served via Cloudflare Tunnel for remote users) │ +└──────────────────────────┬──────────────────────────────────────────────┘ + │ + ┌────────────┴───────────────┐ + │ │ + NFS (nfs-fast PVC) Cloudflare Tunnel + │ (assets.daviestechlabs.io) + ▼ │ +┌──────────────────────────┐ ▼ +│ companions-frontend │ ┌──────────────────────────┐ +│ (Kubernetes pod) │ │ Remote users (CDN-cached │ +│ LAN users │ │ via Cloudflare edge) │ +└──────────────────────────┘ └──────────────────────────┘ ``` -## Updated GPU Fleet - -| Node | GPU | Backend | Memory | Custom Resource | Workload | -|------|-----|---------|--------|-----------------|----------| -| khelben | Strix Halo | ROCm | 128 GB unified | `gpu_strixhalo: 1` | vLLM 70B | -| elminster | RTX 2070 | CUDA | 8 GB VRAM | `gpu_nvidia: 1` | Whisper + TTS | -| drizzt | Radeon 680M | ROCm | 12 GB VRAM | `gpu_rdna2: 1` | Embeddings | -| danilo | Intel Arc | i915/IPEX | ~6 GB shared | `gpu_intel: 1` | Reranker | -| **waterdeep** | **M4 Pro** | **MPS (Metal)** | **48 GB unified** | **`gpu_apple_mps: 1`** | **LLM (7B–30B) + Training** | - 
## Implementation Plan -### 1. Network Prerequisites - -waterdeep must be able to reach the Ray head node's GCS port: +### 1. Install Blender and Add-ons ```bash -# From waterdeep, verify connectivity -nc -zv 6379 +# Install Blender via Homebrew +brew install --cask blender + +# Download BlenderMCP add-on +curl -LO https://raw.githubusercontent.com/ahujasid/blender-mcp/main/addon.py + +# Install in Blender: +# Edit > Preferences > Add-ons > Install... > select addon.py +# Enable "Interface: Blender MCP" + +# Install VRM Add-on for Blender: +# Download from https://vrm-addon-for-blender.info/en/ +# Edit > Preferences > Add-ons > Install... > select VRM add-on zip +# Enable "Import-Export: VRM" ``` -The Ray head service (`ai-inference-raycluster-head-svc`) is ClusterIP-only. Options to expose it: +### 2. VS Code MCP Configuration -| Approach | Complexity | Recommended | -|----------|-----------|-------------| -| NodePort service on port 6379 | Low | For initial setup | -| Envoy Gateway TCPRoute | Medium | For production use | -| Tailscale/WireGuard mesh | Medium | If already in use | +```json +// .vscode/mcp.json (in companions-frontend or global settings) +{ + "servers": { + "blender": { + "command": "uvx", + "args": ["blender-mcp"], + "env": { + "BLENDER_HOST": "localhost", + "BLENDER_PORT": "9876", + "DISABLE_TELEMETRY": "true" + } + } + } +} +``` -### 2. Python Environment on waterdeep +### 3. 
Python Environment for BlenderMCP ```bash # Install uv (per ADR-0012) curl -LsSf https://astral.sh/uv/install.sh | sh -# Create Ray worker environment -uv venv ~/ray-worker --python 3.12 -source ~/ray-worker/bin/activate - -# Install Ray with ML dependencies -uv pip install "ray[default]==2.53.0" torch torchvision torchaudio \ - transformers accelerate peft bitsandbytes \ - ray-serve-apps # internal package from Gitea PyPI - -# Verify MPS availability -python -c "import torch; print(torch.backends.mps.is_available())" +# uvx handles the BlenderMCP server environment automatically +# Verify it works: +uvx blender-mcp --help ``` -### 3. Start Ray Worker +### 4. NFS Mount for Asset Promotion + +Mount gravenhollow's avatar-models directory for direct promotion of finished VRM exports: ```bash -# Join the cluster with custom resources -ray start \ - --address=":6379" \ - --num-cpus=12 \ - --num-gpus=1 \ - --resources='{"gpu_apple_mps": 1}' \ - --block +# Create mount point +sudo mkdir -p /Volumes/avatar-models + +# Mount gravenhollow NFS (all-SSD, dual 10GbE) +sudo mount -t nfs \ + gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models \ + /Volumes/avatar-models + +# For a persistent mount (macOS autofs), add this entry to a direct map file referenced from /etc/auto_master +# /Volumes/avatar-models -fstype=nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models ``` -### 4. 
launchd Service (Persistent) - -```xml - - - - - - Label - io.ray.worker - ProgramArguments - - /Users/billy/ray-worker/bin/ray - start - --address=RAY_HEAD_IP:6379 - --num-cpus=12 - --num-gpus=1 - --resources={"gpu_apple_mps": 1} - --block - - RunAtLoad - - KeepAlive - - StandardOutPath - /tmp/ray-worker.log - StandardErrorPath - /tmp/ray-worker-error.log - EnvironmentVariables - - PATH - /Users/billy/ray-worker/bin:/usr/local/bin:/usr/bin:/bin - - - -``` +Alternatively, use rclone for S3-based promotion: ```bash -launchctl load ~/Library/LaunchAgents/io.ray.worker.plist +# Install rclone +brew install rclone + +# Configure gravenhollow RustFS endpoint +rclone config create gravenhollow s3 \ + provider=Other \ + endpoint=https://gravenhollow.lab.daviestechlabs.io:30292 \ + access_key_id= \ + secret_access_key= + +# Promote a finished VRM +rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models/ ``` -### 5. Model Cache via NFS +### 5. Avatar Creation Workflow (waterdeep) -Mount the gravenhollow NFS share on waterdeep so models are shared with the cluster via the fast all-SSD NAS: +1. **Open Blender** on waterdeep (native Metal-accelerated) +2. **Enable BlenderMCP** → 3D View sidebar → "BlenderMCP" tab → click "Connect" +3. **Open VS Code** with Copilot agent mode — BlenderMCP server starts automatically +4. **Create avatars** using AI-assisted prompts: + - _"Create an anime-style character with silver hair and a mage outfit"_ + - _"Apply metallic blue material to the staff"_ + - _"Rig this character for VRM export with standard humanoid bones"_ + - _"Export as VRM to ~/blender-avatars/exports/Silver-Mage.vrm"_ +5. **Preview** in real time — Metal GPU renders Eevee viewport at 60fps +6. **Promote** the finished VRM to gravenhollow: + ```bash + cp ~/blender-avatars/exports/Silver-Mage.vrm /Volumes/avatar-models/ + ``` +7. 
**Register** in companions-frontend — update `AllowedAvatarModels` in Go and JS allowlists, commit -```bash -# Mount gravenhollow NFS share (all-SSD, dual 10GbE) -sudo mount -t nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models \ - /Volumes/model-cache +### 6. Workflow Comparison: waterdeep vs Kasm -# Or add to /etc/fstab for persistence -# gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models /Volumes/model-cache nfs rw 0 0 - -# Symlink to HuggingFace cache location -ln -s /Volumes/model-cache ~/.cache/huggingface/hub -``` - -### 6. Ray Serve Deployment Targeting - -To schedule a deployment specifically on waterdeep, use the `gpu_apple_mps` custom resource in the RayService config: - -```yaml -# In rayservice.yaml serveConfigV2 -- name: llm-secondary - route_prefix: /llm-secondary - import_path: ray_serve.serve_llm:app - runtime_env: - env_vars: - MODEL_ID: "Qwen/Qwen2.5-32B-Instruct-AWQ" - DEVICE: "mps" - MAX_MODEL_LEN: "4096" - deployments: - - name: LLMDeployment - num_replicas: 1 - ray_actor_options: - num_gpus: 0.95 - resources: - gpu_apple_mps: 1 -``` - -### 7. Training Integration - -Ray Train jobs from [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) will automatically discover waterdeep as an available worker. To prefer it for GPU-accelerated training: - -```python -# In cpu_training_pipeline.py — updated to prefer MPS when available -trainer = TorchTrainer( - train_func, - scaling_config=ScalingConfig( - num_workers=1, - use_gpu=True, - resources_per_worker={"gpu_apple_mps": 1}, - ), -) -``` - -## Monitoring - -Since waterdeep is not a Kubernetes node, standard Prometheus scraping won't reach it. 
Options: - -| Approach | Notes | -|----------|-------| -| Prometheus push gateway | Ray worker pushes metrics periodically | -| Node-exporter on macOS | Homebrew `node_exporter`, scraped by Prometheus via static target | -| Ray Dashboard | Already shows all connected workers (ray-serve.lab.daviestechlabs.io) | - -The Ray Dashboard at `ray-serve.lab.daviestechlabs.io` will automatically show waterdeep as a connected node with its resources, tasks, and memory usage — no additional configuration needed. - -## Power Management - -To prevent macOS from sleeping and disconnecting the Ray worker: - -```bash -# Disable sleep when on power adapter -sudo pmset -c sleep 0 displaysleep 0 disksleep 0 - -# Or use caffeinate for the Ray process -caffeinate -s ray start --address=... --block -``` +| Aspect | waterdeep (local) | Kasm (browser) | +|--------|-------------------|----------------| +| **GPU rendering** | Metal 16-core GPU — Eevee real-time, Cycles GPU | CPU-only software rendering | +| **Viewport FPS** | 60fps (Metal) | 5–15fps (CPU rasterisation) | +| **MCP latency** | localhost socket — sub-millisecond | Network hop to Kasm container | +| **Memory** | 48 GB unified, shared with GPU | Limited by Kasm container allocation | +| **Sculpting** | Smooth, hardware-accelerated | Laggy, CPU-bound | +| **Asset promotion** | NFS mount or rclone to gravenhollow | Auto rclone to Quobyte S3 → manual promote to gravenhollow | +| **Access** | Local only (waterdeep physical/VNC) | Any browser, anywhere | +| **Setup** | Homebrew + manual add-on install | Pre-baked in Kasm image | +| **Use when** | Primary creation workflow | Remote access, quick edits, mobile | ## Security Considerations -* Ray's GCS port (6379) will be exposed outside the cluster — restrict with firewall rules to waterdeep's IP only -* The Ray worker has no RBAC — it executes whatever tasks the head assigns -* Model weights on NFS are read-only from waterdeep (mount with `ro` option if possible) -* NFS traffic to 
gravenhollow traverses the LAN — ensure dual 10GbE links are active -* Consider Tailscale or WireGuard for encrypted transport if the Ray GCS traffic crosses untrusted network segments +* BlenderMCP's `execute_blender_code` runs arbitrary Python in Blender — review AI-generated code before execution, especially file I/O operations +* Telemetry disabled via `DISABLE_TELEMETRY=true` in MCP server config +* BlenderMCP socket (port 9876) bound to localhost — not exposed to the network +* NFS traffic to gravenhollow traverses the LAN — no sensitive data in VRM files +* waterdeep has no cluster access — compromise doesn't impact Kubernetes workloads +* `.blend` source files stay local on waterdeep; only finished VRM exports are promoted to gravenhollow ## Future Considerations -* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, waterdeep can shift to secondary inference while DGX Spark handles training -* **vLLM MPS maturity**: As vLLM's MPS backend matures, waterdeep could serve larger models more efficiently -* **MLX backend**: Apple's MLX framework may provide better performance than PyTorch MPS for some workloads — worth evaluating as an alternative serving backend -* **Second Mac Mini**: If another Apple Silicon node is added, the external-worker pattern scales trivially +* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, DGX Spark handles training; waterdeep remains the 3D creation workstation +* **Blender + MLX**: Apple's MLX framework could power local AI-generated textures or mesh deformation directly in Blender — worth evaluating as Blender add-ons mature +* **Automated promotion**: A file watcher (fswatch/launchd) could auto-promote VRM exports from `~/blender-avatars/exports/` to gravenhollow when a new file appears +* **VRM validation**: Add a pre-promotion check script that validates VRM humanoid rig completeness, expression morphs, and viseme shapes before copying to gravenhollow ## 
Links -* [Ray Clusters — Adding External Workers](https://docs.ray.io/en/latest/cluster/vms/getting-started.html) -* [PyTorch MPS Backend](https://pytorch.org/docs/stable/notes/mps.html) -* [vLLM Apple Silicon Support](https://docs.vllm.ai/en/latest/) -* Related: [ADR-0005](0005-multi-gpu-strategy.md) — Multi-GPU strategy -* Related: [ADR-0011](0011-kuberay-unified-gpu-backend.md) — KubeRay unified GPU backend -* Related: [ADR-0024](0024-ray-repository-structure.md) — Ray repository structure -* Related: [ADR-0035](0035-arm64-worker-strategy.md) — ARM64 worker strategy -* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions -* Related: [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) — Training strategy +* Related: [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) — BlenderMCP 3D avatar workflow (Kasm + deployment architecture) +* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend architecture (Three.js + VRM avatars) +* Related: [ADR-0026](0026-storage-strategy.md) — Storage strategy (gravenhollow NFS-fast) +* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions (waterdeep) +* Related: [ADR-0012](0012-use-uv-for-python-development.md) — uv for Python development +* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp) +* [Blender Metal GPU Rendering](https://docs.blender.org/manual/en/latest/render/cycles/gpu_rendering.html) +* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/) +* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) diff --git a/decisions/0062-blender-mcp-3d-avatar-workflow.md b/decisions/0062-blender-mcp-3d-avatar-workflow.md index 266c0ef..954b7bc 100644 --- a/decisions/0062-blender-mcp-3d-avatar-workflow.md +++ b/decisions/0062-blender-mcp-3d-avatar-workflow.md @@ -437,6 +437,7 @@ VRM files are immutable once promoted — updated models get a new filename (e.g * Related to [ADR-0026](0026-storage-strategy.md) (storage strategy — 
gravenhollow NFS-fast, Quobyte S3, rclone) * Related to [ADR-0044](0044-dns-and-external-access.md) (DNS and external access — Cloudflare Tunnel, split-horizon) * Related to [ADR-0049](0049-self-hosted-productivity-suite.md) (Kasm Workspaces) +* Related to [ADR-0059](0059-mac-mini-ray-worker.md) (waterdeep as local AI agent — primary 3D creation workstation with Metal GPU) * [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp) * [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/) * [VRM Specification](https://vrm.dev/en/)
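The promotion step in ADR-0059 section 5 uses a plain `cp` into `/Volumes/avatar-models/`, which overwrites an existing file of the same name, while ADR-0062 states that VRM files are immutable once promoted. A minimal sketch of a safer promotion step follows, assuming the paths from the ADR; `promote_vrm` is a hypothetical helper name, not something defined in either ADR, and the checks shown are only one possible reading of the immutability rule.

```python
import shutil
from pathlib import Path


def promote_vrm(src: Path, dest_dir: Path) -> Path:
    """Copy a finished VRM export into dest_dir without overwriting.

    Hypothetical helper: the ADRs only describe a manual `cp` or
    `rclone copy`; this sketch adds the checks a human would do by eye.
    """
    if src.suffix.lower() != ".vrm":
        raise ValueError(f"not a VRM file: {src}")
    if not src.is_file():
        raise FileNotFoundError(f"export not found: {src}")
    if not dest_dir.is_dir():
        # Typically means the NFS share is not mounted.
        raise NotADirectoryError(f"destination missing: {dest_dir}")

    target = dest_dir / src.name
    if target.exists():
        # Promoted models are treated as immutable: require a new filename.
        raise FileExistsError(f"{target} already promoted; export under a new name")

    shutil.copy2(src, target)  # copies file contents and metadata
    return target
```

Under these assumptions, a call such as `promote_vrm(Path.home() / "blender-avatars/exports/Silver-Mage.vrm", Path("/Volumes/avatar-models"))` would replace the manual `cp`, and a second call with the same filename would fail instead of replacing the promoted model.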