ADR-0059: repurpose waterdeep from Ray worker to local AI agent
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 1m6s
Replace the proposed Ray cluster worker role with a dedicated local AI agent for BlenderMCP 3D avatar creation (supporting ADR-0062). waterdeep's Metal GPU provides hardware-accelerated rendering in Blender — far superior to Kasm's CPU-only DinD environment. The Ray cluster GPU fleet is fully allocated and stable; adding MPS complexity is not justified. Also adds cross-reference from ADR-0062 to ADR-0059.
@@ -1,338 +1,294 @@
|
|||||||
# Add Mac Mini M4 Pro (waterdeep) to Ray Cluster as External Worker
|
# Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation
|
||||||
|
|
||||||
* Status: proposed
|
* Status: proposed
|
||||||
* Date: 2026-02-16
|
* Date: 2026-02-16
|
||||||
|
* Updated: 2026-02-21
|
||||||
* Deciders: Billy
|
* Deciders: Billy
|
||||||
* Technical Story: Expand Ray cluster with Apple Silicon compute for inference and training
|
* Technical Story: Use waterdeep as a dedicated local AI workstation for BlenderMCP-driven 3D avatar creation, replacing the previously proposed Ray worker role
|
||||||
|
|
||||||
## Context and Problem Statement
|
## Context and Problem Statement
|
||||||
|
|
||||||
The homelab Ray cluster currently runs entirely within Kubernetes, with GPU workers pinned to specific nodes:
|
**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory that currently serves as a development workstation (see [ADR-0037](0037-node-naming-conventions.md)). The original proposal was to add it to the Ray cluster as an external inference/training worker, but:
|
||||||
|
|
||||||
| Node | GPU | Memory | Workload |
|
- All Ray inference slots are already allocated and stable — adding a 5th GPU class (MPS) increases complexity without filling a gap
|
||||||
|------|-----|--------|----------|
|
- vLLM's MPS backend remains experimental — not production-ready for serving
|
||||||
| khelben | Strix Halo (ROCm) | 128 GB unified | vLLM 70B (0.95 GPU) |
|
- The real unmet need is **3D avatar creation** for companions-frontend ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md))
|
||||||
| elminster | RTX 2070 (CUDA) | 8 GB VRAM | Whisper (0.5) + TTS (0.5) |
|
|
||||||
| drizzt | Radeon 680M (ROCm) | 12 GB VRAM | Embeddings (0.8) |
|
|
||||||
| danilo | Intel Arc (i915) | ~6 GB shared | Reranker (0.8) |
|
|
||||||
|
|
||||||
All GPUs are fully allocated to inference (see [ADR-0005](0005-multi-gpu-strategy.md), [ADR-0011](0011-kuberay-unified-gpu-backend.md)). Training is currently CPU-only and distributed across cluster nodes via Ray Train ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)).
|
[ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) describes using BlenderMCP in a Kasm Blender workstation for AI-assisted avatar creation. While Kasm works, it runs Blender inside a DinD container with **no GPU acceleration** — rendering and viewport interaction are CPU-only, which is painfully slow for sculpting, material preview, and VRM export iteration.
|
||||||
|
|
||||||
**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory that currently serves as a development workstation (see [ADR-0037](0037-node-naming-conventions.md)). Its Apple Silicon GPU (MPS backend) and unified memory architecture make it a strong candidate for both inference and training workloads — but macOS cannot run Talos Linux or easily join the Kubernetes cluster as a native node.
|
waterdeep's M4 Pro has a 16-core GPU with hardware-accelerated Metal rendering and 48 GB of unified memory shared between CPU and GPU. Running Blender natively on waterdeep with BlenderMCP gives a dramatically better 3D creation experience than Kasm.
|
||||||
|
|
||||||
How do we integrate waterdeep's compute into the Ray cluster without disrupting the existing Kubernetes-managed infrastructure?
|
How should we use waterdeep to get the most out of the 3D avatar creation pipeline for companions-frontend?
|
||||||
|
|
||||||
## Decision Drivers
|
## Decision Drivers
|
||||||
|
|
||||||
* 48 GB unified memory is sufficient for medium-large models (e.g., 7B–30B at Q4/Q8 quantisation)
|
* Blender on Kasm is CPU-rendered inside DinD — no Metal/Vulkan/CUDA GPU access, poor viewport performance
|
||||||
* Apple Silicon MPS backend is supported by PyTorch and vLLM (experimental)
|
* waterdeep has a 16-core Apple GPU with Metal support — Blender's Metal backend enables real-time viewport rendering, Cycles GPU rendering, and smooth sculpting
|
||||||
* macOS cannot run Talos Linux — must integrate without Kubernetes
|
* 48 GB unified memory means Blender, VS Code, and the MCP server can all run simultaneously without swapping
|
||||||
* Ray natively supports heterogeneous clusters with external workers
|
* VS Code with Copilot agent mode can drive BlenderMCP locally over a loopback socket (localhost:9876), with no network hop between editor, MCP server, and Blender
|
||||||
* Must not impact existing inference serving stability
|
* Exported VRM models must reach gravenhollow for production serving ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md))
|
||||||
* Training workloads ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)) would benefit from a GPU-accelerated worker
|
* The Kasm Blender workflow from ADR-0062 remains available as a fallback (browser-based, no local install required)
|
||||||
* ARM64 architecture requires compatible Python packages and model formats
|
* The Ray cluster GPU fleet is fully allocated and stable, so adding MPS complexity is not justified
|
||||||
|
|
||||||
## Considered Options
|
## Considered Options
|
||||||
|
|
||||||
1. **External Ray worker on macOS** — run a Ray worker process natively on waterdeep that connects to the cluster Ray head over the network
|
1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via NFS/rclone
|
||||||
2. **Linux VM on Mac** — run UTM/Parallels VM with Linux, join as a Kubernetes node
|
2. **External Ray worker on macOS** (original proposal) — join the Ray cluster for inference and training
|
||||||
3. **K3s agent on macOS** — run K3s directly on macOS via Docker Desktop
|
3. **Keep Kasm-only workflow** — rely entirely on the browser-based Kasm Blender workstation from ADR-0062
|
||||||
|
|
||||||
## Decision Outcome
|
## Decision Outcome
|
||||||
|
|
||||||
Chosen option: **Option 1 — External Ray worker on macOS**, because Ray natively supports heterogeneous workers joining over the network. This avoids the complexity of running Kubernetes on macOS, lets waterdeep remain a development workstation, and leverages Apple Silicon MPS acceleration transparently through PyTorch.
|
Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Mini's Metal GPU makes it dramatically better for 3D work than CPU-rendered Kasm, the Ray cluster doesn't need another worker, and the local workflow eliminates network latency between VS Code, the MCP server, and Blender.
|
||||||
|
|
||||||
### Positive Consequences
|
### Positive Consequences
|
||||||
|
|
||||||
* Zero Kubernetes overhead on waterdeep — remains a usable dev workstation
|
* Metal GPU acceleration — real-time Eevee viewport, GPU-accelerated Cycles rendering, smooth 60fps sculpting
|
||||||
* 48 GB unified memory available for models (vs split VRAM/RAM on discrete GPUs)
|
* Low-latency MCP: the BlenderMCP socket (localhost:9876) has no network hop, so commands reach Blender without network delay
|
||||||
* MPS GPU acceleration for both inference and training
|
* 48 GB unified memory — large Blender scenes, multiple VRM models open simultaneously, no swap pressure
|
||||||
* Adds a 5th GPU class to the Ray fleet (Apple MPS alongside ROCm, CUDA, Intel, RDNA2)
|
* VS Code + Copilot agent mode runs natively with full local context for both code and Blender commands
|
||||||
* Training jobs ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)) gain a GPU-accelerated worker
|
* Remains a dev workstation: avatar creation is a creative dev workflow, not a server workload
|
||||||
* Can run a secondary LLM instance for overflow or A/B testing
|
* Kasm Blender remains available as a browser-based fallback for remote/mobile access
|
||||||
* Quick to set up — single `ray start` command
|
* Simpler than the Ray worker approach — no cluster integration, no GCS port exposure, no experimental MPS backend
|
||||||
* Worker can be stopped/started without affecting the cluster
|
|
||||||
|
|
||||||
### Negative Consequences
|
### Negative Consequences
|
||||||
|
|
||||||
* Not managed by KubeRay or Flux — requires manual or launchd-based lifecycle management
|
* Blender + add-ons must be installed and maintained locally on waterdeep
|
||||||
* Network dependency — if waterdeep sleeps or disconnects, Ray tasks on it fail
|
* Assets created locally need explicit promotion to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
|
||||||
* MPS backend has limited operator coverage compared to CUDA/ROCm
|
* waterdeep is a single machine — no redundancy for the 3D creation workflow
|
||||||
* Python environment must be maintained separately (not in a container image)
|
* Not managed by Kubernetes or GitOps — relies on manual or Homebrew-managed tooling
|
||||||
* No Longhorn storage — model cache managed locally or via NFS mount from gravenhollow (nfs-fast)
|
|
||||||
* Monitoring not automatically scraped by Prometheus (needs node-exporter or push gateway)
|
|
||||||
|
|
||||||
## Pros and Cons of the Options
|
## Pros and Cons of the Options
|
||||||
|
|
||||||
### Option 1: External Ray worker on macOS
|
### Option 1: Local AI agent on waterdeep
|
||||||
|
|
||||||
* Good, because Ray is designed for heterogeneous multi-node clusters
|
* Good, because Metal GPU acceleration makes Blender usable for real 3D work (sculpting, rendering, material preview)
|
||||||
* Good, because no VM overhead — full access to Metal/MPS and unified memory
|
* Good, because localhost MCP socket eliminates all network latency
|
||||||
* Good, because waterdeep remains a functional dev workstation
|
* Good, because 48 GB unified memory supports complex scenes without swapping
|
||||||
* Good, because trivial to start/stop (single process)
|
* Good, because no experimental backends (MPS/vLLM) — using Blender's mature Metal renderer
|
||||||
* Bad, because not managed by Kubernetes or GitOps
|
* Good, because waterdeep stays a dev workstation, aligning with its named role
|
||||||
* Bad, because requires manual Python environment management
|
* Bad, because local-only — no browser-based remote access (use Kasm for that)
|
||||||
* Bad, because MPS support in vLLM is experimental
|
* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP)
|
||||||
|
* Bad, because asset promotion to gravenhollow requires explicit action
|
||||||
|
|
||||||
### Option 2: Linux VM on Mac
|
### Option 2: External Ray worker on macOS (original proposal)
|
||||||
|
|
||||||
* Good, because would be a standard Kubernetes node
|
* Good, because adds GPU compute to the Ray cluster
|
||||||
* Good, because managed by KubeRay like other workers
|
* Good, because training jobs gain MPS acceleration
|
||||||
* Bad, because VM overhead reduces available memory (hypervisor, guest OS)
|
* Bad, because vLLM MPS backend is experimental — not production-ready
|
||||||
* Bad, because no MPS/Metal GPU passthrough to Linux VMs on Apple Silicon
|
* Bad, because adds a 5th GPU class (MPS) to an already complex fleet
|
||||||
* Bad, because complex to maintain (VM lifecycle, networking, storage)
|
* Bad, because Ray GCS port exposure adds security surface
|
||||||
* Bad, because wastes the primary advantage (Apple Silicon GPU)
|
* Bad, because doesn't address the actual unmet need (3D avatar creation)
|
||||||
|
* Bad, because waterdeep becomes a server, degrading its dev workstation role
|
||||||
|
|
||||||
### Option 3: K3s agent on macOS
|
### Option 3: Kasm-only workflow
|
||||||
|
|
||||||
* Good, because Kubernetes-native, managed by Flux
|
* Good, because browser-based — usable from any device
|
||||||
* Bad, because K3s on macOS requires Docker Desktop (resource overhead)
|
* Good, because no local installation required
|
||||||
* Bad, because container networking on macOS is fragile
|
* Bad, because CPU-rendered Blender inside DinD — poor viewport performance
|
||||||
* Bad, because MPS device access from within Docker containers is unreliable
|
* Bad, because network latency between VS Code and Blender socket
|
||||||
* Bad, because not a supported K3s configuration
|
* Bad, because limited memory inside Kasm container
|
||||||
|
* Bad, because no GPU acceleration for rendering or sculpting
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
```
|
```
|
||||||
┌──────────────────────────────────────────────────────────────────────────┐
|
┌─────────────────────────────────────────────────────────────────────────┐
|
||||||
│ Kubernetes Cluster (Talos) │
|
│ waterdeep (Mac Mini M4 Pro · 48 GB unified · Metal GPU) │
|
||||||
│ │
|
│ │
|
||||||
│ ┌──────────────────────────────────────────────────────────────────┐ │
|
│ ┌──────────────────────────────────────────────────────┐ │
|
||||||
│ │ RayService (ai-inference) — KubeRay managed │ │
|
│ │ VS Code + GitHub Copilot (agent mode) │ │
|
||||||
│ │ │ │
|
│ │ │ │
|
||||||
│ │ Head: wulfgar │ │
|
│ │ BlenderMCP Server (uvx blender-mcp) │ │
|
||||||
│ │ Workers: khelben (ROCm), elminster (CUDA), │ │
|
│ │ DISABLE_TELEMETRY=true │ │
|
||||||
│ │ drizzt (RDNA2), danilo (Intel) │ │
|
│ │ │ │ │
|
||||||
│ └──────────────────────┬───────────────────────────────────────────┘ │
|
│ │ │ TCP localhost:9876 (zero latency) │ │
|
||||||
│ │ Ray GCS (port 6379) │
|
│ │ ▼ │ │
|
||||||
|
│ └─────────┬────────────────────────────────────────────┘ │
|
||||||
│ │ │
|
│ │ │
|
||||||
└─────────────────────────┼────────────────────────────────────────────────┘
|
│ ┌─────────▼────────────────────────────────────────────┐ │
|
||||||
│ Home network (LAN)
|
│ │ Blender 4.x (native macOS) │ │
|
||||||
|
│ │ │ │
|
||||||
|
│ │ Renderer: Metal (Eevee real-time + Cycles GPU) │ │
|
||||||
|
│ │ Add-ons: │ │
|
||||||
|
│ │ • BlenderMCP (addon.py) — socket server :9876 │ │
|
||||||
|
│ │ • VRM Add-on for Blender — import/export VRM │ │
|
||||||
|
│ │ │ │
|
||||||
|
│ │ Working files: ~/blender-avatars/ │ │
|
||||||
|
│ │ ├── projects/ (.blend source files) │ │
|
||||||
|
│ │ ├── exports/ (.vrm exported models) │ │
|
||||||
|
│ │ └── textures/ (shared texture library) │ │
|
||||||
|
│ └──────────────────────────────────────────────────────┘ │
|
||||||
|
│ │ │
|
||||||
|
│ NFS mount or rclone │
|
||||||
|
│ (asset promotion) │
|
||||||
|
└──────────────────────────┼──────────────────────────────────────────────┘
|
||||||
│
|
│
|
||||||
┌─────────────────────────┼────────────────────────────────────────────────┐
|
▼
|
||||||
│ waterdeep (Mac Mini M4 Pro) │
|
┌─────────────────────────────────────────────────────────────────────────┐
|
||||||
│ │ │
|
│ gravenhollow.lab.daviestechlabs.io │
|
||||||
│ ┌──────────────────────▼───────────────────────────────────────────┐ │
|
│ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │
|
||||||
│ │ External Ray Worker (ray start --address=...) │ │
|
|
||||||
│ │ │ │
|
|
||||||
│ │ • 12-core CPU (8P + 4E) + 16-core Neural Engine │ │
|
|
||||||
│ │ • 48 GB unified memory (shared CPU/GPU) │ │
|
|
||||||
│ │ • MPS (Metal) GPU backend via PyTorch │ │
|
|
||||||
│ │ • Custom resource: gpu_apple_mps: 1 │ │
|
|
||||||
│ │ │ │
|
|
||||||
│ │ Workloads: │ │
|
|
||||||
│ │ ├── Inference: secondary LLM (7B–30B), overflow serving │ │
|
|
||||||
│ │ └── Training: LoRA/QLoRA fine-tuning via Ray Train │ │
|
|
||||||
│ └──────────────────────────────────────────────────────────────────┘ │
|
|
||||||
│ │
|
│ │
|
||||||
│ Model cache: ~/Library/Caches/huggingface + NFS mount (gravenhollow) │
|
│ NFS: /mnt/gravenhollow/kubernetes/avatar-models/ │
|
||||||
└──────────────────────────────────────────────────────────────────────────┘
|
│ ├── Seed-san.vrm (default model) │
|
||||||
|
│ ├── Companion-A.vrm (promoted from waterdeep) │
|
||||||
|
│ └── animations/ (shared animation clips) │
|
||||||
|
│ │
|
||||||
|
│ S3 (RustFS): avatar-models bucket │
|
||||||
|
│ (same data, served via Cloudflare Tunnel for remote users) │
|
||||||
|
└──────────────────────────┬──────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
┌────────────┴───────────────┐
|
||||||
|
│ │
|
||||||
|
NFS (nfs-fast PVC) Cloudflare Tunnel
|
||||||
|
│ (assets.daviestechlabs.io)
|
||||||
|
▼ │
|
||||||
|
┌──────────────────────────┐ ▼
|
||||||
|
│ companions-frontend │ ┌──────────────────────────┐
|
||||||
|
│ (Kubernetes pod) │ │ Remote users (CDN-cached │
|
||||||
|
│ LAN users │ │ via Cloudflare edge) │
|
||||||
|
└──────────────────────────┘ └──────────────────────────┘
|
||||||
```
|
```
|
||||||
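The working-file layout in the diagram can be created up front so Blender exports always land in a predictable place. A minimal sketch, assuming the `~/blender-avatars/` root and the three subdirectory names shown in the diagram; the helper name `ensure_workspace` is hypothetical:

```python
from pathlib import Path

# Subdirectory names taken from the architecture diagram.
WORKSPACE_SUBDIRS = ("projects", "exports", "textures")


def ensure_workspace(root: Path) -> list[Path]:
    """Create the avatar workspace layout under root and return the subdirectory paths."""
    paths = []
    for name in WORKSPACE_SUBDIRS:
        path = Path(root) / name
        # parents/exist_ok make the call idempotent: safe to re-run on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `ensure_workspace(Path.home() / "blender-avatars")` once on waterdeep would be enough; re-running it leaves existing files untouched.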
|
|
||||||
## Updated GPU Fleet
|
|
||||||
|
|
||||||
| Node | GPU | Backend | Memory | Custom Resource | Workload |
|
|
||||||
|------|-----|---------|--------|-----------------|----------|
|
|
||||||
| khelben | Strix Halo | ROCm | 128 GB unified | `gpu_strixhalo: 1` | vLLM 70B |
|
|
||||||
| elminster | RTX 2070 | CUDA | 8 GB VRAM | `gpu_nvidia: 1` | Whisper + TTS |
|
|
||||||
| drizzt | Radeon 680M | ROCm | 12 GB VRAM | `gpu_rdna2: 1` | Embeddings |
|
|
||||||
| danilo | Intel Arc | i915/IPEX | ~6 GB shared | `gpu_intel: 1` | Reranker |
|
|
||||||
| **waterdeep** | **M4 Pro** | **MPS (Metal)** | **48 GB unified** | **`gpu_apple_mps: 1`** | **LLM (7B–30B) + Training** |
|
|
||||||
|
|
||||||
## Implementation Plan
|
## Implementation Plan
|
||||||
|
|
||||||
### 1. Network Prerequisites
|
### 1. Install Blender and Add-ons
|
||||||
|
|
||||||
waterdeep must be able to reach the Ray head node's GCS port:
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# From waterdeep, verify connectivity
|
# Install Blender via Homebrew
|
||||||
nc -zv <ray-head-ip> 6379
|
brew install --cask blender
|
||||||
|
|
||||||
|
# Download BlenderMCP add-on
|
||||||
|
curl -LO https://raw.githubusercontent.com/ahujasid/blender-mcp/main/addon.py
|
||||||
|
|
||||||
|
# Install in Blender:
|
||||||
|
# Edit > Preferences > Add-ons > Install... > select addon.py
|
||||||
|
# Enable "Interface: Blender MCP"
|
||||||
|
|
||||||
|
# Install VRM Add-on for Blender:
|
||||||
|
# Download from https://vrm-addon-for-blender.info/en/
|
||||||
|
# Edit > Preferences > Add-ons > Install... > select VRM add-on zip
|
||||||
|
# Enable "Import-Export: VRM"
|
||||||
```
|
```
|
||||||
|
|
||||||
The Ray head service (`ai-inference-raycluster-head-svc`) is ClusterIP-only. Options to expose it:
|
### 2. VS Code MCP Configuration
|
||||||
|
|
||||||
| Approach | Complexity | Recommended |
|
```json
|
||||||
|----------|-----------|-------------|
|
// .vscode/mcp.json (in companions-frontend or global settings)
|
||||||
| NodePort service on port 6379 | Low | For initial setup |
|
{
|
||||||
| Envoy Gateway TCPRoute | Medium | For production use |
|
"servers": {
|
||||||
| Tailscale/WireGuard mesh | Medium | If already in use |
|
"blender": {
|
||||||
|
"command": "uvx",
|
||||||
|
"args": ["blender-mcp"],
|
||||||
|
"env": {
|
||||||
|
"BLENDER_HOST": "localhost",
|
||||||
|
"BLENDER_PORT": "9876",
|
||||||
|
"DISABLE_TELEMETRY": "true"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
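Before starting an agent session it can help to confirm that something is actually listening on the host and port given in `BLENDER_HOST` / `BLENDER_PORT`. The sketch below is a plain TCP reachability probe, not part of BlenderMCP; the function name `blender_socket_reachable` is hypothetical, and a successful connect only shows that some process is listening on the port, not that it is the add-on's socket server.

```python
import socket


def blender_socket_reachable(host: str = "localhost", port: int = 9876,
                             timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

If this returns `False`, the usual cause is that "Connect" has not yet been clicked in the BlenderMCP sidebar tab.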
### 2. Python Environment on waterdeep
|
### 3. Python Environment for BlenderMCP
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Install uv (per ADR-0012)
|
# Install uv (per ADR-0012)
|
||||||
curl -LsSf https://astral.sh/uv/install.sh | sh
|
curl -LsSf https://astral.sh/uv/install.sh | sh
|
||||||
|
|
||||||
# Create Ray worker environment
|
# uvx handles the BlenderMCP server environment automatically
|
||||||
uv venv ~/ray-worker --python 3.12
|
# Verify it works:
|
||||||
source ~/ray-worker/bin/activate
|
uvx blender-mcp --help
|
||||||
|
|
||||||
# Install Ray with ML dependencies
|
|
||||||
uv pip install "ray[default]==2.53.0" torch torchvision torchaudio \
|
|
||||||
transformers accelerate peft bitsandbytes \
|
|
||||||
ray-serve-apps # internal package from Gitea PyPI
|
|
||||||
|
|
||||||
# Verify MPS availability
|
|
||||||
python -c "import torch; print(torch.backends.mps.is_available())"
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### 3. Start Ray Worker
|
### 4. NFS Mount for Asset Promotion
|
||||||
|
|
||||||
|
Mount gravenhollow's avatar-models directory for direct promotion of finished VRM exports:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Join the cluster with custom resources
|
# Create mount point
|
||||||
ray start \
|
sudo mkdir -p /Volumes/avatar-models
|
||||||
--address="<ray-head-ip>:6379" \
|
|
||||||
--num-cpus=12 \
|
# Mount gravenhollow NFS (all-SSD, dual 10GbE)
|
||||||
--num-gpus=1 \
|
sudo mount -t nfs \
|
||||||
--resources='{"gpu_apple_mps": 1}' \
|
gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models \
|
||||||
--block
|
/Volumes/avatar-models
|
||||||
|
|
||||||
|
# For a persistent mount, put a direct-map entry like the following in an autofs map file referenced from /etc/auto_master (macOS autofs)
|
||||||
|
# /Volumes/avatar-models -fstype=nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models
|
||||||
```
|
```
|
||||||
|
|
||||||
### 4. launchd Service (Persistent)
|
Alternatively, use rclone for S3-based promotion:
|
||||||
|
|
||||||
```xml
|
|
||||||
<!-- ~/Library/LaunchAgents/io.ray.worker.plist -->
|
|
||||||
<?xml version="1.0" encoding="UTF-8"?>
|
|
||||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
|
|
||||||
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
|
||||||
<plist version="1.0">
|
|
||||||
<dict>
|
|
||||||
<key>Label</key>
|
|
||||||
<string>io.ray.worker</string>
|
|
||||||
<key>ProgramArguments</key>
|
|
||||||
<array>
|
|
||||||
<string>/Users/billy/ray-worker/bin/ray</string>
|
|
||||||
<string>start</string>
|
|
||||||
<string>--address=RAY_HEAD_IP:6379</string>
|
|
||||||
<string>--num-cpus=12</string>
|
|
||||||
<string>--num-gpus=1</string>
|
|
||||||
<string>--resources={"gpu_apple_mps": 1}</string>
|
|
||||||
<string>--block</string>
|
|
||||||
</array>
|
|
||||||
<key>RunAtLoad</key>
|
|
||||||
<true/>
|
|
||||||
<key>KeepAlive</key>
|
|
||||||
<true/>
|
|
||||||
<key>StandardOutPath</key>
|
|
||||||
<string>/tmp/ray-worker.log</string>
|
|
||||||
<key>StandardErrorPath</key>
|
|
||||||
<string>/tmp/ray-worker-error.log</string>
|
|
||||||
<key>EnvironmentVariables</key>
|
|
||||||
<dict>
|
|
||||||
<key>PATH</key>
|
|
||||||
<string>/Users/billy/ray-worker/bin:/usr/local/bin:/usr/bin:/bin</string>
|
|
||||||
</dict>
|
|
||||||
</dict>
|
|
||||||
</plist>
|
|
||||||
```
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
launchctl load ~/Library/LaunchAgents/io.ray.worker.plist
|
# Install rclone
|
||||||
|
brew install rclone
|
||||||
|
|
||||||
|
# Configure gravenhollow RustFS endpoint
|
||||||
|
rclone config create gravenhollow s3 \
|
||||||
|
provider=Other \
|
||||||
|
endpoint=https://gravenhollow.lab.daviestechlabs.io:30292 \
|
||||||
|
access_key_id=<key> \
|
||||||
|
secret_access_key=<secret>
|
||||||
|
|
||||||
|
# Promote a finished VRM
|
||||||
|
rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models/
|
||||||
```
|
```
|
||||||
|
|
||||||
### 5. Model Cache via NFS
|
### 5. Avatar Creation Workflow (waterdeep)
|
||||||
|
|
||||||
Mount the gravenhollow NFS share on waterdeep so models are shared with the cluster via the fast all-SSD NAS:
|
1. **Open Blender** on waterdeep (native Metal-accelerated)
|
||||||
|
2. **Enable BlenderMCP** → 3D View sidebar → "BlenderMCP" tab → click "Connect"
|
||||||
|
3. **Open VS Code** with Copilot agent mode — BlenderMCP server starts automatically
|
||||||
|
4. **Create avatars** using AI-assisted prompts:
|
||||||
|
- _"Create an anime-style character with silver hair and a mage outfit"_
|
||||||
|
- _"Apply metallic blue material to the staff"_
|
||||||
|
- _"Rig this character for VRM export with standard humanoid bones"_
|
||||||
|
- _"Export as VRM to ~/blender-avatars/exports/Silver-Mage-v1.vrm"_
|
||||||
|
5. **Preview** in real-time — Metal GPU renders Eevee viewport at 60fps
|
||||||
|
6. **Promote** the finished VRM to gravenhollow:
|
||||||
|
```bash
|
||||||
|
cp ~/blender-avatars/exports/Silver-Mage-v1.vrm /Volumes/avatar-models/
|
||||||
|
```
|
||||||
|
7. **Register** in companions-frontend — update `AllowedAvatarModels` in Go and JS allowlists, commit
|
||||||
|
|
||||||
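The promote step can be wrapped in a small guard so that an already-promoted model is never overwritten, in line with ADR-0062's rule that VRM files are immutable once promoted. This is a sketch under assumptions: the function name `promote_vrm` is hypothetical, and the destination is expected to be the mounted share (for example `/Volumes/avatar-models`).

```python
import shutil
from pathlib import Path


def promote_vrm(src: Path, dest_dir: Path) -> Path:
    """Copy a finished .vrm export into dest_dir, refusing to overwrite an existing file."""
    src, dest_dir = Path(src), Path(dest_dir)
    if src.suffix.lower() != ".vrm":
        raise ValueError(f"not a VRM export: {src}")
    if not dest_dir.is_dir():
        # Typically means the NFS share is not mounted.
        raise FileNotFoundError(f"destination not available: {dest_dir}")
    dest = dest_dir / src.name
    # Open the source first so a missing export never leaves an empty file behind.
    # Mode "xb" creates the destination exclusively and raises FileExistsError if it exists.
    with open(src, "rb") as inp, open(dest, "xb") as out:
        shutil.copyfileobj(inp, out)
    return dest
```

A `FileExistsError` here is the signal to export the update under a new filename rather than replace the promoted one.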
```bash
|
### 6. Workflow Comparison: waterdeep vs Kasm
|
||||||
# Mount gravenhollow NFS share (all-SSD, dual 10GbE)
|
|
||||||
sudo mount -t nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models \
|
|
||||||
/Volumes/model-cache
|
|
||||||
|
|
||||||
# Or add to /etc/fstab for persistence
|
| Aspect | waterdeep (local) | Kasm (browser) |
|
||||||
# gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models /Volumes/model-cache nfs rw 0 0
|
|--------|-------------------|----------------|
|
||||||
|
| **GPU rendering** | Metal 16-core GPU — Eevee real-time, Cycles GPU | CPU-only software rendering |
|
||||||
# Symlink to HuggingFace cache location
|
| **Viewport FPS** | 60fps (Metal) | 5–15fps (CPU rasterisation) |
|
||||||
ln -s /Volumes/model-cache ~/.cache/huggingface/hub
|
| **MCP latency** | localhost socket — sub-millisecond | Network hop to Kasm container |
|
||||||
```
|
| **Memory** | 48 GB unified, shared with GPU | Limited by Kasm container allocation |
|
||||||
|
| **Sculpting** | Smooth, hardware-accelerated | Laggy, CPU-bound |
|
||||||
### 6. Ray Serve Deployment Targeting
|
| **Asset promotion** | NFS mount or rclone to gravenhollow | Auto rclone to Quobyte S3 → manual promote to gravenhollow |
|
||||||
|
| **Access** | Local only (waterdeep physical/VNC) | Any browser, anywhere |
|
||||||
To schedule a deployment specifically on waterdeep, use the `gpu_apple_mps` custom resource in the RayService config:
|
| **Setup** | Homebrew + manual add-on install | Pre-baked in Kasm image |
|
||||||
|
| **Use when** | Primary creation workflow | Remote access, quick edits, mobile |
|
||||||
```yaml
|
|
||||||
# In rayservice.yaml serveConfigV2
|
|
||||||
- name: llm-secondary
|
|
||||||
route_prefix: /llm-secondary
|
|
||||||
import_path: ray_serve.serve_llm:app
|
|
||||||
runtime_env:
|
|
||||||
env_vars:
|
|
||||||
MODEL_ID: "Qwen/Qwen2.5-32B-Instruct-AWQ"
|
|
||||||
DEVICE: "mps"
|
|
||||||
MAX_MODEL_LEN: "4096"
|
|
||||||
deployments:
|
|
||||||
- name: LLMDeployment
|
|
||||||
num_replicas: 1
|
|
||||||
ray_actor_options:
|
|
||||||
num_gpus: 0.95
|
|
||||||
resources:
|
|
||||||
gpu_apple_mps: 1
|
|
||||||
```
|
|
||||||
|
|
||||||
### 7. Training Integration
|
|
||||||
|
|
||||||
Ray Train jobs from [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) will automatically discover waterdeep as an available worker. To prefer it for GPU-accelerated training:
|
|
||||||
|
|
||||||
```python
|
|
||||||
# In cpu_training_pipeline.py — updated to prefer MPS when available
|
|
||||||
trainer = TorchTrainer(
|
|
||||||
train_func,
|
|
||||||
scaling_config=ScalingConfig(
|
|
||||||
num_workers=1,
|
|
||||||
use_gpu=True,
|
|
||||||
resources_per_worker={"gpu_apple_mps": 1},
|
|
||||||
),
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Monitoring
|
|
||||||
|
|
||||||
Since waterdeep is not a Kubernetes node, standard Prometheus scraping won't reach it. Options:
|
|
||||||
|
|
||||||
| Approach | Notes |
|
|
||||||
|----------|-------|
|
|
||||||
| Prometheus push gateway | Ray worker pushes metrics periodically |
|
|
||||||
| Node-exporter on macOS | Homebrew `node_exporter`, scraped by Prometheus via static target |
|
|
||||||
| Ray Dashboard | Already shows all connected workers (ray-serve.lab.daviestechlabs.io) |
|
|
||||||
|
|
||||||
The Ray Dashboard at `ray-serve.lab.daviestechlabs.io` will automatically show waterdeep as a connected node with its resources, tasks, and memory usage — no additional configuration needed.
|
|
||||||
|
|
||||||
## Power Management
|
|
||||||
|
|
||||||
To prevent macOS from sleeping and disconnecting the Ray worker:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Disable sleep when on power adapter
|
|
||||||
sudo pmset -c sleep 0 displaysleep 0 disksleep 0
|
|
||||||
|
|
||||||
# Or use caffeinate for the Ray process
|
|
||||||
caffeinate -s ray start --address=... --block
|
|
||||||
```
|
|
||||||
|
|
||||||
## Security Considerations
|
## Security Considerations
|
||||||
|
|
||||||
* Ray's GCS port (6379) will be exposed outside the cluster — restrict with firewall rules to waterdeep's IP only
|
* BlenderMCP's `execute_blender_code` runs arbitrary Python in Blender — review AI-generated code before execution, especially file I/O operations
|
||||||
* The Ray worker has no RBAC — it executes whatever tasks the head assigns
|
* Telemetry disabled via `DISABLE_TELEMETRY=true` in MCP server config
|
||||||
* Model weights on NFS are read-only from waterdeep (mount with `ro` option if possible)
|
* BlenderMCP socket (port 9876) bound to localhost — not exposed to the network
|
||||||
* NFS traffic to gravenhollow traverses the LAN — ensure dual 10GbE links are active
|
* NFS traffic to gravenhollow traverses the LAN; this is acceptable because VRM files contain no sensitive data
|
||||||
* Consider Tailscale or WireGuard for encrypted transport if the Ray GCS traffic crosses untrusted network segments
|
* waterdeep has no cluster access — compromise doesn't impact Kubernetes workloads
|
||||||
|
* `.blend` source files stay local on waterdeep; only finished VRM exports are promoted to gravenhollow
|
||||||
|
|
||||||
## Future Considerations
|
## Future Considerations
|
||||||
|
|
||||||
* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, waterdeep can shift to secondary inference while DGX Spark handles training
|
* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, DGX Spark handles training; waterdeep remains the 3D creation workstation
|
||||||
* **vLLM MPS maturity**: As vLLM's MPS backend matures, waterdeep could serve larger models more efficiently
|
* **Blender + MLX**: Apple's MLX framework could power local AI-generated textures or mesh deformation directly in Blender — worth evaluating as Blender add-ons mature
|
||||||
* **MLX backend**: Apple's MLX framework may provide better performance than PyTorch MPS for some workloads — worth evaluating as an alternative serving backend
|
* **Automated promotion**: A file watcher (fswatch/launchd) could auto-promote VRM exports from `~/blender-avatars/exports/` to gravenhollow when a new file appears
|
||||||
* **Second Mac Mini**: If another Apple Silicon node is added, the external-worker pattern scales trivially
|
* **VRM validation**: Add a pre-promotion check script that validates VRM humanoid rig completeness, expression morphs, and viseme shapes before copying to gravenhollow
|
||||||
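The VRM validation idea could start as a check of the humanoid bone mapping. A VRM file is a binary glTF (GLB) container whose JSON chunk carries a `VRM` (0.x) or `VRMC_vrm` (1.0) extension, so the mapping can be read with the standard library alone. This is a sketch, not a full validator: the function names are hypothetical, the default bone set is an illustrative subset rather than the specification's complete required list, and expression morphs and visemes are not checked.

```python
import json
import struct

GLB_MAGIC = 0x46546C67        # ASCII "glTF", little-endian
JSON_CHUNK_TYPE = 0x4E4F534A  # ASCII "JSON", little-endian

# Illustrative subset only (assumption); extend to match the VRM version in use.
ILLUSTRATIVE_REQUIRED_BONES = frozenset({"hips", "spine", "head"})


def read_glb_json(data: bytes) -> dict:
    """Parse the 12-byte GLB header and return the first (JSON) chunk as a dict."""
    if len(data) < 20:
        raise ValueError("file too short to be a GLB container")
    magic, _version, _length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic; not a GLB/VRM file")
    chunk_len, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != JSON_CHUNK_TYPE:
        raise ValueError("first chunk is not JSON")
    return json.loads(data[20:20 + chunk_len].decode("utf-8"))


def humanoid_bone_names(gltf: dict) -> set[str]:
    """Return the humanoid bone names declared by the VRM extension, if any."""
    ext = gltf.get("extensions", {})
    if "VRMC_vrm" in ext:  # VRM 1.0: {"hips": {"node": 0}, ...}
        return set(ext["VRMC_vrm"].get("humanoid", {}).get("humanBones", {}))
    if "VRM" in ext:       # VRM 0.x: [{"bone": "hips", "node": 0}, ...]
        bones = ext["VRM"].get("humanoid", {}).get("humanBones", [])
        return {b["bone"] for b in bones if "bone" in b}
    return set()


def missing_bones(data: bytes, required=ILLUSTRATIVE_REQUIRED_BONES) -> set[str]:
    """Return the required bone names that the file does not map."""
    return set(required) - humanoid_bone_names(read_glb_json(data))
```

A pre-promotion script could read the export with `Path(...).read_bytes()` and refuse to copy when `missing_bones(...)` is non-empty.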
|
|
||||||
## Links
|
## Links
|
||||||
|
|
||||||
* [Ray Clusters — Adding External Workers](https://docs.ray.io/en/latest/cluster/vms/getting-started.html)
|
* Related: [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) — BlenderMCP 3D avatar workflow (Kasm + deployment architecture)
|
||||||
* [PyTorch MPS Backend](https://pytorch.org/docs/stable/notes/mps.html)
|
* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend architecture (Three.js + VRM avatars)
|
||||||
* [vLLM Apple Silicon Support](https://docs.vllm.ai/en/latest/)
|
* Related: [ADR-0026](0026-storage-strategy.md) — Storage strategy (gravenhollow NFS-fast)
|
||||||
* Related: [ADR-0005](0005-multi-gpu-strategy.md) — Multi-GPU strategy
|
* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions (waterdeep)
|
||||||
* Related: [ADR-0011](0011-kuberay-unified-gpu-backend.md) — KubeRay unified GPU backend
|
* Related: [ADR-0012](0012-use-uv-for-python-development.md) — uv for Python development
|
||||||
* Related: [ADR-0024](0024-ray-repository-structure.md) — Ray repository structure
|
* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp)
|
||||||
* Related: [ADR-0035](0035-arm64-worker-strategy.md) — ARM64 worker strategy
|
* [Blender Metal GPU Rendering](https://docs.blender.org/manual/en/latest/render/cycles/gpu_rendering.html)
|
||||||
* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions
|
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
|
||||||
* Related: [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) — Training strategy
|
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm)
|
||||||
|
|||||||
@@ -437,6 +437,7 @@ VRM files are immutable once promoted — updated models get a new filename (e.g
|
|||||||
* Related to [ADR-0026](0026-storage-strategy.md) (storage strategy — gravenhollow NFS-fast, Quobyte S3, rclone)
|
* Related to [ADR-0026](0026-storage-strategy.md) (storage strategy — gravenhollow NFS-fast, Quobyte S3, rclone)
|
||||||
* Related to [ADR-0044](0044-dns-and-external-access.md) (DNS and external access — Cloudflare Tunnel, split-horizon)
|
* Related to [ADR-0044](0044-dns-and-external-access.md) (DNS and external access — Cloudflare Tunnel, split-horizon)
|
||||||
* Related to [ADR-0049](0049-self-hosted-productivity-suite.md) (Kasm Workspaces)
|
* Related to [ADR-0049](0049-self-hosted-productivity-suite.md) (Kasm Workspaces)
|
||||||
|
* Related to [ADR-0059](0059-mac-mini-ray-worker.md) (waterdeep as local AI agent — primary 3D creation workstation with Metal GPU)
|
||||||
* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp)
|
* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp)
|
||||||
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
|
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
|
||||||
* [VRM Specification](https://vrm.dev/en/)
|
* [VRM Specification](https://vrm.dev/en/)
|
||||||
|
|||||||