# ComfyUI Image-to-3D Avatar Pipeline with TRELLIS + UniRig
* Status: proposed
* Date: 2026-02-24
* Deciders: Billy
* Technical Story: Replace the manual BlenderMCP 3D avatar creation workflow with an automated, GPU-accelerated image-to-rigged-3D-model pipeline using ComfyUI, TRELLIS 2-4B, and UniRig — running on a personal desktop (NVIDIA RTX 4070) as an on-demand Ray worker, with direct MLflow logging and rclone asset promotion
## Context and Problem Statement
The companions-frontend serves VRM avatar models for Three.js-based 3D character rendering ([ADR-0046](0046-companions-frontend-architecture.md)). The previous approach ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md)) proposed using BlenderMCP in a Kasm workstation or on waterdeep ([ADR-0059](0059-mac-mini-ray-worker.md)) for AI-assisted avatar creation. While BlenderMCP bridges VS Code to Blender, the workflow is fundamentally **interactive and manual** — an operator must prompt the AI, review each sculpting step, and hand-tune rigging and VRM export. This is slow, non-reproducible, and doesn't scale.
Meanwhile, the state of the art in image-to-3D generation has matured significantly:
- **TRELLIS** (Microsoft, CVPR'25 Spotlight, 12k+ GitHub stars) generates high-quality textured 3D meshes from a single image in seconds using Structured 3D Latents (SLAT) — with models up to 2B parameters
- **UniRig** (Tsinghua/Tripo, SIGGRAPH'25, 1.4k+ GitHub stars) automatically generates topologically valid skeletons and skinning weights for arbitrary 3D models using autoregressive transformers — the first model to rig humans, animals, and objects with a single unified framework
- **ComfyUI-3D-Pack** (3.7k+ GitHub stars) provides battle-tested ComfyUI nodes for TRELLIS, 3D Gaussian Splatting, mesh processing, and GLB/VRM export — enabling node-graph-based automation without custom code
Together, these tools enable a fully automated **image → 3D mesh → rigged model → VRM** pipeline that eliminates manual Blender work for the common case, produces reproducible results, and integrates with the existing MLflow + Ray stack.
A personal desktop (Ryzen 9 7950X, 64 GB DDR5, NVIDIA RTX 4070 12 GB VRAM) running Arch Linux is available as an **on-demand external Ray worker** — it won't be a permanent cluster member (it's not running Talos), but can join the Ray cluster via `ray start` when 3D generation workloads need to run. This adds a 5th GPU to the fleet specifically for 3D generation, without disrupting the stable inference allocations.
How do we build an automated, reproducible image-to-VRM pipeline that leverages the desktop's CUDA GPU and integrates with the existing AI/ML platform for experiment tracking and asset serving?
## Decision Drivers
* BlenderMCP workflow from ADR-0062 is interactive and non-reproducible — every avatar requires an operator in the loop
* TRELLIS generates production-quality textured meshes from a single reference image in ~30 seconds on a 12 GB GPU
* UniRig automatically rigs arbitrary 3D models with skeleton + skinning weights — no manual weight painting
* ComfyUI-3D-Pack bundles TRELLIS, mesh processing, and GLB export as composable nodes — enabling visual pipeline authoring
* The desktop's RTX 4070 (12 GB VRAM) falls below TRELLIS's recommended 16 GB, but the 1.2B image-to-3D model fits in ~10 GB with fp16/attention optimizations, and the card exceeds UniRig's 8 GB requirement
* The desktop can join/leave the Ray cluster on demand — no permanent infrastructure commitment
* MLflow tracks generation parameters, quality metrics, and output artifacts for reproducibility — the desktop logs directly to the cluster's MLflow service over HTTP
* waterdeep (Mac Mini M4 Pro) remains available for interactive Blender touch-up on models that need manual refinement
* VRM export, asset promotion to gravenhollow, and serving architecture from ADR-0062 remain valid and are reused
## Considered Options
1. **ComfyUI + TRELLIS + UniRig on desktop Ray worker, with direct MLflow logging and rclone promotion**
2. **BlenderMCP interactive workflow** (ADR-0062, superseded)
3. **Cloud-hosted 3D generation (Hyper3D Rodin, Meshy, etc.)**
4. **Run TRELLIS + UniRig directly as Ray Serve deployments in-cluster**
## Decision Outcome
Chosen option: **Option 1 — ComfyUI + TRELLIS + UniRig on desktop Ray worker**, because it automates the entire image-to-rigged-model pipeline without operator interaction, leverages purpose-built state-of-the-art models (TRELLIS for generation, UniRig for rigging), and uses the desktop's RTX 4070 as on-demand GPU capacity without disrupting the stable inference cluster. ComfyUI's visual node graph provides the pipeline orchestration directly on the desktop — no Kubernetes-side orchestrator needed since all compute is local to one machine.
waterdeep retains its role as an interactive Blender workstation for manual refinement of auto-generated models when needed — but the expectation is that most avatars pass through the automated pipeline without manual touch-up.
### Positive Consequences
* **Fully automated pipeline** — image → textured mesh → rigged model → VRM with no operator in the loop
* **Reproducible** — same image + seed produces identical output; parameters tracked in MLflow
* **Fast** — TRELLIS generates a mesh in ~30s, UniRig rigs it in ~60s; end-to-end under 5 minutes including VRM export
* **On-demand GPU** — desktop joins Ray cluster only when needed; no standing resource cost
* **Composable** — ComfyUI node graph can be extended with additional 3D processing nodes (Hunyuan3D, TripoSG, Stable3DGen) without code changes
* **Quality** — TRELLIS (CVPR'25) and UniRig (SIGGRAPH'25) represent current state of the art
* **MLflow integration** — generation parameters, mesh quality metrics, and output artifacts are logged directly to the cluster's MLflow service over HTTP
* **Simple orchestration** — ComfyUI node graph handles the pipeline; no Kubernetes-side orchestrator needed for a single-GPU linear workflow
* **Reuses existing serving architecture** — gravenhollow NFS + RustFS CDN serving from ADR-0062 is unchanged
* **waterdeep fallback** — interactive Blender + BlenderMCP on waterdeep for models needing hand-tuning
### Negative Consequences
* Desktop must be powered on and `ray start` must be run manually to participate in the pipeline
* TRELLIS requires NVIDIA CUDA — cannot run on the existing AMD/Intel GPU fleet (khelben, drizzt, danilo)
* ComfyUI adds a Python dependency stack (PyTorch, CUDA, spconv, flash-attn) to maintain on the desktop
* RTX 4070 has 12 GB VRAM — the 2B-parameter TRELLIS text model (~14 GB at fp16) does not fit, while the 1.2B image-to-3D model fits comfortably
* Auto-generated VRM models may still need manual expression/viseme morph targets for full companions-frontend lip-sync support
* Desktop is not managed by GitOps/Kubernetes — Ansible or manual setup
## Pros and Cons of the Options
### Option 1 — ComfyUI + TRELLIS + UniRig on desktop Ray worker
* Good, because fully automated image-to-VRM pipeline eliminates manual sculpting
* Good, because TRELLIS (CVPR'25) and UniRig (SIGGRAPH'25) are state-of-the-art, MIT-licensed
* Good, because ComfyUI-3D-Pack provides tested node implementations — no custom TRELLIS integration code
* Good, because desktop GPU is free/idle capacity with no cluster impact
* Good, because MLflow integration reuses existing experiment tracking infrastructure
* Good, because ComfyUI can queue and batch-generate multiple avatars unattended
* Bad, because desktop availability is not guaranteed (must be manually started)
* Bad, because CUDA-only — doesn't leverage the existing ROCm/Intel fleet
* Bad, because auto-rigging quality varies by model topology — some models may need manual refinement
### Option 2 — BlenderMCP interactive workflow (ADR-0062)
* Good, because maximum creative control via VS Code + Copilot
* Good, because Kasm provides browser-based access from anywhere
* Bad, because every avatar requires an operator in the loop — slow and non-reproducible
* Bad, because Blender sculpting from scratch is time-intensive even with AI assistance
* Bad, because Kasm runs Blender CPU-only (no GPU acceleration inside DinD)
* Bad, because no MLflow tracking or reproducibility
### Option 3 — Cloud-hosted 3D generation
* Good, because no local GPU required
* Good, because some services (Meshy, Hyper3D Rodin) offer API access
* Bad, because vendor dependency for a core asset pipeline
* Bad, because free tiers have daily limits; paid tiers add recurring cost
* Bad, because limited control over output quality, rigging, and VRM compliance
* Bad, because data leaves the homelab network
### Option 4 — TRELLIS + UniRig as in-cluster Ray Serve deployments
* Good, because fully integrated with existing Ray cluster
* Good, because no desktop dependency
* Bad, because TRELLIS requires NVIDIA CUDA — no in-cluster CUDA GPU has enough VRAM (elminster has 8 GB; TRELLIS needs 12–16 GB)
* Bad, because would require purchasing new in-cluster NVIDIA hardware
* Bad, because 3D generation is batch/occasional, not real-time serving — Ray Serve's always-on model is wasteful
* Bad, because TRELLIS's CUDA dependencies (spconv, flash-attn, nvdiffrast, kaolin) conflict with existing Ray worker images
## Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Kubeflow Pipelines (namespace: kubeflow) │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ 3d_avatar_generation_pipeline │ │
│ │ │ │
│ │ 1. prepare_reference Load/generate reference image from prompt │ │
│ │ │ (optional: use vLLM + Stable Diffusion) │ │
│ │ ▼ │ │
│ │ 2. generate_3d_mesh Submit RayJob → desktop ComfyUI worker │ │
│ │ │ TRELLIS image-large (1.2B) → GLB mesh │ │
│ │ ▼ │ │
│ │ 3. auto_rig Submit RayJob → desktop UniRig worker │ │
│ │ │ UniRig skeleton + skinning → rigged FBX/GLB │ │
│ │ ▼ │ │
│ │ 4. convert_to_vrm Blender CLI (headless) on desktop or cluster │ │
│ │ │ Import rigged GLB → configure VRM metadata │ │
│ │ ▼ → export .vrm │ │
│ │ 5. validate_vrm Check humanoid rig, expressions, visemes │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 6. promote_to_storage rclone copy → gravenhollow RustFS S3 │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 7. log_to_mlflow Parameters, metrics, artifacts → MLflow │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────┬──────────────────────────────────────┘
RayJob CR (ephemeral)
┌─────────────────────────────────────────────────────────────────────────────┐
│ desktop (Arch Linux · Ryzen 9 7950X · 64 GB DDR5 · RTX 4070 12 GB) │
│ On-demand Ray worker (ray start --address=<ray-head>:6379) │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ ComfyUI + Custom Nodes │ │
│ │ │ │
│ │ ComfyUI-3D-Pack: │ │
│ │ • TRELLIS image-large (1.2B) — image → textured GLB mesh │ │
│ │ • Mesh processing nodes — simplify, UV unwrap, texture bake │ │
│ │ • 3D preview — viewport render for quality check │ │
│ │ • GLB/OBJ/PLY export │ │
│ │ │ │
│ │ UniRig: │ │
│ │ • Skeleton prediction — autoregressive bone hierarchy │ │
│ │ • Skinning weights — bone-point cross-attention │ │
│ │ • Merge — skeleton + skin + original mesh → rigged model │ │
│ │ • Supports GLB, FBX, OBJ input/output │ │
│ │ │ │
│ │ Blender 4.x (headless CLI): │ │
│ │ • VRM Add-on for Blender — GLB → VRM conversion │ │
│ │ • Humanoid rig mapping, expression morphs, viseme config │ │
│ │ • Batch export via bpy scripting │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ GPU: NVIDIA RTX 4070 12 GB (CUDA 12.x) │
│ Ray: worker node with resource label {"3d_gen": 1, "rtx4070": 1}     │
│ Storage: ~/comfyui-3d/ (working dir), rclone → gravenhollow S3 │
└──────────────────────────────────┬──────────────────────────────────────────┘
rclone (S3)
┌─────────────────────────────────────────────────────────────────────────────┐
│ gravenhollow.lab.daviestechlabs.io │
│ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │
│ │
│ NFS: /mnt/gravenhollow/kubernetes/avatar-models/ │
│ ├── Seed-san.vrm (default model) │
│ ├── Generated-A-v1.vrm (auto-generated via pipeline) │
│ └── animations/ (shared animation clips) │
│ │
│ S3 (RustFS): avatar-models bucket │
│ (same data, served via Cloudflare Tunnel for remote users) │
└──────────────────────────┬──────────────────────────────────────────────────┘
┌────────────┴───────────────┐
│ │
NFS (nfs-fast PVC) Cloudflare Tunnel
│ (assets.daviestechlabs.io)
▼ │
┌──────────────────────────┐ ▼
│ companions-frontend │ ┌──────────────────────────┐
│ (Kubernetes pod) │ │ Remote users (CDN-cached │
│ LAN users │ │ via Cloudflare edge) │
└──────────────────────────┘ └──────────────────────────┘
```
### Ray Cluster Integration
The desktop joins the existing KubeRay-managed cluster as an external worker. It is **not** a Talos node and not managed by Kubernetes — it connects to the Ray head node's GCS port directly:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Ray Cluster (KubeRay RayService) │
│ │
│ Head: Ray head pod (in-cluster) │
│ GCS port: 6379 (exposed via NodePort or LoadBalancer) │
│ │
│ In-Cluster Workers (permanent, managed by KubeRay): │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ khelben │ │elminster │ │ drizzt │ │ danilo │ │
│ │Strix Halo│ │RTX 2070 │ │Radeon 680│ │Intel Arc │ │
│ │ ROCm │ │ CUDA │ │ ROCm │ │ Intel │ │
│ │ /llm │ │/whisper │ │/embeddings│ │/reranker │ │
│ │ │ │ /tts │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ External Worker (on-demand, self-managed): │
│ ┌──────────────────────────────────────────────────┐ │
│ │ desktop (Arch Linux, external) │ │
│ │ RTX 4070 12 GB · CUDA │ │
│ │ ComfyUI + TRELLIS + UniRig + Blender CLI │ │
│ │ Resource labels: {"3d_gen": 1, "rtx4070": 1}    │ │
│ │ Joins via: ray start --address=<head>:6379 │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
The existing inference deployments (`/llm`, `/whisper`, `/tts`, `/embeddings`, `/reranker`) are unaffected — they are pinned to their respective in-cluster GPU nodes via Ray resource labels. The desktop's `3d_gen` resource label ensures only 3D generation RayJobs get scheduled there.
### Ray Service Multiplexing
The desktop's RTX 4070 can **time-share between inference overflow and 3D generation** when idle. When no 3D generation jobs are queued, the desktop can optionally serve as overflow capacity for inference workloads:
| Mode | When | What runs on desktop |
|------|------|---------------------|
| **3D generation** | ComfyUI workflow triggered (manually or via API) | ComfyUI + TRELLIS → UniRig → Blender VRM export |
| **Inference overflow** | Manually enabled, high-traffic periods | vLLM (secondary), Whisper, or TTS replica |
| **Idle** | Desktop powered on, no jobs | Ray worker connected but idle (0 resource cost) |
Mode switching is managed by Ray's resource scheduling — 3D jobs request `{"3d_gen": 1}` and inference jobs request their specific GPU labels. When the desktop is off, all workloads continue on the existing in-cluster fleet with no impact.
## Implementation Plan
### 1. Desktop Environment Setup
```bash
# Install NVIDIA drivers + CUDA toolkit (Arch Linux)
sudo pacman -S nvidia nvidia-utils cuda cudnn
# Install Python environment (uv per ADR-0012)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create project directory
mkdir -p ~/comfyui-3d && cd ~/comfyui-3d
# Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
uv venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements.txt
# Install ComfyUI-3D-Pack (includes TRELLIS nodes)
cd custom_nodes
git clone https://github.com/MrForExample/ComfyUI-3D-Pack.git
cd ComfyUI-3D-Pack
uv pip install -r requirements.txt
python install.py
# Install UniRig
cd ~/comfyui-3d
git clone https://github.com/VAST-AI-Research/UniRig.git
cd UniRig
uv pip install torch torchvision
uv pip install -r requirements.txt
uv pip install spconv-cu124 # Match CUDA version
uv pip install flash-attn --no-build-isolation
# Install Blender (headless CLI for VRM export)
sudo pacman -S blender
# Install VRM Add-on (bpy is only importable inside Blender, so run it headless)
blender --background --python-expr "import bpy, os; bpy.ops.preferences.addon_install(filepath=os.path.abspath('UniRig/blender/add-on-vrm-v2.20.77_modified.zip'))"
# Install rclone for asset promotion
sudo pacman -S rclone
rclone config create gravenhollow s3 \
provider=Other \
endpoint=https://gravenhollow.lab.daviestechlabs.io:30292 \
access_key_id=<key> \
secret_access_key=<secret>
# Install Ray for cluster joining
uv pip install "ray[default]"
```
### 2. Ray Worker Configuration
```bash
# Join the Ray cluster on demand
# Ray head GCS port must be exposed (NodePort 30637 or similar)
ray start \
--address=<ray-head-external-ip>:6379 \
--num-cpus=16 \
--num-gpus=1 \
--resources='{"3d_gen": 1, "rtx4070": 1}' \
--node-name=desktop
# Verify connection
ray status # Should show desktop as a connected worker
```
The Ray head's GCS port needs to be reachable from the desktop. Options:
- **NodePort**: Expose port 6379 as a NodePort (e.g., 30637) on a cluster node
- **Tailscale/WireGuard**: If the desktop is on a different network segment
- **Direct LAN**: If desktop and cluster are on the same 192.168.100.0/24 subnet
### 3. ComfyUI Workflow (Node Graph)
The ComfyUI workflow JSON defines the image-to-GLB pipeline:
```
[Load Image] → [TRELLIS Image-to-3D] → [Mesh Simplify] → [Texture Bake]
        ▼
[Save GLB]
        ▼
[UniRig Skeleton Prediction]
        ▼
[UniRig Skinning Weights]
        ▼
[UniRig Merge (rigged model)]
        ▼
[Blender VRM Export (CLI)]
        ▼
[Save VRM → ~/comfyui-3d/exports/]
```
Key TRELLIS parameters exposed:
- `sparse_structure_sampler_params.steps`: 12 (default)
- `sparse_structure_sampler_params.cfg_strength`: 7.5
- `slat_sampler_params.steps`: 12
- `slat_sampler_params.cfg_strength`: 3.0
- `simplify`: 0.95 (triangle reduction ratio)
- `texture_size`: 1024
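A small helper makes the reproducibility story concrete: keep the defaults above in one place and override only what varies per run. The parameter names mirror the list above; the exact node-input schema should be checked against ComfyUI-3D-Pack:

```python
# Defaults mirror the values listed above; a fixed seed means the same
# reference image reproduces the same mesh.
TRELLIS_DEFAULTS = {
    "seed": 42,
    "sparse_structure_sampler_params": {"steps": 12, "cfg_strength": 7.5},
    "slat_sampler_params": {"steps": 12, "cfg_strength": 3.0},
    "simplify": 0.95,       # triangle reduction ratio
    "texture_size": 1024,   # baked texture resolution (px)
}

def make_params(**overrides) -> dict:
    """Return a full parameter set with selective overrides (shallow merge)."""
    return {**TRELLIS_DEFAULTS, **overrides}

params = make_params(seed=7, texture_size=2048)
```

Logging this dict to MLflow alongside each run (next section) is what makes a given avatar regenerable later.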
### 4. MLflow Experiment Tracking
The desktop logs directly to the cluster's MLflow service over HTTP. Set `MLFLOW_TRACKING_URI` in the ComfyUI environment or in a post-generation logging script:
```bash
export MLFLOW_TRACKING_URI=http://<mlflow-service>:5000
```
Each generation run logs to a dedicated MLflow experiment:
| What | MLflow Concept | Content |
|------|---------------|---------|
| Reference image | Artifact | `reference.png` |
| TRELLIS parameters | Params | seed, cfg_strength, steps, simplify, texture_size |
| UniRig parameters | Params | skeleton_seed |
| Raw mesh | Artifact | `{name}_raw.glb` (pre-rigging) |
| Rigged model | Artifact | `{name}_rigged.glb` (post-rigging) |
| Final VRM | Artifact | `{name}.vrm` |
| Mesh quality | Metrics | vertex_count, face_count, texture_resolution |
| Rig quality | Metrics | bone_count, skinning_weight_coverage |
| Pipeline duration | Metrics | trellis_time_s, unirig_time_s, total_time_s |
### 5. VRM Export Script (Blender CLI)
```python
#!/usr/bin/env python3
"""vrm_export.py — Headless Blender script for GLB→VRM conversion."""
import bpy
import sys
argv = sys.argv[sys.argv.index("--") + 1:]
input_glb = argv[0]
output_vrm = argv[1]
avatar_name = argv[2] if len(argv) > 2 else "Generated Avatar"
# Clear scene
bpy.ops.wm.read_factory_settings(use_empty=True)
# Import rigged GLB
bpy.ops.import_scene.gltf(filepath=input_glb)
# Select armature (fail clearly if the import produced no rig)
armature = next((obj for obj in bpy.data.objects if obj.type == 'ARMATURE'), None)
if armature is None:
    sys.exit("No armature found in imported GLB; was the model rigged?")
bpy.context.view_layer.objects.active = armature
# Configure VRM metadata. Note: the VRM add-on normally stores its settings in
# a property group on the armature data-block; the raw custom property below is
# a sketch and may need adapting to the installed add-on version.
armature["vrm_addon_extension"] = {
"spec_version": "1.0",
"vrm0": {
"meta": {
"title": avatar_name,
"author": "DaviesTechLabs Pipeline",
"allowedUserName": "Everyone",
}
}
}
# Export VRM
bpy.ops.export_scene.vrm(filepath=output_vrm)
print(f"Exported VRM: {output_vrm}")
```
Invoked via:
```bash
blender --background --python vrm_export.py -- input.glb output.vrm "Avatar Name"
```
### 6. Asset Promotion (Reuses ADR-0062 Architecture)
The VRM serving architecture from ADR-0062 is preserved unchanged:
| Stage | Action |
|-------|--------|
| **Generate** | Automated pipeline: image → TRELLIS → UniRig → VRM |
| **Promote** | `rclone copy ~/comfyui-3d/exports/{name}.vrm gravenhollow:avatar-models/` |
| **Register** | Add model path to `AllowedAvatarModels` in companions-frontend Go + JS allowlists |
| **Deploy** | Flux rolls out config; model already on NFS PVC — no image rebuild |
| **CDN** | Cloudflare Tunnel → RustFS → CDN cache at 300+ edge PoPs |
## Model Requirements and VRAM Budget
| Component | Model Size | VRAM Required | Notes |
|-----------|-----------|---------------|-------|
| TRELLIS image-large | 1.2B params | ~10 GB (fp16) | Image-to-3D, best quality |
| TRELLIS text-xlarge | 2.0B params | ~14 GB (fp16) | Text-to-3D, optional |
| UniRig skeleton | ~350M params | ~4 GB | Autoregressive skeleton prediction |
| UniRig skinning | ~350M params | ~4 GB | Bone-point cross-attention |
| Blender CLI | N/A | CPU only | Headless VRM export |
**RTX 4070 budget (12 GB):** Models are loaded sequentially (not concurrently) — TRELLIS runs first, output is saved to disk, then UniRig loads for rigging. Peak VRAM usage is ~10 GB during TRELLIS inference. The desktop's 64 GB system RAM provides ample buffer for model loading and mesh processing.
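The sequential-loading strategy can be enforced by running each stage as its own process, so all VRAM from the previous model is released before the next one loads. A sketch, with hypothetical per-stage script names:

```python
import subprocess

# Hypothetical stage scripts: each loads one model, writes its output to
# disk, and exits, guaranteeing its VRAM is freed before the next stage.
STAGES = [
    ["python", "run_trellis.py", "--image", "reference.png", "--out", "raw.glb"],
    ["python", "run_unirig.py", "--mesh", "raw.glb", "--out", "rigged.glb"],
    ["blender", "--background", "--python", "vrm_export.py",
     "--", "rigged.glb", "avatar.vrm", "Generated Avatar"],
]

def run_pipeline(stages) -> None:
    for cmd in stages:
        subprocess.run(cmd, check=True)  # abort the pipeline on first failure
```

Process isolation also sidesteps the conflicting CUDA dependency stacks (spconv vs. flash-attn vs. nvdiffrast) noted above, since each stage can run in its own virtualenv.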
## Security Considerations
* **Ray GCS port exposure**: The Ray head's port 6379 must be reachable from the desktop. Use a NodePort with network policy restricting source IPs to the desktop's address, or use a WireGuard/Tailscale tunnel.
* **No cluster credentials on desktop**: The desktop runs Ray worker processes and ComfyUI only — it has no `kubeconfig` or Kubernetes API access. Generation is triggered locally via ComfyUI's UI or API, not from the cluster.
* **Model provenance**: TRELLIS and UniRig checkpoints are downloaded from Hugging Face (Microsoft and VAST-AI orgs respectively). Pin checkpoint hashes in the setup script.
* **ComfyUI network**: ComfyUI's web UI (port 8188) should be bound to localhost only when not in use. It is not exposed to the cluster.
* **rclone credentials**: gravenhollow RustFS write credentials stored in `~/.config/rclone/rclone.conf` with `600` permissions.
* **Generated content**: TRELLIS and UniRig code is MIT-licensed; verify the checkpoint licenses on Hugging Face before treating generated assets as unrestricted
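The checkpoint-pinning note above can be implemented with a stdlib streaming hash. A sketch; the expected digests must be filled in after a first verified download:

```python
import hashlib

def sha256sum(path: str, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks (checkpoints are multi-GB)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify_checkpoint(path: str, expected_hex: str) -> None:
    actual = sha256sum(path)
    if actual != expected_hex:
        raise RuntimeError(f"{path}: hash mismatch ({actual} != {expected_hex})")
```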
## Future Considerations
* **Kubeflow pipeline for model refinement**: When iterating on existing models (re-rigging, parameter sweeps, A/B testing generation backends), a Kubeflow pipeline can orchestrate multi-step refinement workflows with artifact lineage, caching, and retries — submitting RayJobs to the desktop worker via the existing KFP + RayJob pattern from [ADR-0058](0058-training-strategy-cpu-dgx-spark.md)
* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, could run TRELLIS + UniRig in-cluster with dedicated GPU, eliminating desktop dependency
* **Stable3DGen / Hunyuan3D alternatives**: ComfyUI-3D-Pack supports multiple generation backends — can A/B test quality via MLflow metrics
* **VRM expression morphs**: Investigate automated viseme and expression blendshape generation for full lip-sync support without manual Blender work
* **ComfyUI API mode**: ComfyUI supports headless API-only execution (`--listen 0.0.0.0 --port 8188`) — a script or future Kubeflow pipeline can submit workflows via HTTP POST to `/prompt`
* **Text-to-3D**: Use the cluster's vLLM instance to generate a character description, then Stable Diffusion (on desktop) to create a reference image, feeding into TRELLIS — fully text-to-avatar pipeline
* **Batch generation**: Schedule overnight batch runs via CronWorkflow to generate avatar libraries from curated reference images
* **In-cluster migration**: If a 16+ GB NVIDIA GPU is added to the cluster (e.g., via DGX Spark or RTX 5070), migrate TRELLIS + UniRig to a dedicated Ray Serve deployment for always-available generation
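The API-mode item above can be sketched with the stdlib alone. The `/prompt` endpoint and request shape follow ComfyUI's HTTP API; the workflow content itself is a placeholder:

```python
import json
import urllib.request

def build_prompt_payload(workflow: dict, client_id: str = "avatar-pipeline") -> bytes:
    """Wrap an API-format workflow graph in the body ComfyUI's /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode()

def submit_workflow(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # response includes the queued prompt_id
```

The workflow dict is the "Save (API Format)" export of the node graph from Section 3, so a future Kubeflow step can drive the desktop without touching the ComfyUI UI.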
## Links
* Supersedes: [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) — BlenderMCP for 3D avatar creation (interactive workflow)
* Updates: [ADR-0059](0059-mac-mini-ray-worker.md) — waterdeep retains Blender role for manual refinement only
* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend architecture (Three.js + VRM avatars)
* Related: [ADR-0011](0011-kuberay-unified-gpu-backend.md) — KubeRay unified GPU backend
* Related: [ADR-0005](0005-multi-gpu-strategy.md) — Multi-GPU heterogeneous strategy
* Related: [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) — Training strategy (Kubeflow + RayJob pattern for future pipeline work)
* Related: [ADR-0047](0047-mlflow-experiment-tracking.md) — MLflow experiment tracking
* Related: [ADR-0026](0026-storage-strategy.md) — Storage strategy (gravenhollow NFS-fast, RustFS S3)
* [Microsoft TRELLIS](https://github.com/microsoft/TRELLIS) — Structured 3D Latents for Scalable 3D Generation (CVPR'25 Spotlight)
* [VAST-AI UniRig](https://github.com/VAST-AI-Research/UniRig) — One Model to Rig Them All (SIGGRAPH'25)
* [ComfyUI-3D-Pack](https://github.com/MrForExample/ComfyUI-3D-Pack) — Extensive 3D node suite for ComfyUI
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader in companions-frontend)