ComfyUI Image-to-3D Avatar Pipeline with TRELLIS + UniRig

  • Status: proposed
  • Date: 2026-02-24
  • Deciders: Billy
  • Technical Story: Replace the manual BlenderMCP 3D avatar creation workflow with an automated, GPU-accelerated image-to-rigged-3D-model pipeline using ComfyUI, TRELLIS 2-4B, and UniRig — running on a personal desktop (NVIDIA RTX 4070) as an on-demand Ray worker, with direct MLflow logging and rclone asset promotion

Context and Problem Statement

The companions-frontend serves VRM avatar models for Three.js-based 3D character rendering (ADR-0046). The previous approach (ADR-0062) proposed using BlenderMCP in a Kasm workstation or on waterdeep (ADR-0059) for AI-assisted avatar creation. While BlenderMCP bridges VS Code to Blender, the workflow is fundamentally interactive and manual — an operator must prompt the AI, review each sculpting step, and hand-tune rigging and VRM export. This is slow, non-reproducible, and doesn't scale.

Meanwhile, the state of the art in image-to-3D generation has matured significantly:

  • TRELLIS (Microsoft, CVPR'25 Spotlight, 12k+ GitHub stars) generates high-quality textured 3D meshes from a single image in seconds using Structured 3D Latents (SLAT) — with models up to 2B parameters
  • UniRig (Tsinghua/Tripo, SIGGRAPH'25, 1.4k+ GitHub stars) automatically generates topologically valid skeletons and skinning weights for arbitrary 3D models using autoregressive transformers — the first model to rig humans, animals, and objects with a single unified framework
  • ComfyUI-3D-Pack (3.7k+ GitHub stars) provides battle-tested ComfyUI nodes for TRELLIS, 3D Gaussian Splatting, mesh processing, and GLB/VRM export — enabling node-graph-based automation without custom code

Together, these tools enable a fully automated image → 3D mesh → rigged model → VRM pipeline that eliminates manual Blender work for the common case, produces reproducible results, and integrates with the existing MLflow + Ray stack.

A personal desktop (Ryzen 9 7950X, 64 GB DDR5, NVIDIA RTX 4070 12 GB VRAM) running Arch Linux is available as an on-demand external Ray worker — it won't be a permanent cluster member (it's not running Talos), but can join the Ray cluster via ray start when 3D generation workloads need to run. This adds a 5th GPU to the fleet specifically for 3D generation, without disrupting the stable inference allocations.

How do we build an automated, reproducible image-to-VRM pipeline that leverages the desktop's CUDA GPU and integrates with the existing AI/ML platform for experiment tracking and asset serving?

Decision Drivers

  • BlenderMCP workflow from ADR-0062 is interactive and non-reproducible — every avatar requires an operator in the loop
  • TRELLIS generates production-quality textured meshes from a single reference image in ~30 seconds on a 12 GB GPU
  • UniRig automatically rigs arbitrary 3D models with skeleton + skinning weights — no manual weight painting
  • ComfyUI-3D-Pack bundles TRELLIS, mesh processing, and GLB export as composable nodes — enabling visual pipeline authoring
  • The desktop's RTX 4070 (12 GB VRAM) is below TRELLIS's recommended 16 GB but runs the 1.2B image-to-3D model with fp16 and memory-efficient attention, and exceeds UniRig's 8 GB requirement
  • The desktop can join/leave the Ray cluster on demand — no permanent infrastructure commitment
  • MLflow tracks generation parameters, quality metrics, and output artifacts for reproducibility — the desktop logs directly to the cluster's MLflow service over HTTP
  • waterdeep (Mac Mini M4 Pro) remains available for interactive Blender touch-up on models that need manual refinement
  • VRM export, asset promotion to gravenhollow, and serving architecture from ADR-0062 remain valid and are reused

Considered Options

  1. ComfyUI + TRELLIS + UniRig on desktop Ray worker, with direct MLflow logging and rclone promotion
  2. BlenderMCP interactive workflow (ADR-0062, superseded)
  3. Cloud-hosted 3D generation (Hyper3D Rodin, Meshy, etc.)
  4. Run TRELLIS + UniRig directly as Ray Serve deployments in-cluster

Decision Outcome

Chosen option: Option 1 — ComfyUI + TRELLIS + UniRig on desktop Ray worker, because it automates the entire image-to-rigged-model pipeline without operator interaction, leverages purpose-built state-of-the-art models (TRELLIS for generation, UniRig for rigging), and uses the desktop's RTX 4070 as on-demand GPU capacity without disrupting the stable inference cluster. ComfyUI's visual node graph provides the pipeline orchestration directly on the desktop — no Kubernetes-side orchestrator needed since all compute is local to one machine.

waterdeep retains its role as an interactive Blender workstation for manual refinement of auto-generated models when needed — but the expectation is that most avatars pass through the automated pipeline without manual touch-up.

Positive Consequences

  • Fully automated pipeline — image → textured mesh → rigged model → VRM with no operator in the loop
  • Reproducible — same image + seed produces identical output; parameters tracked in MLflow
  • Fast — TRELLIS generates a mesh in ~30s, UniRig rigs it in ~60s; end-to-end under 5 minutes including VRM export
  • On-demand GPU — desktop joins Ray cluster only when needed; no standing resource cost
  • Composable — ComfyUI node graph can be extended with additional 3D processing nodes (Hunyuan3D, TripoSG, Stable3DGen) without code changes
  • Quality — TRELLIS (CVPR'25) and UniRig (SIGGRAPH'25) represent current state of the art
  • MLflow integration — generation parameters, mesh quality metrics, and output artifacts are logged directly to the cluster's MLflow service over HTTP
  • Simple orchestration — ComfyUI node graph handles the pipeline; no Kubernetes-side orchestrator needed for a single-GPU linear workflow
  • Reuses existing serving architecture — gravenhollow NFS + RustFS CDN serving from ADR-0062 is unchanged
  • waterdeep fallback — interactive Blender + BlenderMCP on waterdeep for models needing hand-tuning

Negative Consequences

  • Desktop must be powered on and ray start must be run manually to participate in the pipeline
  • TRELLIS requires NVIDIA CUDA — cannot run on the existing AMD/Intel GPU fleet (khelben, drizzt, danilo)
  • ComfyUI adds a Python dependency stack (PyTorch, CUDA, spconv, flash-attn) to maintain on the desktop
  • RTX 4070 has 12 GB VRAM — large TRELLIS models (2B params) may require fp16 + attention optimization; the 1.2B image-to-3D model fits comfortably
  • Auto-generated VRM models may still need manual expression/viseme morph targets for full companions-frontend lip-sync support
  • Desktop is not managed by GitOps/Kubernetes — Ansible or manual setup

Pros and Cons of the Options

Option 1 — ComfyUI + TRELLIS + UniRig on desktop Ray worker

  • Good, because fully automated image-to-VRM pipeline eliminates manual sculpting
  • Good, because TRELLIS (CVPR'25) and UniRig (SIGGRAPH'25) are state-of-the-art, MIT-licensed
  • Good, because ComfyUI-3D-Pack provides tested node implementations — no custom TRELLIS integration code
  • Good, because desktop GPU is free/idle capacity with no cluster impact
  • Good, because MLflow integration reuses existing experiment tracking infrastructure
  • Good, because ComfyUI can queue and batch-generate multiple avatars unattended
  • Bad, because desktop availability is not guaranteed (must be manually started)
  • Bad, because CUDA-only — doesn't leverage the existing ROCm/Intel fleet
  • Bad, because auto-rigging quality varies by model topology — some models may need manual refinement

Option 2 — BlenderMCP interactive workflow (ADR-0062)

  • Good, because maximum creative control via VS Code + Copilot
  • Good, because Kasm provides browser-based access from anywhere
  • Bad, because every avatar requires an operator in the loop — slow and non-reproducible
  • Bad, because Blender sculpting from scratch is time-intensive even with AI assistance
  • Bad, because Kasm runs Blender CPU-only (no GPU acceleration inside DinD)
  • Bad, because no MLflow tracking or reproducibility

Option 3 — Cloud-hosted 3D generation

  • Good, because no local GPU required
  • Good, because some services (Meshy, Hyper3D Rodin) offer API access
  • Bad, because vendor dependency for a core asset pipeline
  • Bad, because free tiers have daily limits; paid tiers add recurring cost
  • Bad, because limited control over output quality, rigging, and VRM compliance
  • Bad, because data leaves the homelab network

Option 4 — TRELLIS + UniRig as in-cluster Ray Serve deployments

  • Good, because fully integrated with existing Ray cluster
  • Good, because no desktop dependency
  • Bad, because TRELLIS requires NVIDIA CUDA — no CUDA GPUs in-cluster have enough VRAM (elminster has 8 GB; TRELLIS needs 12–16 GB)
  • Bad, because would require purchasing new in-cluster NVIDIA hardware
  • Bad, because 3D generation is batch/occasional, not real-time serving — Ray Serve's always-on model is wasteful
  • Bad, because TRELLIS's CUDA dependencies (spconv, flash-attn, nvdiffrast, kaolin) conflict with existing Ray worker images

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│             3D Avatar Generation Pipeline (ComfyUI on desktop)               │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │              3d_avatar_generation_pipeline                              │  │
│  │                                                                        │  │
│  │  1. prepare_reference    Load/generate reference image from prompt     │  │
│  │         │                (optional: use vLLM + Stable Diffusion)       │  │
│  │         ▼                                                              │  │
│  │  2. generate_3d_mesh     ComfyUI TRELLIS node (desktop GPU)             │  │
│  │         │                TRELLIS image-large (1.2B) → GLB mesh         │  │
│  │         ▼                                                              │  │
│  │  3. auto_rig             UniRig on desktop GPU                          │  │
│  │         │                UniRig skeleton + skinning → rigged FBX/GLB   │  │
│  │         ▼                                                              │  │
│  │  4. convert_to_vrm       Blender CLI (headless) on desktop              │  │
│  │         │                Import rigged GLB → configure VRM metadata    │  │
│  │         ▼                → export .vrm                                 │  │
│  │  5. validate_vrm         Check humanoid rig, expressions, visemes      │  │
│  │         │                                                              │  │
│  │         ▼                                                              │  │
│  │  6. promote_to_storage   rclone copy → gravenhollow RustFS S3          │  │
│  │         │                                                              │  │
│  │         ▼                                                              │  │
│  │  7. log_to_mlflow        Parameters, metrics, artifacts → MLflow       │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────┬──────────────────────────────────────┘
                                       │
                      runs locally (no RayJob submission)
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│  desktop (Arch Linux · Ryzen 9 7950X · 64 GB DDR5 · RTX 4070 12 GB)        │
│  On-demand Ray worker (ray start --address=<ray-head>:6379)                 │
│                                                                              │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                     ComfyUI + Custom Nodes                            │  │
│  │                                                                       │  │
│  │  ComfyUI-3D-Pack:                                                     │  │
│  │   • TRELLIS image-large (1.2B) — image → textured GLB mesh           │  │
│  │   • Mesh processing nodes — simplify, UV unwrap, texture bake         │  │
│  │   • 3D preview — viewport render for quality check                    │  │
│  │   • GLB/OBJ/PLY export                                               │  │
│  │                                                                       │  │
│  │  UniRig:                                                              │  │
│  │   • Skeleton prediction — autoregressive bone hierarchy               │  │
│  │   • Skinning weights — bone-point cross-attention                     │  │
│  │   • Merge — skeleton + skin + original mesh → rigged model            │  │
│  │   • Supports GLB, FBX, OBJ input/output                              │  │
│  │                                                                       │  │
│  │  Blender 4.x (headless CLI):                                          │  │
│  │   • VRM Add-on for Blender — GLB → VRM conversion                    │  │
│  │   • Humanoid rig mapping, expression morphs, viseme config            │  │
│  │   • Batch export via bpy scripting                                    │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  GPU: NVIDIA RTX 4070 12 GB (CUDA 12.x)                                    │
│  Ray: worker node with resource labels {"3d_gen": 1, "rtx4070": 1}         │
│  Storage: ~/comfyui-3d/ (working dir), rclone → gravenhollow S3             │
└──────────────────────────────────┬──────────────────────────────────────────┘
                                   │
                             rclone (S3)
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│            gravenhollow.lab.daviestechlabs.io                                │
│            (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB)                │
│                                                                              │
│  NFS: /mnt/gravenhollow/kubernetes/avatar-models/                            │
│  ├── Seed-san.vrm          (default model)                                  │
│  ├── Generated-A-v1.vrm    (auto-generated via pipeline)                    │
│  └── animations/           (shared animation clips)                          │
│                                                                              │
│  S3 (RustFS): avatar-models bucket                                          │
│  (same data, served via Cloudflare Tunnel for remote users)                 │
└──────────────────────────┬──────────────────────────────────────────────────┘
                           │
              ┌────────────┴───────────────┐
              │                            │
        NFS (nfs-fast PVC)          Cloudflare Tunnel
              │                     (assets.daviestechlabs.io)
              ▼                            │
┌──────────────────────────┐               ▼
│  companions-frontend     │   ┌──────────────────────────┐
│  (Kubernetes pod)        │   │  Remote users (CDN-cached │
│  LAN users               │   │  via Cloudflare edge)     │
└──────────────────────────┘   └──────────────────────────┘

Ray Cluster Integration

The desktop joins the existing KubeRay-managed cluster as an external worker. It is not a Talos node and not managed by Kubernetes — it connects to the Ray head node's GCS port directly:

┌─────────────────────────────────────────────────────────────────────────────┐
│                     Ray Cluster (KubeRay RayService)                         │
│                                                                              │
│  Head: Ray head pod (in-cluster)                                            │
│  GCS port: 6379 (exposed via NodePort or LoadBalancer)                      │
│                                                                              │
│  In-Cluster Workers (permanent, managed by KubeRay):                        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐                      │
│  │ khelben  │ │elminster │ │  drizzt  │ │  danilo  │                      │
│  │Strix Halo│ │RTX 2070  │ │Radeon 680│ │Intel Arc │                      │
│  │  ROCm    │ │  CUDA    │ │  ROCm    │ │  Intel   │                      │
│  │ /llm     │ │/whisper  │ │/embeddings│ │/reranker │                      │
│  │          │ │  /tts    │ │          │ │          │                      │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘                      │
│                                                                              │
│  External Worker (on-demand, self-managed):                                 │
│  ┌──────────────────────────────────────────────────┐                      │
│  │  desktop (Arch Linux, external)                   │                      │
│  │  RTX 4070 12 GB · CUDA                            │                      │
│  │  ComfyUI + TRELLIS + UniRig + Blender CLI         │                      │
│  │  Resource labels: {"3d_gen": 1, "rtx4070": 1}     │                      │
│  │  Joins via: ray start --address=<head>:6379       │                      │
│  └──────────────────────────────────────────────────┘                      │
└─────────────────────────────────────────────────────────────────────────────┘

The existing inference deployments (/llm, /whisper, /tts, /embeddings, /reranker) are unaffected — they are pinned to their respective in-cluster GPU nodes via Ray resource labels. The desktop's 3d_gen resource label ensures only 3D generation RayJobs get scheduled there.

Ray Service Multiplexing

The desktop's RTX 4070 can time-share between inference overflow and 3D generation when idle. When no 3D generation jobs are queued, the desktop can optionally serve as overflow capacity for inference workloads:

| Mode | When | What runs on desktop |
| --- | --- | --- |
| 3D generation | ComfyUI workflow triggered (manually or via API) | ComfyUI + TRELLIS → UniRig → Blender VRM export |
| Inference overflow | Manually enabled, high-traffic periods | vLLM (secondary), Whisper, or TTS replica |
| Idle | Desktop powered on, no jobs | Ray worker connected but idle (0 resource cost) |

Mode switching is managed by Ray's resource scheduling — 3D jobs request {"3d_gen": 1} and inference jobs request their specific GPU labels. When the desktop is off, all workloads continue on the existing in-cluster fleet with no impact.
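The label-based gating above can be sketched as the feasibility check Ray performs when placing a task — a minimal illustration of the mechanism, not Ray's actual scheduler code:

```python
def schedulable(requested: dict, available: dict) -> bool:
    """A task fits on a node only if every custom resource it requests
    is present there in sufficient quantity (Ray's feasibility check,
    simplified)."""
    return all(available.get(res, 0) >= qty for res, qty in requested.items())

# The desktop advertises its custom labels when joining via `ray start`
desktop = {"GPU": 1, "3d_gen": 1, "rtx4070": 1}

# 3D generation jobs request {"3d_gen": 1} — only the desktop matches
assert schedulable({"3d_gen": 1}, desktop)

# Inference jobs pinned to in-cluster labels never land on the desktop
assert not schedulable({"strix_halo": 1}, desktop)
```

When the desktop disconnects, tasks requesting `3d_gen` simply stay pending rather than spilling onto the inference fleet.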

Implementation Plan

1. Desktop Environment Setup

# Install NVIDIA drivers + CUDA toolkit (Arch Linux)
sudo pacman -S nvidia nvidia-utils cuda cudnn

# Install Python environment (uv per ADR-0012)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create project directory
mkdir -p ~/comfyui-3d && cd ~/comfyui-3d

# Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
uv venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements.txt

# Install ComfyUI-3D-Pack (includes TRELLIS nodes)
cd custom_nodes
git clone https://github.com/MrForExample/ComfyUI-3D-Pack.git
cd ComfyUI-3D-Pack
uv pip install -r requirements.txt
python install.py

# Install UniRig
cd ~/comfyui-3d
git clone https://github.com/VAST-AI-Research/UniRig.git
cd UniRig
uv pip install torch torchvision
uv pip install -r requirements.txt
uv pip install spconv-cu124  # Match CUDA version
uv pip install flash-attn --no-build-isolation

# Install Blender (headless CLI for VRM export)
sudo pacman -S blender
# Install VRM Add-on (bpy is only available inside Blender, so run it headless)
blender --background --python-expr "import bpy, os; bpy.ops.preferences.addon_install(filepath=os.path.abspath('UniRig/blender/add-on-vrm-v2.20.77_modified.zip'))"

# Install rclone for asset promotion
sudo pacman -S rclone
rclone config create gravenhollow s3 \
    provider=Other \
    endpoint=https://gravenhollow.lab.daviestechlabs.io:30292 \
    access_key_id=<key> \
    secret_access_key=<secret>

# Install Ray for cluster joining
uv pip install "ray[default]"

2. Ray Worker Configuration

# Join the Ray cluster on demand
# Ray head GCS port must be exposed (NodePort 30637 or similar)
ray start \
    --address=<ray-head-external-ip>:6379 \
    --num-cpus=16 \
    --num-gpus=1 \
    --resources='{"3d_gen": 1, "rtx4070": 1}' \
    --node-name=desktop

# Verify connection
ray status  # Should show desktop as a connected worker

The Ray head's GCS port needs to be reachable from the desktop. Options:

  • NodePort: Expose port 6379 as a NodePort (e.g., 30637) on a cluster node
  • Tailscale/WireGuard: If the desktop is on a different network segment
  • Direct LAN: If desktop and cluster are on the same 192.168.100.0/24 subnet
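Whichever option is used, it is worth confirming the GCS port answers before running ray start. A small sketch (a hypothetical helper, not part of Ray; host and port below are placeholders):

```python
import socket

def gcs_reachable(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the Ray head's GCS port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example pre-flight check (placeholder address and NodePort):
# if not gcs_reachable("192.168.100.10", 30637):
#     raise SystemExit("Ray head GCS port unreachable; check NodePort/VPN")
```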

3. ComfyUI Workflow (Node Graph)

The ComfyUI workflow JSON defines the image-to-GLB pipeline:

[Load Image] → [TRELLIS Image-to-3D] → [Mesh Simplify] → [Texture Bake]
                                                              │
                                                              ▼
                                                       [Save GLB]
                                                              │
                                                              ▼
                                               [UniRig Skeleton Prediction]
                                                              │
                                                              ▼
                                               [UniRig Skinning Weights]
                                                              │
                                                              ▼
                                               [UniRig Merge (rigged model)]
                                                              │
                                                              ▼
                                               [Blender VRM Export (CLI)]
                                                              │
                                                              ▼
                                               [Save VRM → ~/comfyui-3d/exports/]

Key TRELLIS parameters exposed:

  • sparse_structure_sampler_params.steps: 12 (default)
  • sparse_structure_sampler_params.cfg_strength: 7.5
  • slat_sampler_params.steps: 12
  • slat_sampler_params.cfg_strength: 3.0
  • simplify: 0.95 (triangle reduction ratio)
  • texture_size: 1024
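These are the parameters a headless submission would vary between runs. A sketch of patching them into an exported workflow and wrapping it for ComfyUI's POST /prompt API — the node id ("2"), class name, and input field names here are illustrative placeholders that must match your own workflow export:

```python
import json

def build_prompt_payload(workflow: dict, seed: int,
                         cfg_strength: float = 7.5, steps: int = 12) -> str:
    """Patch generation parameters into an exported ComfyUI workflow and
    wrap it as the body for POST /prompt. Node id "2" is a placeholder;
    use the ids from your exported workflow JSON."""
    wf = json.loads(json.dumps(workflow))  # deep copy; leave caller's graph intact
    wf["2"]["inputs"].update(seed=seed, cfg_strength=cfg_strength, steps=steps)
    return json.dumps({"prompt": wf})

# Placeholder workflow fragment standing in for the real export
workflow = {"2": {"class_type": "TrellisImageTo3D",
                  "inputs": {"seed": 0, "cfg_strength": 7.5, "steps": 12}}}
payload = build_prompt_payload(workflow, seed=42)
# POST this body to http://127.0.0.1:8188/prompt
```

Fixing the seed per run is what makes "same image + seed produces identical output" hold in practice.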

4. MLflow Experiment Tracking

The desktop logs directly to the cluster's MLflow service over HTTP. Set MLFLOW_TRACKING_URI in the ComfyUI environment or in a post-generation logging script:

export MLFLOW_TRACKING_URI=http://<mlflow-service>:5000

Each generation run logs to a dedicated MLflow experiment:

| What | MLflow concept | Content |
| --- | --- | --- |
| Reference image | Artifact | reference.png |
| TRELLIS parameters | Params | seed, cfg_strength, steps, simplify, texture_size |
| UniRig parameters | Params | skeleton_seed |
| Raw mesh | Artifact | {name}_raw.glb (pre-rigging) |
| Rigged model | Artifact | {name}_rigged.glb (post-rigging) |
| Final VRM | Artifact | {name}.vrm |
| Mesh quality | Metrics | vertex_count, face_count, texture_resolution |
| Rig quality | Metrics | bone_count, skinning_weight_coverage |
| Pipeline duration | Metrics | trellis_time_s, unirig_time_s, total_time_s |
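A post-generation logging script can assemble these rows into params/metrics dicts before handing them to mlflow.log_params / mlflow.log_metrics / mlflow.log_artifact. A minimal sketch — the shape of the `result` dict is an assumption of this illustration, not a defined interface:

```python
def build_mlflow_payload(result: dict) -> tuple[dict, dict, list]:
    """Split a generation result into MLflow params, metrics, and artifact
    paths, mirroring the tracking table above. In the real script these
    feed mlflow.log_params(), mlflow.log_metrics(), mlflow.log_artifact()."""
    params = {
        "seed": result["seed"],
        "cfg_strength": result["cfg_strength"],
        "steps": result["steps"],
        "skeleton_seed": result["skeleton_seed"],
    }
    metrics = {
        "vertex_count": result["vertex_count"],
        "bone_count": result["bone_count"],
        "trellis_time_s": result["trellis_time_s"],
        "unirig_time_s": result["unirig_time_s"],
        "total_time_s": result["trellis_time_s"] + result["unirig_time_s"],
    }
    artifacts = [result["raw_glb"], result["rigged_glb"], result["vrm"]]
    return params, metrics, artifacts
```

Keeping the assembly separate from the mlflow calls makes the payload unit-testable without a tracking server.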

5. VRM Export Script (Blender CLI)

#!/usr/bin/env python3
"""vrm_export.py — Headless Blender script for GLB→VRM conversion."""
import bpy
import sys

argv = sys.argv[sys.argv.index("--") + 1:]
input_glb = argv[0]
output_vrm = argv[1]
avatar_name = argv[2] if len(argv) > 2 else "Generated Avatar"

# Clear scene
bpy.ops.wm.read_factory_settings(use_empty=True)

# Import rigged GLB
bpy.ops.import_scene.gltf(filepath=input_glb)

# Select the armature (fail loudly if the model wasn't rigged)
armature = next((obj for obj in bpy.data.objects if obj.type == 'ARMATURE'), None)
if armature is None:
    sys.exit("No armature found in imported GLB; was the model rigged?")
bpy.context.view_layer.objects.active = armature

# Configure VRM metadata.
# NOTE: the VRM Add-on stores metadata in a property group on the armature
# data (vrm_addon_extension), not a plain custom property; the exact
# attribute paths below vary by add-on version, so adjust to the installed one.
meta = armature.data.vrm_addon_extension.vrm0.meta
meta.title = avatar_name
meta.author = "DaviesTechLabs Pipeline"
meta.allowed_user_name = "Everyone"

# Export VRM
bpy.ops.export_scene.vrm(filepath=output_vrm)
print(f"Exported VRM: {output_vrm}")

Invoked via:

blender --background --python vrm_export.py -- input.glb output.vrm "Avatar Name"

6. Asset Promotion (Reuses ADR-0062 Architecture)

The VRM serving architecture from ADR-0062 is preserved unchanged:

| Stage | Action |
| --- | --- |
| Generate | Automated pipeline: image → TRELLIS → UniRig → VRM |
| Promote | rclone copy ~/comfyui-3d/exports/{name}.vrm gravenhollow:avatar-models/ |
| Register | Add model path to AllowedAvatarModels in companions-frontend Go + JS allowlists |
| Deploy | Flux rolls out config; model already on NFS PVC — no image rebuild |
| CDN | Cloudflare Tunnel → RustFS → CDN cache at 300+ edge PoPs |
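The promote step reduces to one rclone invocation per model. A sketch of building that command (remote and bucket names taken from the rclone config earlier; the helper itself is illustrative):

```python
from pathlib import Path

def promote_cmd(name: str, remote: str = "gravenhollow",
                bucket: str = "avatar-models") -> list[str]:
    """Build the rclone argv for promoting an exported VRM to RustFS S3.
    Execute it with subprocess.run(cmd, check=True)."""
    src = Path.home() / "comfyui-3d" / "exports" / f"{name}.vrm"
    return ["rclone", "copy", str(src), f"{remote}:{bucket}/"]

cmd = promote_cmd("Generated-A-v1")
# → rclone copy ~/comfyui-3d/exports/Generated-A-v1.vrm gravenhollow:avatar-models/
```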

Model Requirements and VRAM Budget

| Component | Model Size | VRAM Required | Notes |
| --- | --- | --- | --- |
| TRELLIS image-large | 1.2B params | ~10 GB (fp16) | Image-to-3D, best quality |
| TRELLIS text-xlarge | 2.0B params | ~14 GB (fp16) | Text-to-3D, optional |
| UniRig skeleton | ~350M params | ~4 GB | Autoregressive skeleton prediction |
| UniRig skinning | ~350M params | ~4 GB | Bone-point cross-attention |
| Blender CLI | N/A | CPU only | Headless VRM export |

RTX 4070 budget (12 GB): Models are loaded sequentially (not concurrently) — TRELLIS runs first, output is saved to disk, then UniRig loads for rigging. Peak VRAM usage is ~10 GB during TRELLIS inference. The desktop's 64 GB system RAM provides ample buffer for model loading and mesh processing.
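Because the stages load sequentially, the budget check is a max, not a sum. A quick sanity check using the table's numbers:

```python
# Per-stage VRAM from the table above (GB, fp16 where noted);
# the Blender export stage is CPU-only and uses no VRAM.
STAGES = {
    "trellis_image_large": 10.0,
    "unirig_skeleton": 4.0,
    "unirig_skinning": 4.0,
}

def peak_vram_gb(stages: dict) -> float:
    """Sequential load/unload means peak usage is the largest single stage."""
    return max(stages.values())

assert peak_vram_gb(STAGES) == 10.0   # fits the RTX 4070's 12 GB
assert sum(STAGES.values()) > 12.0    # loading everything at once would not
```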

Security Considerations

  • Ray GCS port exposure: The Ray head's port 6379 must be reachable from the desktop. Use a NodePort with network policy restricting source IPs to the desktop's address, or use a WireGuard/Tailscale tunnel.
  • No cluster credentials on desktop: The desktop runs Ray worker processes and ComfyUI only — it has no kubeconfig or Kubernetes API access. Generation is triggered locally via ComfyUI's UI or API, not from the cluster.
  • Model provenance: TRELLIS and UniRig checkpoints are downloaded from Hugging Face (Microsoft and VAST-AI orgs respectively). Pin checkpoint hashes in the setup script.
  • ComfyUI network: ComfyUI's web UI (port 8188) should be bound to localhost only when not in use. It is not exposed to the cluster.
  • rclone credentials: gravenhollow RustFS write credentials stored in ~/.config/rclone/rclone.conf with 600 permissions.
  • Generated content: Auto-generated 3D models inherit no licensing restrictions (TRELLIS and UniRig are both MIT-licensed).
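Pinning checkpoint hashes can be enforced with a small verifier run after each download. A sketch — the filename key and digest are placeholders to fill in once from a trusted download:

```python
import hashlib
from pathlib import Path

# Fill these in once from a trusted download, then fail closed on mismatch.
PINNED_SHA256 = {
    # "trellis-image-large.safetensors": "<hex digest>",
}

def verify_checkpoint(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file's SHA-256 doesn't match the pinned digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            h.update(chunk)
    if h.hexdigest() != expected_hex:
        raise ValueError(f"checksum mismatch for {path.name}")
```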

Future Considerations

  • Kubeflow pipeline for model refinement: When iterating on existing models (re-rigging, parameter sweeps, A/B testing generation backends), a Kubeflow pipeline can orchestrate multi-step refinement workflows with artifact lineage, caching, and retries — submitting RayJobs to the desktop worker via the existing KFP + RayJob pattern from ADR-0058
  • DGX Spark (ADR-0058): When acquired, could run TRELLIS + UniRig in-cluster with dedicated GPU, eliminating desktop dependency
  • Stable3DGen / Hunyuan3D alternatives: ComfyUI-3D-Pack supports multiple generation backends — can A/B test quality via MLflow metrics
  • VRM expression morphs: Investigate automated viseme and expression blendshape generation for full lip-sync support without manual Blender work
  • ComfyUI API mode: ComfyUI supports headless API-only execution (--listen 0.0.0.0 --port 8188) — a script or future Kubeflow pipeline can submit workflows via HTTP POST to /prompt
  • Text-to-3D: Use the cluster's vLLM instance to generate a character description, then Stable Diffusion (on desktop) to create a reference image, feeding into TRELLIS — fully text-to-avatar pipeline
  • Batch generation: Schedule overnight batch runs via CronWorkflow to generate avatar libraries from curated reference images
  • In-cluster migration: If a 16+ GB NVIDIA GPU is added to the cluster (e.g., via DGX Spark or RTX 5070), migrate TRELLIS + UniRig to a dedicated Ray Serve deployment for always-available generation

Links

  • Supersedes: ADR-0062 — BlenderMCP for 3D avatar creation (interactive workflow)
  • Updates: ADR-0059 — waterdeep retains Blender role for manual refinement only
  • Related: ADR-0046 — Companions frontend architecture (Three.js + VRM avatars)
  • Related: ADR-0011 — KubeRay unified GPU backend
  • Related: ADR-0005 — Multi-GPU heterogeneous strategy
  • Related: ADR-0058 — Training strategy (Kubeflow + RayJob pattern for future pipeline work)
  • Related: ADR-0047 — MLflow experiment tracking
  • Related: ADR-0026 — Storage strategy (gravenhollow NFS-fast, RustFS S3)

References

  • Microsoft TRELLIS — Structured 3D Latents for Scalable 3D Generation (CVPR'25 Spotlight)
  • VAST-AI UniRig — One Model to Rig Them All (SIGGRAPH'25)
  • ComfyUI-3D-Pack — Extensive 3D node suite for ComfyUI
  • VRM Add-on for Blender
  • @pixiv/three-vrm (runtime loader in companions-frontend)