13 Commits

Author SHA1 Message Date
Gitea Actions
b64102d853 docs: auto-update ADR index in README [skip ci] 2026-02-24 10:35:37 +00:00
4affddf9b4 replacing Blender MCP with a reproducible flow.
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 1m5s
2026-02-24 05:34:29 -05:00
Gitea Actions
e28382a765 docs: auto-update ADR index in README [skip ci] 2026-02-23 11:15:32 +00:00
100ba21eba update ADRs to reflect the Go refactor.
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 1m2s
2026-02-23 06:14:30 -05:00
Gitea Actions
f19fa3e969 docs: auto-update ADR index in README [skip ci] 2026-02-21 21:28:33 +00:00
50b14b2a75 ADR-0059: repurpose waterdeep from Ray worker to local AI agent
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 1m6s
Replace the proposed Ray cluster worker role with a dedicated local
AI agent for BlenderMCP 3D avatar creation (supporting ADR-0062).

waterdeep's Metal GPU provides hardware-accelerated rendering in
Blender — far superior to Kasm's CPU-only DinD environment. The
Ray cluster GPU fleet is fully allocated and stable; adding MPS
complexity is not justified.

Also adds cross-reference from ADR-0062 to ADR-0059.
2026-02-21 16:27:25 -05:00
Gitea Actions
32e370401f docs: auto-update ADR index in README [skip ci] 2026-02-21 21:18:40 +00:00
654b7ae774 ADR-0062: replace Cloudflare R2 with RustFS via Cloudflare Tunnel
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 1m8s
gravenhollow RustFS is already S3-compatible — expose it through
the existing Cloudflare Tunnel with a dedicated HTTPRoute at
assets.daviestechlabs.io. Cloudflare CDN caches at edge PoPs.

Eliminates: R2 bucket, rclone sync CronJob, R2 API token, and
6-hour sync delay. Single source of truth on gravenhollow.
2026-02-21 16:17:20 -05:00
9fe12e0cff ADR-0062: BlenderMCP 3D avatar workflow with Kasm, gravenhollow NFS, and Cloudflare R2
Some checks failed
Update README with ADR Index / update-readme (push) Has been cancelled
2026-02-21 16:13:36 -05:00
Gitea Actions
defbd5b2f9 docs: auto-update ADR index in README [skip ci] 2026-02-21 20:58:43 +00:00
555b70b9d9 docs: accept ADR-0061 (Go handler refactor), supersede ADR-0004 (msgpack→protobuf)
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 1m5s
All 5 handler services + companions-frontend migrated to handler-base v1.0.0
with protobuf wire format. golangci-lint clean across all repos.
2026-02-21 15:46:43 -05:00
Gitea Actions
f650b4bd22 docs: auto-update ADR index in README [skip ci] 2026-02-20 12:38:33 +00:00
cbd892c7c9 Merge pull request 'docs(adr): add ADR-0061 Go handler refactor' (#1) from feature/go-handler-refactor into main
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 1m2s
Reviewed-on: #1
2026-02-20 12:32:57 +00:00
10 changed files with 1311 additions and 368 deletions

View File

@@ -22,13 +22,13 @@ You are working on a **homelab Kubernetes cluster** running:
| Repo | Purpose |
|------|---------|
| `handler-base` | Shared Python library for NATS handlers |
| `chat-handler` | Text chat with RAG pipeline |
| `voice-assistant` | Voice pipeline (STT → RAG → LLM → TTS) |
| `handler-base` | Shared Go module for NATS handlers (protobuf, health, OTel, clients) |
| `chat-handler` | Text chat with RAG pipeline (Go) |
| `voice-assistant` | Voice pipeline: STT → RAG → LLM → TTS (Go) |
| `kuberay-images` | GPU-specific Ray worker Docker images |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `pipeline-bridge` | Bridge between pipelines and services (Go) |
| `stt-module` | Speech-to-text service (Go) |
| `tts-module` | Text-to-speech service (Go) |
| `ray-serve` | Ray Serve inference services |
| `argo` | Argo Workflows (training, batch inference) |
| `kubeflow` | Kubeflow Pipeline definitions |
@@ -48,7 +48,7 @@ You are working on a **homelab Kubernetes cluster** running:
┌─────────────────────────────────────────────────────────────────┐
│ NATS MESSAGE BUS │
│ Subjects: ai.chat.*, ai.voice.*, ai.pipeline.* │
│ Format: MessagePack (binary)
│ Format: Protocol Buffers (binary, see ADR-0061)
└───────────────────────────┬─────────────────────────────────────┘
┌───────────────────┼───────────────────┐
@@ -93,19 +93,23 @@ talos/
### AI/ML Services (Gitea daviestechlabs org)
```
handler-base/ # Shared handler library
├── handler_base/ # Core classes
│ ├── handler.py # Base Handler class
│ ├── nats_client.py # NATS wrapper
│ └── clients/ # Service clients (STT, TTS, LLM, etc.)
handler-base/ # Shared Go module (NATS, health, OTel, protobuf)
├── clients/ # HTTP clients (LLM, STT, TTS, embeddings, reranker)
├── config/ # Env-based configuration (struct tags)
├── gen/messagespb/ # Generated protobuf stubs
├── handler/ # Typed NATS message handler
├── health/ # HTTP health + readiness server
└── natsutil/ # NATS publish/request with protobuf
chat-handler/ # RAG chat service
├── chat_handler_v2.py # Handler-base version
└── Dockerfile.v2
chat-handler/ # RAG chat service (Go)
├── main.go
├── main_test.go
└── Dockerfile
voice-assistant/ # Voice pipeline service
├── voice_assistant_v2.py # Handler-base version
└── pipelines/voice_pipeline.py
voice-assistant/ # Voice pipeline service (Go)
├── main.go
├── main_test.go
└── Dockerfile
argo/ # Argo WorkflowTemplates
├── batch-inference.yaml
@@ -127,8 +131,23 @@ kuberay-images/ # GPU worker images
## 🔌 Service Endpoints (Internal)
```go
// Copy-paste ready for Go handler services
const (
NATSUrl = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLMUrl = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WhisperUrl = "http://whisper-predictor.ai-ml.svc.cluster.local"
TTSUrl = "http://tts-predictor.ai-ml.svc.cluster.local"
EmbeddingsUrl = "http://embeddings-predictor.ai-ml.svc.cluster.local"
RerankerUrl = "http://reranker-predictor.ai-ml.svc.cluster.local"
MilvusHost = "milvus.ai-ml.svc.cluster.local"
MilvusPort = 19530
ValkeyUrl = "redis://valkey.ai-ml.svc.cluster.local:6379"
)
```
```python
# Copy-paste ready for Python code
# For Python services (Ray Serve, Kubeflow pipelines, Gradio UIs)
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
@@ -175,7 +194,7 @@ f"ai.pipeline.status.{request_id}" # Status updates
### Add a New NATS Handler
1. Create handler repo or add to existing (use `handler-base` library)
1. Create Go handler repo using `handler-base` module (see [ADR-0061](decisions/0061-go-handler-refactor.md))
2. Add K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
3. Push to main → Flux deploys automatically
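
The handler-base Go API is not shown in this diff, so as a rough sketch of what step 1 produces (the import path for the generated `messagespb` stubs and the `ChatRequest` type are assumptions based on the repo layout above), a new handler's `main.go` might look like:

```go
// Hedged sketch of a new handler's main.go; the messagespb import path and
// ChatRequest type are assumptions based on the handler-base layout above.
package main

import (
	"log"

	"github.com/nats-io/nats.go"
	"google.golang.org/protobuf/proto"

	messagespb "git.daviestechlabs.io/daviestechlabs/handler-base/gen/messagespb" // hypothetical
)

func main() {
	nc, err := nats.Connect("nats://nats.ai-ml.svc.cluster.local:4222")
	if err != nil {
		log.Fatalf("nats connect: %v", err)
	}
	defer nc.Drain()

	// Subscribe on the service's subject and decode the protobuf payload.
	if _, err := nc.Subscribe("ai.chat.request", func(msg *nats.Msg) {
		var req messagespb.ChatRequest // hypothetical generated type
		if err := proto.Unmarshal(msg.Data, &req); err != nil {
			log.Printf("unmarshal: %v", err)
			return
		}
		// ...process, then msg.Respond(...) as in the handler pattern shown later
	}); err != nil {
		log.Fatalf("subscribe: %v", err)
	}

	select {} // block forever; a health server would run alongside in practice
}
```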

View File

@@ -44,7 +44,7 @@ The homelab is a production-grade Kubernetes cluster running on bare-metal hardw
│ │ • AI_PIPELINE (24h, file) - Workflow triggers │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Message Format: MessagePack (binary, not JSON)
│ Message Format: Protocol Buffers (binary, see ADR-0061)
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────┼─────────────────────────┐
@@ -312,12 +312,12 @@ Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafan
|----------|-----------|-----|
| Talos Linux | Immutable, API-driven, secure | [ADR-0002](decisions/0002-use-talos-linux.md) |
| NATS over Kafka | Simpler ops, sufficient throughput | [ADR-0003](decisions/0003-use-nats-for-messaging.md) |
| MessagePack over JSON | Binary efficiency for audio | [ADR-0004](decisions/0004-use-messagepack-for-nats.md) |
| Protocol Buffers over MessagePack | Type-safe, schema-driven, Go-native | [ADR-0061](decisions/0061-go-handler-refactor.md) |
| Multi-GPU heterogeneous | Cost optimization, workload matching | [ADR-0005](decisions/0005-multi-gpu-strategy.md) |
| GitOps with Flux | Declarative, auditable, secure | [ADR-0006](decisions/0006-gitops-with-flux.md) |
| KServe for inference | Standardized API, autoscaling | [ADR-0007](decisions/0007-use-kserve-for-inference.md) |
| KubeRay unified backend | Fractional GPU, single endpoint | [ADR-0011](decisions/0011-kuberay-unified-gpu-backend.md) |
| Go handler refactor | Slim images for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) |
| Go handler refactor | Slim images, type-safe protobuf for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) |
## Related Documents

View File

@@ -28,27 +28,29 @@ kubernetes/
### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)
```
handler-base/ # Shared library for all handlers
├── handler_base/
│ ├── handler.py # Base Handler class
│ ├── nats_client.py # NATS wrapper
│ ├── config.py # Pydantic Settings
│ ├── health.py # K8s probes
│ ├── telemetry.py # OpenTelemetry
│ └── clients/ # Service clients
├── tests/
└── pyproject.toml
handler-base/ # Shared Go module for all NATS handlers
├── clients/ # HTTP clients (LLM, STT, TTS, embeddings, reranker)
├── config/ # Env-based configuration (struct tags)
├── gen/messagespb/ # Generated protobuf stubs
├── handler/ # Typed NATS message handler with OTel + health wiring
├── health/ # HTTP health + readiness server
├── messages/ # Type aliases from generated protobuf stubs
├── natsutil/ # NATS publish/request with protobuf encoding
├── proto/messages/v1/ # .proto schema source
├── go.mod
└── buf.yaml # buf protobuf toolchain config
chat-handler/ # Text chat service
voice-assistant/ # Voice pipeline service
pipeline-bridge/ # Workflow engine bridge
├── {name}.py # Handler implementation (uses handler-base)
├── pyproject.toml # PEP 621 project metadata (see ADR-0012)
├── uv.lock # Deterministic lock file
├── tests/
│ ├── conftest.py
│ └── test_{name}.py
└── Dockerfile
chat-handler/ # Text chat service (Go)
voice-assistant/ # Voice pipeline service (Go)
pipeline-bridge/ # Workflow engine bridge (Go)
stt-module/ # Speech-to-text bridge (Go)
tts-module/ # Text-to-speech bridge (Go)
├── main.go # Service entry point
├── main_test.go # Unit tests
├── e2e_test.go # End-to-end tests
├── go.mod # Go module (depends on handler-base)
├── Dockerfile # Distroless container (~20 MB)
└── renovate.json # Dependency update config
argo/ # Argo WorkflowTemplates
├── {workflow-name}.yaml
@@ -138,7 +140,20 @@ tts_task = synthesize_speech(text=llm_task.output) # noqa: F841
### Project Structure
```go
// Go handler services use handler-base shared module
import (
"git.daviestechlabs.io/daviestechlabs/handler-base/clients"
"git.daviestechlabs.io/daviestechlabs/handler-base/config"
"git.daviestechlabs.io/daviestechlabs/handler-base/handler"
"git.daviestechlabs.io/daviestechlabs/handler-base/health"
"git.daviestechlabs.io/daviestechlabs/handler-base/messages"
"git.daviestechlabs.io/daviestechlabs/handler-base/natsutil"
)
```
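
The `config` package above is described only as "env-based configuration (struct tags)". As an illustration of that pattern (using the third-party `caarlos0/env` library rather than handler-base's actual code, whose API is not shown; the field names are examples), a service config might look like:

```go
// Illustrative env-driven config via struct tags (caarlos0/env shown here;
// handler-base's own config API may differ). Field names are examples only.
package main

import (
	"fmt"
	"log"

	"github.com/caarlos0/env/v11"
)

type Config struct {
	NATSURL string `env:"NATS_URL" envDefault:"nats://nats.ai-ml.svc.cluster.local:4222"`
	VLLMURL string `env:"VLLM_URL" envDefault:"http://llm-draft.ai-ml.svc.cluster.local:8000/v1"`
	Subject string `env:"NATS_SUBJECT,required"`
}

func main() {
	var cfg Config
	if err := env.Parse(&cfg); err != nil {
		log.Fatalf("config: %v", err)
	}
	fmt.Printf("loaded config: %+v\n", cfg)
}
```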
```python
# Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs
# Use async/await for I/O
async def handle_message(msg: Msg) -> None:
...
@@ -149,10 +164,6 @@ class ChatRequest:
user_id: str
message: str
enable_rag: bool = True
# Use msgpack for NATS messages
import msgpack
data = msgpack.packb({"key": "value"})
```
### Naming
@@ -200,31 +211,36 @@ except Exception as e:
### NATS Message Handling
```python
import nats
import msgpack
All NATS handler services use Go with Protocol Buffers encoding (see [ADR-0061](decisions/0061-go-handler-refactor.md)):
async def message_handler(msg: Msg) -> None:
try:
# Decode MessagePack
data = msgpack.unpackb(msg.data, raw=False)
```go
// Go NATS handler (production pattern)
func (h *Handler) handleMessage(msg *nats.Msg) {
var req messages.ChatRequest
if err := proto.Unmarshal(msg.Data, &req); err != nil {
h.logger.Error("failed to unmarshal", "error", err)
return
}
# Process
result = await process(data)
// Process
result, err := h.process(ctx, &req)
if err != nil {
h.logger.Error("handler error", "error", err)
msg.Nak()
return
}
# Reply if request-reply pattern
if msg.reply:
await msg.respond(msgpack.packb(result))
# Acknowledge for JetStream
await msg.ack()
except Exception as e:
logger.error(f"Handler error: {e}")
# NAK for retry (JetStream)
await msg.nak()
// Reply if request-reply pattern
if msg.Reply != "" {
data, _ := proto.Marshal(result)
msg.Respond(data)
}
msg.Ack()
}
```
> **Python NATS** is still used in Ray Serve `runtime_env` and Kubeflow pipeline components where needed, but all dedicated NATS handler services are Go.
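
From the publishing side (for example companions-frontend or pipeline-bridge), the request-reply counterpart of the handler above could look like the following sketch. The `messages.ChatRequest`/`ChatResponse` types and their fields are assumptions based on the `ChatRequest` shape earlier in this diff; imports (`nats.go`, `proto`, `time`) are omitted:

```go
// Hedged sketch: protobuf request-reply over NATS from the client side.
// messages.ChatRequest / messages.ChatResponse are assumed type names.
func requestChat(nc *nats.Conn, userID, text string) (*messages.ChatResponse, error) {
	req := &messages.ChatRequest{UserId: userID, Message: text, EnableRag: true}
	data, err := proto.Marshal(req)
	if err != nil {
		return nil, err
	}
	// Synchronous request with timeout; the handler's msg.Respond supplies the reply.
	reply, err := nc.Request("ai.chat.request", data, 5*time.Second)
	if err != nil {
		return nil, err
	}
	var resp messages.ChatResponse
	if err := proto.Unmarshal(reply.Data, &resp); err != nil {
		return nil, err
	}
	return &resp, nil
}
```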
---
## Kubernetes Manifest Conventions
@@ -499,8 +515,9 @@ Each application should have a README with:
| Use `latest` image tags | Pin to specific versions |
| Skip health checks | Always define liveness/readiness |
| Ignore resource limits | Set appropriate requests/limits |
| Use JSON for NATS messages | Use MessagePack (binary) |
| Synchronous I/O in handlers | Use async/await |
| Use JSON for NATS messages | Use Protocol Buffers (see ADR-0061) |
| Write handler services in Python | Use Go with handler-base module (ADR-0061) |
| Synchronous I/O in handlers | Use goroutines / async patterns |
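
On the last row: a common Go pattern for keeping NATS callbacks non-blocking is a bounded worker pool built from a buffered-channel semaphore. This is an illustrative sketch, not code from these repos:

```go
// subscribeBounded caps in-flight message processing; the subscription
// callback returns quickly so NATS dispatch is never blocked for long.
func subscribeBounded(nc *nats.Conn, subject string, handle func(*nats.Msg)) error {
	sem := make(chan struct{}, 8) // at most 8 concurrent handlers
	_, err := nc.Subscribe(subject, func(msg *nats.Msg) {
		sem <- struct{}{} // acquire a slot (blocks only when saturated)
		go func() {
			defer func() { <-sem }() // release the slot
			handle(msg)
		}()
	})
	return err
}
```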
---

View File

@@ -8,7 +8,7 @@
[![License](https://img.shields.io/badge/License-MIT-green)](LICENSE)
<!-- ADR-BADGES-START -->
![ADR Count](https://img.shields.io/badge/ADRs-60_total-blue?logo=bookstack) ![Accepted](https://img.shields.io/badge/accepted-58-brightgreen) ![Proposed](https://img.shields.io/badge/proposed-1-yellow)
![ADR Count](https://img.shields.io/badge/ADRs-63_total-blue?logo=bookstack) ![Accepted](https://img.shields.io/badge/accepted-58-brightgreen) ![Proposed](https://img.shields.io/badge/proposed-1-yellow)
<!-- ADR-BADGES-END -->
## 📖 Quick Navigation
@@ -94,7 +94,7 @@ homelab-design/
| 0001 | [Record Architecture Decisions](decisions/0001-record-architecture-decisions.md) | ✅ accepted | 2025-11-30 |
| 0002 | [Use Talos Linux for Kubernetes Nodes](decisions/0002-use-talos-linux.md) | ✅ accepted | 2025-11-30 |
| 0003 | [Use NATS for AI/ML Messaging](decisions/0003-use-nats-for-messaging.md) | ✅ accepted | 2025-12-01 |
| 0004 | [Use MessagePack for NATS Messages](decisions/0004-use-messagepack-for-nats.md) | ✅ accepted | 2025-12-01 |
| 0004 | [Use MessagePack for NATS Messages](decisions/0004-use-messagepack-for-nats.md) | ♻️ superseded by [ADR-0061](0061-go-handler-refactor.md) (Protocol Buffers) | 2025-12-01 |
| 0005 | [Multi-GPU Heterogeneous Strategy](decisions/0005-multi-gpu-strategy.md) | ✅ accepted | 2025-12-01 |
| 0006 | [GitOps with Flux CD](decisions/0006-gitops-with-flux.md) | ✅ accepted | 2025-11-30 |
| 0007 | [Use KServe for ML Model Serving](decisions/0007-use-kserve-for-inference.md) | ♻️ superseded by [ADR-0011](0011-kuberay-unified-gpu-backend.md) | 2025-12-15 (Updated: 2026-02-02) |
@@ -109,7 +109,7 @@ homelab-design/
| 0016 | [Affine Email Verification Strategy for Authentik OIDC](decisions/0016-affine-email-verification-strategy.md) | ✅ accepted | 2026-02-04 |
| 0017 | [Secrets Management Strategy](decisions/0017-secrets-management-strategy.md) | ✅ accepted | 2026-02-04 |
| 0018 | [Security Policy Enforcement](decisions/0018-security-policy-enforcement.md) | ✅ accepted | 2026-02-04 |
| 0019 | [Python Module Deployment Strategy](decisions/0019-handler-deployment-strategy.md) | ✅ accepted | 2026-02-02 |
| 0019 | [Python Module Deployment Strategy](decisions/0019-handler-deployment-strategy.md) | ♻️ superseded by [ADR-0061](0061-go-handler-refactor.md) | 2026-02-02 |
| 0020 | [Internal Registry URLs for CI/CD](decisions/0020-internal-registry-for-cicd.md) | ✅ accepted | 2026-02-02 |
| 0021 | [Notification Architecture](decisions/0021-notification-architecture.md) | ✅ accepted | 2026-02-04 |
| 0022 | [ntfy-Discord Bridge Service](decisions/0022-ntfy-discord-bridge.md) | ✅ accepted | 2026-02-04 |
@@ -149,8 +149,11 @@ homelab-design/
| 0056 | [Custom Trained Voice Support in TTS Module](decisions/0056-custom-voice-support-tts-module.md) | ✅ accepted | 2026-02-13 |
| 0057 | [Per-Repository Renovate Configurations](decisions/0057-renovate-per-repo-configs.md) | ✅ accepted | 2026-02-13 |
| 0058 | [Training Strategy Distributed CPU Now, DGX Spark Later](decisions/0058-training-strategy-cpu-dgx-spark.md) | ✅ accepted | 2026-02-14 |
| 0059 | [Add Mac Mini M4 Pro (waterdeep) to Ray Cluster as External Worker](decisions/0059-mac-mini-ray-worker.md) | 📝 proposed | 2026-02-16 |
| 0059 | [Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation](decisions/0059-mac-mini-ray-worker.md) | ✅ accepted | 2026-02-16 |
| 0060 | [Internal PKI with Vault and cert-manager](decisions/0060-internal-pki-vault.md) | ✅ accepted | 2026-02-16 |
| 0061 | [Refactor NATS Handler Services from Python to Go](decisions/0061-go-handler-refactor.md) | ✅ accepted | 2026-02-19 |
| 0062 | [BlenderMCP for 3D Avatar Creation via Kasm Workstation](decisions/0062-blender-mcp-3d-avatar-workflow.md) | ♻️ superseded by [ADR-0063](0063-comfyui-3d-avatar-pipeline.md) | 2026-02-21 |
| 0063 | [ComfyUI Image-to-3D Avatar Pipeline with TRELLIS + UniRig](decisions/0063-comfyui-3d-avatar-pipeline.md) | 📝 proposed | 2026-02-24 |
<!-- ADR-TABLE-END -->
## 🔗 Related Repositories
@@ -188,4 +191,4 @@ The former monolithic `llm-workflows` repo has been archived and decomposed into
---
*Last updated: 2026-02-17*
*Last updated: 2026-02-24*

View File

@@ -117,9 +117,14 @@ All AI inference runs on a unified Ray Serve endpoint with fractional GPU alloca
| Application | Language | Framework | Purpose |
|-------------|----------|-----------|---------|
| Companions | Go | net/http + HTMX | AI chat interface |
| Voice WebApp | Python | Gradio | Voice assistant UI |
| Various handlers | Python | asyncio + nats.py | NATS event handlers |
| Companions | Go | net/http + HTMX | AI chat interface (SSR) |
| Chat Handler | Go | handler-base | RAG + LLM text pipeline |
| Voice Assistant | Go | handler-base | STT → RAG → LLM → TTS pipeline |
| Pipeline Bridge | Go | handler-base | Kubeflow/Argo workflow triggers |
| STT Module | Go | handler-base | Speech-to-text bridge |
| TTS Module | Go | handler-base | Text-to-speech bridge |
| Voice WebApp | Python | Gradio | Voice assistant UI (dev/testing) |
| Ray Serve | Python | Ray Serve | GPU inference endpoints |
### Frontend
@@ -242,27 +247,41 @@ All AI inference runs on a unified Ray Serve endpoint with fractional GPU alloca
---
## Python Dependencies (handler-base)
## Go Dependencies (handler-base)
Core library for all NATS handlers: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)
Shared Go module for all NATS handler services: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)
```go
// go.mod (handler-base v1.0.0)
require (
github.com/nats-io/nats.go // NATS client
google.golang.org/protobuf // Protocol Buffers encoding
github.com/zitadel/oidc/v3 // OIDC client
go.opentelemetry.io/otel // OpenTelemetry traces + metrics
github.com/milvus-io/milvus-sdk-go // Milvus vector search
)
```
See [ADR-0061](decisions/0061-go-handler-refactor.md) for the full refactoring rationale.
## Python Dependencies (ML/AI only)
Python is retained for ML inference, pipeline orchestration, and dev tools:
```toml
# Core
nats-py>=2.7.0 # NATS client
msgpack>=1.0.0 # Binary serialization
httpx>=0.27.0 # HTTP client
# ray-serve (GPU inference)
ray[serve]>=2.53.0
vllm>=0.8.0
faster-whisper>=1.0.0
TTS>=0.22.0
sentence-transformers>=3.0.0
# ML/AI
pymilvus>=2.4.0 # Milvus client
openai>=1.0.0 # vLLM OpenAI API
# kubeflow (pipeline definitions)
kfp>=2.12.1
# Observability
opentelemetry-api>=1.20.0
opentelemetry-sdk>=1.20.0
mlflow>=2.10.0 # Experiment tracking
# Kubeflow (kubeflow repo)
kfp>=2.12.1 # Pipeline SDK
# mlflow (experiment tracking)
mlflow>=3.7.0
pymilvus>=2.4.0
```
---

View File

@@ -1,10 +1,12 @@
# Python Module Deployment Strategy
* Status: accepted
* Status: superseded by [ADR-0061](0061-go-handler-refactor.md)
* Date: 2026-02-02
* Deciders: Billy
* Technical Story: Define how Python handler modules are packaged and deployed to Kubernetes
> **Note (2026-02-23):** This ADR described deploying Python handlers as Ray Serve applications inside the Ray cluster. [ADR-0061](0061-go-handler-refactor.md) supersedes this approach — all five handler services (chat-handler, voice-assistant, pipeline-bridge, tts-module, stt-module) have been rewritten in Go and now deploy as standalone Kubernetes Deployments with distroless container images (~20 MB each). The Ray cluster is exclusively used for GPU inference workloads. The handler-base shared library is now a Go module published at `git.daviestechlabs.io/daviestechlabs/handler-base` using Protocol Buffers for NATS message encoding.
## Context
We have Python modules for AI/ML workflows that need to run on our unified GPU cluster:

View File

@@ -14,7 +14,7 @@ How do we build a performant, maintainable frontend that integrates with the NAT
## Decision Drivers
* Real-time streaming for chat and voice (WebSocket required)
* Direct integration with NATS JetStream (binary MessagePack protocol)
* Direct integration with NATS JetStream (Protocol Buffers encoding, see [ADR-0061](0061-go-handler-refactor.md))
* Minimal client-side JavaScript (~20KB gzipped target)
* No frontend build step (no webpack/vite/node required)
* 3D avatar rendering for immersive experience
@@ -39,8 +39,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
* No npm, no webpack, no build step — assets served directly
* Server-side rendering via Go templates
* WebSocket handled natively in Go (gorilla/websocket)
* NATS integration with MessagePack in the same binary
* NATS integration with Protocol Buffers in the same binary
* Distroless container image for minimal attack surface
* Type-safe NATS messages via handler-base shared Go module (protobuf stubs)
### Negative Consequences
@@ -58,8 +59,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
| Client state | Alpine.js 3 | Lightweight reactive UI for local state |
| 3D Avatars | Three.js + VRM | 3D character rendering with lip-sync |
| Styling | Tailwind CSS 4 + DaisyUI | Utility-first CSS with component library |
| Messaging | NATS JetStream | Real-time pub/sub with MessagePack encoding |
| Messaging | NATS JetStream | Real-time pub/sub with Protocol Buffers encoding |
| Auth | golang-jwt/jwt/v5 | JWT token handling for OAuth flows |
| Shared lib | handler-base (Go module) | NATS client, protobuf messages, health, OTel, HTTP clients |
| Database | PostgreSQL (lib/pq) + SQLite | Persistent + local session storage |
| Observability | OpenTelemetry SDK | Traces, metrics via OTLP gRPC |
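
A rough sketch of how these rows compose in one binary: WebSocket in via gorilla/websocket, protobuf out over NATS request-reply. The `messages.ChatRequest` type and the `ai.chat.request` subject are assumptions, and imports are omitted:

```go
// Hedged sketch: browser WebSocket frames bridged onto NATS as protobuf.
var upgrader = websocket.Upgrader{} // gorilla/websocket, as listed above

func chatWS(nc *nats.Conn) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		conn, err := upgrader.Upgrade(w, r, nil)
		if err != nil {
			return
		}
		defer conn.Close()
		for {
			_, text, err := conn.ReadMessage()
			if err != nil {
				return
			}
			// messages.ChatRequest is an assumed type from handler-base stubs.
			data, err := proto.Marshal(&messages.ChatRequest{Message: string(text)})
			if err != nil {
				continue
			}
			reply, err := nc.Request("ai.chat.request", data, 10*time.Second)
			if err != nil {
				continue
			}
			// Relay the protobuf reply to the browser as a binary frame.
			conn.WriteMessage(websocket.BinaryMessage, reply.Data)
		}
	}
}
```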
@@ -88,7 +90,7 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
│ ┌─────────┴─────────┐ │
│ │ NATS Client │ │
│ │ (JetStream + │ │
│ │ MessagePack) │ │
│ │ Protobuf) │ │
│ └─────────┬─────────┘ │
└────────────────────────┼────────────────────────────────────────┘
@@ -130,8 +132,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
## Links
* Related to [ADR-0003](0003-use-nats-for-messaging.md) (NATS messaging)
* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding)
* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding — superseded by Protocol Buffers, see [ADR-0061](0061-go-handler-refactor.md))
* Related to [ADR-0011](0011-kuberay-unified-gpu-backend.md) (Ray Serve backend)
* Related to [ADR-0028](0028-authentik-sso-strategy.md) (OAuth/OIDC)
* Related to [ADR-0061](0061-go-handler-refactor.md) (Go handler refactor — handler-base shared module, protobuf wire format)
* [HTMX Documentation](https://htmx.org/docs/)
* [VRM Specification](https://vrm.dev/en/)

View File

@@ -1,338 +1,287 @@
# Add Mac Mini M4 Pro (waterdeep) to Ray Cluster as External Worker
# Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation
* Status: proposed
* Status: accepted
* Date: 2026-02-16
* Updated: 2026-02-23
* Deciders: Billy
* Technical Story: Expand Ray cluster with Apple Silicon compute for inference and training
* Technical Story: Use waterdeep as a dedicated local AI workstation for BlenderMCP-driven 3D avatar creation, replacing the previously proposed Ray worker role
## Context and Problem Statement
The homelab Ray cluster currently runs entirely within Kubernetes, with GPU workers pinned to specific nodes:
**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory that currently serves as a development workstation (see [ADR-0037](0037-node-naming-conventions.md)). The original proposal was to add it to the Ray cluster as an external inference/training worker, but:
| Node | GPU | Memory | Workload |
|------|-----|--------|----------|
| khelben | Strix Halo (ROCm) | 128 GB unified | vLLM 70B (0.95 GPU) |
| elminster | RTX 2070 (CUDA) | 8 GB VRAM | Whisper (0.5) + TTS (0.5) |
| drizzt | Radeon 680M (ROCm) | 12 GB VRAM | Embeddings (0.8) |
| danilo | Intel Arc (i915) | ~6 GB shared | Reranker (0.8) |
- All Ray inference slots are already allocated and stable — adding a 5th GPU class (MPS) increases complexity without filling a gap
- vLLM's MPS backend remains experimental — not production-ready for serving
- The real unmet need is **3D avatar creation** for companions-frontend ([ADR-0063](0063-comfyui-3d-avatar-pipeline.md))
All GPUs are fully allocated to inference (see [ADR-0005](0005-multi-gpu-strategy.md), [ADR-0011](0011-kuberay-unified-gpu-backend.md)). Training is currently CPU-only and distributed across cluster nodes via Ray Train ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)).
[ADR-0063](0063-comfyui-3d-avatar-pipeline.md) describes an automated ComfyUI + TRELLIS + UniRig pipeline for image-to-VRM avatar generation, running on a personal desktop as an on-demand Ray worker. This supersedes the manual BlenderMCP Kasm workflow from [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md). waterdeep retains its role as an interactive Blender workstation for manual refinement of auto-generated models.
**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory that currently serves as a development workstation (see [ADR-0037](0037-node-naming-conventions.md)). Its Apple Silicon GPU (MPS backend) and unified memory architecture make it a strong candidate for both inference and training workloads — but macOS cannot run Talos Linux or easily join the Kubernetes cluster as a native node.
waterdeep's M4 Pro has a 16-core GPU with hardware-accelerated Metal rendering and 48 GB of unified memory shared between CPU and GPU. Running Blender natively on waterdeep with BlenderMCP gives a dramatically better 3D creation experience than Kasm.
How do we integrate waterdeep's compute into the Ray cluster without disrupting the existing Kubernetes-managed infrastructure?
How should we use waterdeep to get the most out of the 3D avatar creation pipeline for companions-frontend?
## Decision Drivers
* 48 GB unified memory is sufficient for medium-large models (e.g., 7B–30B at Q4/Q8 quantisation)
* Apple Silicon MPS backend is supported by PyTorch and vLLM (experimental)
* macOS cannot run Talos Linux — must integrate without Kubernetes
* Ray natively supports heterogeneous clusters with external workers
* Must not impact existing inference serving stability
* Training workloads ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)) would benefit from a GPU-accelerated worker
* ARM64 architecture requires compatible Python packages and model formats
* Blender on Kasm is CPU-rendered inside DinD — no Metal/Vulkan/CUDA GPU access, poor viewport performance
* waterdeep has a 16-core Apple GPU with Metal support — Blender's Metal backend enables real-time viewport rendering, Cycles GPU rendering, and smooth sculpting
* 48 GB unified memory means Blender, VS Code, and the MCP server can all run simultaneously without swapping
* VS Code with Copilot agent mode and BlenderMCP server are installed on waterdeep — VS Code drives Blender via localhost:9876 with zero-latency socket communication
* Exported VRM models must reach gravenhollow for production serving ([ADR-0063](0063-comfyui-3d-avatar-pipeline.md))
* **rclone** chosen for asset promotion to gravenhollow's RustFS S3 endpoint — simpler than NFS mounts on macOS, consistent with existing Kasm rclone patterns, and avoids autofs/NFS fstab complexity
* The automated ComfyUI pipeline from [ADR-0063](0063-comfyui-3d-avatar-pipeline.md) handles most avatar generation; waterdeep serves as the manual refinement station
* Ray cluster GPU fleet is fully allocated and stable — adding MPS complexity is not justified
## Considered Options
1. **External Ray worker on macOS** — run a Ray worker process natively on waterdeep that connects to the cluster Ray head over the network
2. **Linux VM on Mac** — run UTM/Parallels VM with Linux, join as a Kubernetes node
3. **K3s agent on macOS** — run K3s directly on macOS via Docker Desktop
1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via rclone (S3)
2. **External Ray worker on macOS** (original proposal) — join the Ray cluster for inference and training
3. **Keep Kasm-only workflow** — rely entirely on the browser-based Kasm Blender workstation from ADR-0062
## Decision Outcome
Chosen option: **Option 1 — External Ray worker on macOS**, because Ray natively supports heterogeneous workers joining over the network. This avoids the complexity of running Kubernetes on macOS, lets waterdeep remain a development workstation, and leverages Apple Silicon MPS acceleration transparently through PyTorch.
Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Mini's Metal GPU makes it dramatically better for 3D work than CPU-rendered Kasm, the Ray cluster doesn't need another worker, and the local workflow eliminates network latency between VS Code, the MCP server, and Blender.
### Positive Consequences
* Zero Kubernetes overhead on waterdeep — remains a usable dev workstation
* 48 GB unified memory available for models (vs split VRAM/RAM on discrete GPUs)
* MPS GPU acceleration for both inference and training
* Adds a 5th GPU class to the Ray fleet (Apple MPS alongside ROCm, CUDA, Intel, RDNA2)
* Training jobs ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)) gain a GPU-accelerated worker
* Can run a secondary LLM instance for overflow or A/B testing
* Quick to set up — single `ray start` command
* Worker can be stopped/started without affecting the cluster
* Metal GPU acceleration — real-time Eevee viewport, GPU-accelerated Cycles rendering, smooth 60fps sculpting
* Zero-latency MCP — BlenderMCP socket (localhost:9876) has no network hop, instant command execution
* 48 GB unified memory — large Blender scenes, multiple VRM models open simultaneously, no swap pressure
* VS Code + Copilot agent mode + BlenderMCP server installed natively — single editor drives both code and Blender commands
* rclone for asset promotion — consistent with Kasm rclone patterns, avoids macOS NFS/autofs complexity
* Remaining a dev workstation — avatar creation is a creative dev workflow, not a server workload
* Kasm Blender remains available as a browser-based fallback for remote/mobile access
* Simpler than the Ray worker approach — no cluster integration, no GCS port exposure, no experimental MPS backend
### Negative Consequences
* Not managed by KubeRay or Flux — requires manual or launchd-based lifecycle management
* Network dependency — if waterdeep sleeps or disconnects, Ray tasks on it fail
* MPS backend has limited operator coverage compared to CUDA/ROCm
* Python environment must be maintained separately (not in a container image)
* No Longhorn storage — model cache managed locally or via NFS mount from gravenhollow (nfs-fast)
* Monitoring not automatically scraped by Prometheus (needs node-exporter or push gateway)
* Blender, VS Code, and add-ons must be installed and maintained locally on waterdeep via Homebrew
* Assets created locally need explicit `rclone copy` to promote to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
* waterdeep is a single machine — no redundancy for the 3D creation workflow
* Not managed by Kubernetes or GitOps — relies on Homebrew-managed tooling
## Pros and Cons of the Options
### Option 1: External Ray worker on macOS
### Option 1: Local AI agent on waterdeep
* Good, because Ray is designed for heterogeneous multi-node clusters
* Good, because no VM overhead — full access to Metal/MPS and unified memory
* Good, because waterdeep remains a functional dev workstation
* Good, because trivial to start/stop (single process)
* Bad, because not managed by Kubernetes or GitOps
* Bad, because requires manual Python environment management
* Bad, because MPS support in vLLM is experimental
* Good, because Metal GPU acceleration makes Blender usable for real 3D work (sculpting, rendering, material preview)
* Good, because localhost MCP socket eliminates all network latency
* Good, because 48 GB unified memory supports complex scenes without swapping
* Good, because no experimental backends (MPS/vLLM) — using Blender's mature Metal renderer
* Good, because waterdeep stays a dev workstation, aligning with its named role
* Bad, because local-only — no browser-based remote access (use Kasm for that)
* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP, VS Code)
* Bad, because asset promotion to gravenhollow requires explicit rclone command
### Option 2: Linux VM on Mac
### Option 2: External Ray worker on macOS (original proposal)
* Good, because would be a standard Kubernetes node
* Good, because managed by KubeRay like other workers
* Bad, because VM overhead reduces available memory (hypervisor, guest OS)
* Bad, because no MPS/Metal GPU passthrough to Linux VMs on Apple Silicon
* Bad, because complex to maintain (VM lifecycle, networking, storage)
* Bad, because wastes the primary advantage (Apple Silicon GPU)
* Good, because adds GPU compute to the Ray cluster
* Good, because training jobs gain MPS acceleration
* Bad, because vLLM MPS backend is experimental — not production-ready
* Bad, because adds a 5th GPU class (MPS) to an already complex fleet
* Bad, because Ray GCS port exposure adds security surface
* Bad, because doesn't address the actual unmet need (3D avatar creation)
* Bad, because waterdeep becomes a server, degrading its dev workstation role
### Option 3: K3s agent on macOS
### Option 3: Kasm-only workflow
* Good, because Kubernetes-native, managed by Flux
* Bad, because K3s on macOS requires Docker Desktop (resource overhead)
* Bad, because container networking on macOS is fragile
* Bad, because MPS device access from within Docker containers is unreliable
* Bad, because not a supported K3s configuration
* Good, because browser-based — usable from any device
* Good, because no local installation required
* Bad, because CPU-rendered Blender inside DinD — poor viewport performance
* Bad, because network latency between VS Code and Blender socket
* Bad, because limited memory inside Kasm container
* Bad, because no GPU acceleration for rendering or sculpting
## Architecture
```
┌────────────────────────────────────────────────────────────────────────────┐
│                         Kubernetes Cluster (Talos)                         │
│  ┌──────────────────────────────────────────────────────────────────┐      │
│  │  RayService (ai-inference) — KubeRay managed                     │      │
│  │                                                                  │      │
│  │  Head:    wulfgar                                                │      │
│  │  Workers: khelben (ROCm), elminster (CUDA),                      │      │
│  │           drizzt (RDNA2), danilo (Intel)                         │      │
│  └──────────────────────┬───────────────────────────────────────────┘      │
│                         │  Ray GCS (port 6379)                             │
└─────────────────────────┼──────────────────────────────────────────────────┘
                          │  Home network (LAN)
┌─────────────────────────┼──────────────────────────────────────────────────┐
│                         │        waterdeep (Mac Mini M4 Pro)               │
│  ┌──────────────────────▼───────────────────────────────────────────┐      │
│  │  External Ray Worker (ray start --address=...)                   │      │
│  │                                                                  │      │
│  │  • 12-core CPU (8P + 4E) + 16-core Neural Engine                 │      │
│  │  • 48 GB unified memory (shared CPU/GPU)                         │      │
│  │  • MPS (Metal) GPU backend via PyTorch                           │      │
│  │  • Custom resource: gpu_apple_mps: 1                             │      │
│  │                                                                  │      │
│  │  Workloads:                                                      │      │
│  │  ├── Inference: secondary LLM (7B–30B), overflow serving         │      │
│  │  └── Training:  LoRA/QLoRA fine-tuning via Ray Train             │      │
│  └──────────────────────────────────────────────────────────────────┘      │
│  Model cache: ~/Library/Caches/huggingface + NFS mount (gravenhollow)      │
└────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────────┐
│           waterdeep (Mac Mini M4 Pro · 48 GB unified · Metal GPU)          │
│                                                                            │
│  ┌────────────────────────────────────────────────────┐                    │
│  │  VS Code + GitHub Copilot (agent mode)             │                    │
│  │        │                                           │                    │
│  │        ▼                                           │                    │
│  │  BlenderMCP Server (uvx blender-mcp)               │                    │
│  │  DISABLE_TELEMETRY=true                            │                    │
│  │        │                                           │                    │
│  │        │ TCP localhost:9876 (zero latency)         │                    │
│  └────────┼───────────────────────────────────────────┘                    │
│           │                                                                │
│  ┌────────▼───────────────────────────────────────────┐                    │
│  │  Blender 4.x (native macOS)                        │                    │
│  │                                                    │                    │
│  │  Renderer: Metal (Eevee real-time + Cycles GPU)    │                    │
│  │  Add-ons:                                          │                    │
│  │   • BlenderMCP (addon.py) — socket server :9876    │                    │
│  │   • VRM Add-on for Blender — import/export VRM     │                    │
│  │                                                    │                    │
│  │  Working files: ~/blender-avatars/                 │                    │
│  │  ├── projects/  (.blend source files)              │                    │
│  │  ├── exports/   (.vrm exported models)             │                    │
│  │  └── textures/  (shared texture library)           │                    │
│  └────────┼───────────────────────────────────────────┘                    │
│           │  rclone (S3 asset promotion)                                   │
│           │  gravenhollow RustFS :30292                                    │
└───────────┼────────────────────────────────────────────────────────────────┘
            │
┌───────────▼────────────────────────────────────────────────────────────────┐
│                     gravenhollow.lab.daviestechlabs.io                     │
│             (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB)               │
│                                                                            │
│  NFS: /mnt/gravenhollow/kubernetes/avatar-models/                          │
│  ├── Seed-san.vrm     (default model)                                      │
│  ├── Companion-A.vrm  (promoted from waterdeep)                            │
│  └── animations/      (shared animation clips)                             │
│                                                                            │
│  S3 (RustFS): avatar-models bucket                                         │
│  (same data, served via Cloudflare Tunnel for remote users)                │
└────────────────────────────┬───────────────────────────────────────────────┘
                ┌────────────┴────────────────┐
                │                             │
        NFS (nfs-fast PVC)           Cloudflare Tunnel
                │              (assets.daviestechlabs.io)
                ▼                             │
  ┌──────────────────────────┐                ▼
  │  companions-frontend     │   ┌──────────────────────────┐
  │  (Kubernetes pod)        │   │ Remote users (CDN-cached │
  │  LAN users               │   │ via Cloudflare edge)     │
  └──────────────────────────┘   └──────────────────────────┘
```
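
To illustrate the zero-latency localhost hop in the diagram above, here is a minimal Go client that talks to the add-on's socket directly. In practice the `uvx blender-mcp` server owns this protocol; the JSON command shape below (`type`/`params`, `get_scene_info`) is an assumption, not a documented contract:

```go
// Hedged sketch: poking the BlenderMCP add-on socket directly from Go.
// The JSON command schema is an assumption for illustration only.
package main

import (
	"encoding/json"
	"log"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "localhost:9876")
	if err != nil {
		log.Fatalf("blender add-on not listening: %v", err)
	}
	defer conn.Close()

	cmd := map[string]any{ // hypothetical command shape
		"type":   "get_scene_info",
		"params": map[string]any{},
	}
	if err := json.NewEncoder(conn).Encode(cmd); err != nil {
		log.Fatal(err)
	}
	var resp map[string]any
	if err := json.NewDecoder(conn).Decode(&resp); err != nil {
		log.Fatal(err)
	}
	log.Printf("scene: %v", resp)
}
```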
## Updated GPU Fleet
| Node | GPU | Backend | Memory | Custom Resource | Workload |
|------|-----|---------|--------|-----------------|----------|
| khelben | Strix Halo | ROCm | 128 GB unified | `gpu_strixhalo: 1` | vLLM 70B |
| elminster | RTX 2070 | CUDA | 8 GB VRAM | `gpu_nvidia: 1` | Whisper + TTS |
| drizzt | Radeon 680M | ROCm | 12 GB VRAM | `gpu_rdna2: 1` | Embeddings |
| danilo | Intel Arc | i915/IPEX | ~6 GB shared | `gpu_intel: 1` | Reranker |
| **waterdeep** | **M4 Pro** | **MPS (Metal)** | **48 GB unified** | **`gpu_apple_mps: 1`** | **LLM (7B–30B) + Training** |
## Implementation Plan
### 1. Network Prerequisites
waterdeep must be able to reach the Ray head node's GCS port:
### 1. Install Blender and Add-ons
```bash
# From waterdeep, verify connectivity
nc -zv <ray-head-ip> 6379
# Install Blender via Homebrew
brew install --cask blender
# Download BlenderMCP add-on
curl -LO https://raw.githubusercontent.com/ahujasid/blender-mcp/main/addon.py
# Install in Blender:
# Edit > Preferences > Add-ons > Install... > select addon.py
# Enable "Interface: Blender MCP"
# Install VRM Add-on for Blender:
# Download from https://vrm-addon-for-blender.info/en/
# Edit > Preferences > Add-ons > Install... > select VRM add-on zip
# Enable "Import-Export: VRM"
```
The Ray head service (`ai-inference-raycluster-head-svc`) is ClusterIP-only. Options to expose it:
### 2. VS Code MCP Configuration
| Approach | Complexity | Recommended |
|----------|-----------|-------------|
| NodePort service on port 6379 | Low | For initial setup |
| Envoy Gateway TCPRoute | Medium | For production use |
| Tailscale/WireGuard mesh | Medium | If already in use |
```json
// .vscode/mcp.json (in companions-frontend or global settings)
{
"servers": {
"blender": {
"command": "uvx",
"args": ["blender-mcp"],
"env": {
"BLENDER_HOST": "localhost",
"BLENDER_PORT": "9876",
"DISABLE_TELEMETRY": "true"
}
}
}
}
```
### 2. Python Environment on waterdeep
### 3. Python Environment for BlenderMCP
```bash
# Install uv (per ADR-0012)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create Ray worker environment
uv venv ~/ray-worker --python 3.12
source ~/ray-worker/bin/activate
# Install Ray with ML dependencies
uv pip install "ray[default]==2.53.0" torch torchvision torchaudio \
transformers accelerate peft bitsandbytes \
ray-serve-apps # internal package from Gitea PyPI
# Verify MPS availability
python -c "import torch; print(torch.backends.mps.is_available())"
# uvx handles the BlenderMCP server environment automatically
# Verify it works:
uvx blender-mcp --help
```
### 3. Start Ray Worker
### 4. rclone for Asset Promotion
Use rclone to promote finished VRM exports to gravenhollow's RustFS S3 endpoint. This is consistent with the promotion pattern from [ADR-0063](0063-comfyui-3d-avatar-pipeline.md) and avoids macOS NFS/autofs complexity.
```bash
# Join the cluster with custom resources
ray start \
--address="<ray-head-ip>:6379" \
--num-cpus=12 \
--num-gpus=1 \
--resources='{"gpu_apple_mps": 1}' \
--block
# Install rclone
brew install rclone
# Configure gravenhollow RustFS endpoint
rclone config create gravenhollow s3 \
provider=Other \
endpoint=https://gravenhollow.lab.daviestechlabs.io:30292 \
access_key_id=<key> \
secret_access_key=<secret>
# Promote a finished VRM
rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models/
# Sync all exports (idempotent)
rclone sync ~/blender-avatars/exports/ gravenhollow:avatar-models/ --exclude "*.blend"
```
### 4. launchd Service (Persistent)
> **Why rclone over NFS?** macOS autofs/NFS mounts are fragile across reboots and network changes. rclone is a single binary, works over HTTPS, and matches the promotion pattern already used in Kasm workflows. The explicit `rclone copy` command also serves as a deliberate promotion gate — only intentionally promoted models reach production.
```xml
<!-- ~/Library/LaunchAgents/io.ray.worker.plist -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>io.ray.worker</string>
<key>ProgramArguments</key>
<array>
<string>/Users/billy/ray-worker/bin/ray</string>
<string>start</string>
<string>--address=RAY_HEAD_IP:6379</string>
<string>--num-cpus=12</string>
<string>--num-gpus=1</string>
<string>--resources={"gpu_apple_mps": 1}</string>
<string>--block</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/ray-worker.log</string>
<key>StandardErrorPath</key>
<string>/tmp/ray-worker-error.log</string>
<key>EnvironmentVariables</key>
<dict>
<key>PATH</key>
<string>/Users/billy/ray-worker/bin:/usr/local/bin:/usr/bin:/bin</string>
</dict>
</dict>
</plist>
```
### 5. Avatar Creation Workflow (waterdeep)
```bash
launchctl load ~/Library/LaunchAgents/io.ray.worker.plist
```
1. **Open Blender** on waterdeep (native Metal-accelerated)
2. **Enable BlenderMCP** → 3D View sidebar → "BlenderMCP" tab → click "Connect"
3. **Open VS Code** with Copilot agent mode — BlenderMCP server starts automatically
4. **Create avatars** using AI-assisted prompts:
- _"Create an anime-style character with silver hair and a mage outfit"_
- _"Apply metallic blue material to the staff"_
- _"Rig this character for VRM export with standard humanoid bones"_
- _"Export as VRM to ~/blender-avatars/exports/Silver-Mage.vrm"_
5. **Preview** in real-time — Metal GPU renders Eevee viewport at 60fps
6. **Promote** the finished VRM to gravenhollow via rclone:
```bash
rclone copy ~/blender-avatars/exports/Silver-Mage-v1.vrm gravenhollow:avatar-models/
```
7. **Register** in companions-frontend — update `AllowedAvatarModels` in Go and JS allowlists, commit
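
The shape of `AllowedAvatarModels` in companions-frontend is not shown here; a minimal sketch of what step 7's register-then-validate step might look like on the Go side:

```go
// Hedged sketch: the real AllowedAvatarModels structure in
// companions-frontend may differ; this illustrates the allowlist gate.
var AllowedAvatarModels = map[string]bool{
	"Seed-san.vrm":       true,
	"Silver-Mage-v1.vrm": true, // newly promoted from waterdeep
}

// validateModel rejects requests for models that were never promoted.
func validateModel(name string) error {
	if !AllowedAvatarModels[name] {
		return fmt.Errorf("avatar model %q is not in the allowlist", name)
	}
	return nil
}
```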
### 5. Model Cache via NFS
### 6. Workflow Comparison: waterdeep vs Kasm
Mount the gravenhollow NFS share on waterdeep so models are shared with the cluster via the fast all-SSD NAS:
```bash
# Mount gravenhollow NFS share (all-SSD, dual 10GbE)
sudo mount -t nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models \
/Volumes/model-cache
# Or add to /etc/fstab for persistence
# gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/models /Volumes/model-cache nfs rw 0 0
# Symlink to HuggingFace cache location
ln -s /Volumes/model-cache ~/.cache/huggingface/hub
```
### 6. Ray Serve Deployment Targeting
To schedule a deployment specifically on waterdeep, use the `gpu_apple_mps` custom resource in the RayService config:
```yaml
# In rayservice.yaml serveConfigV2
- name: llm-secondary
route_prefix: /llm-secondary
import_path: ray_serve.serve_llm:app
runtime_env:
env_vars:
MODEL_ID: "Qwen/Qwen2.5-32B-Instruct-AWQ"
DEVICE: "mps"
MAX_MODEL_LEN: "4096"
deployments:
- name: LLMDeployment
num_replicas: 1
ray_actor_options:
num_gpus: 0.95
resources:
gpu_apple_mps: 1
```
### 7. Training Integration
Ray Train jobs from [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) will automatically discover waterdeep as an available worker. To prefer it for GPU-accelerated training:
```python
# In cpu_training_pipeline.py — updated to prefer MPS when available
trainer = TorchTrainer(
train_func,
scaling_config=ScalingConfig(
num_workers=1,
use_gpu=True,
resources_per_worker={"gpu_apple_mps": 1},
),
)
```
## Monitoring
Since waterdeep is not a Kubernetes node, standard Prometheus scraping won't reach it. Options:
| Approach | Notes |
|----------|-------|
| Prometheus push gateway | Ray worker pushes metrics periodically |
| Node-exporter on macOS | Homebrew `node_exporter`, scraped by Prometheus via static target |
| Ray Dashboard | Already shows all connected workers (ray-serve.lab.daviestechlabs.io) |
The Ray Dashboard at `ray-serve.lab.daviestechlabs.io` will automatically show waterdeep as a connected node with its resources, tasks, and memory usage — no additional configuration needed.
## Power Management
To prevent macOS from sleeping and disconnecting the Ray worker:
```bash
# Disable sleep when on power adapter
sudo pmset -c sleep 0 displaysleep 0 disksleep 0
# Or use caffeinate for the Ray process
caffeinate -s ray start --address=... --block
```
| Aspect | waterdeep (local) | Kasm (browser) |
|--------|-------------------|----------------|
| **GPU rendering** | Metal 16-core GPU — Eevee real-time, Cycles GPU | CPU-only software rendering |
| **Viewport FPS** | 60fps (Metal) | 5–15fps (CPU rasterisation) |
| **MCP latency** | localhost socket — sub-millisecond | Network hop to Kasm container |
| **Memory** | 48 GB unified, shared with GPU | Limited by Kasm container allocation |
| **Sculpting** | Smooth, hardware-accelerated | Laggy, CPU-bound |
| **Asset promotion** | rclone to gravenhollow RustFS S3 | Auto rclone to Quobyte S3 → manual promote to gravenhollow |
| **Access** | Local only (waterdeep physical/VNC) | Any browser, anywhere |
| **Setup** | Homebrew + manual add-on install | Pre-baked in Kasm image |
| **Use when** | Primary creation workflow | Remote access, quick edits, mobile |
## Security Considerations
* Ray's GCS port (6379) will be exposed outside the cluster — restrict with firewall rules to waterdeep's IP only
* The Ray worker has no RBAC — it executes whatever tasks the head assigns
* Model weights on NFS are read-only from waterdeep (mount with `ro` option if possible)
* NFS traffic to gravenhollow traverses the LAN — ensure dual 10GbE links are active
* Consider Tailscale or WireGuard for encrypted transport if the Ray GCS traffic crosses untrusted network segments
* BlenderMCP's `execute_blender_code` runs arbitrary Python in Blender — review AI-generated code before execution, especially file I/O operations
* Telemetry disabled via `DISABLE_TELEMETRY=true` in MCP server config
* BlenderMCP socket (port 9876) bound to localhost — not exposed to the network
* NFS traffic to gravenhollow traverses the LAN — no sensitive data in VRM files
* waterdeep has no cluster access — compromise doesn't impact Kubernetes workloads
* `.blend` source files stay local on waterdeep; only finished VRM exports are promoted to gravenhollow
## Future Considerations
* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, waterdeep can shift to secondary inference while DGX Spark handles training
* **vLLM MPS maturity**: As vLLM's MPS backend matures, waterdeep could serve larger models more efficiently
* **MLX backend**: Apple's MLX framework may provide better performance than PyTorch MPS for some workloads — worth evaluating as an alternative serving backend
* **Second Mac Mini**: If another Apple Silicon node is added, the external-worker pattern scales trivially
* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, DGX Spark handles training; waterdeep remains the 3D creation workstation
* **Blender + MLX**: Apple's MLX framework could power local AI-generated textures or mesh deformation directly in Blender — worth evaluating as Blender add-ons mature
* **Automated promotion**: A file watcher (fswatch/launchd) could auto-run `rclone sync` when a new VRM appears in `~/blender-avatars/exports/` (see the sketch after this list)
* **VRM validation**: Add a pre-promotion check script that validates VRM humanoid rig completeness, expression morphs, and viseme shapes before copying to gravenhollow
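
The automated-promotion idea from the list above could be a small Go watcher. This sketch uses the `fsnotify` library and shells out to rclone; the exports path and the `gravenhollow:avatar-models/` remote come from this ADR, while the absolute home path is an assumption:

```go
// Hedged sketch: auto-promote new VRM exports to gravenhollow with rclone.
package main

import (
	"log"
	"os/exec"
	"path/filepath"
	"strings"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	exports := "/Users/billy/blender-avatars/exports" // assumed absolute path
	if err := watcher.Add(exports); err != nil {
		log.Fatal(err)
	}

	for event := range watcher.Events {
		// Promote only finished .vrm files on create/write events.
		if event.Op&(fsnotify.Create|fsnotify.Write) == 0 {
			continue
		}
		if !strings.HasSuffix(event.Name, ".vrm") {
			continue
		}
		out, err := exec.Command("rclone", "copy", event.Name, "gravenhollow:avatar-models/").CombinedOutput()
		if err != nil {
			log.Printf("promotion failed for %s: %v\n%s", filepath.Base(event.Name), err, out)
			continue
		}
		log.Printf("promoted %s", filepath.Base(event.Name))
	}
}
```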
## Links
* [Ray Clusters — Adding External Workers](https://docs.ray.io/en/latest/cluster/vms/getting-started.html)
* [PyTorch MPS Backend](https://pytorch.org/docs/stable/notes/mps.html)
* [vLLM Apple Silicon Support](https://docs.vllm.ai/en/latest/)
* Related: [ADR-0005](0005-multi-gpu-strategy.md) — Multi-GPU strategy
* Related: [ADR-0011](0011-kuberay-unified-gpu-backend.md) — KubeRay unified GPU backend
* Related: [ADR-0024](0024-ray-repository-structure.md) — Ray repository structure
* Related: [ADR-0035](0035-arm64-worker-strategy.md) — ARM64 worker strategy
* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions
* Related: [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) — Training strategy
* Related: [ADR-0063](0063-comfyui-3d-avatar-pipeline.md) — ComfyUI image-to-3D avatar pipeline (supersedes ADR-0062)
* Related: [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) — BlenderMCP 3D avatar workflow (superseded)
* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend architecture (Three.js + VRM avatars)
* Related: [ADR-0026](0026-storage-strategy.md) — Storage strategy (gravenhollow NFS-fast)
* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions (waterdeep)
* Related: [ADR-0012](0012-use-uv-for-python-development.md) — uv for Python development
* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp)
* [Blender Metal GPU Rendering](https://docs.blender.org/manual/en/latest/render/cycles/gpu_rendering.html)
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm)

View File

@@ -0,0 +1,448 @@
# BlenderMCP for 3D Avatar Creation via Kasm Workstation
* Status: superseded by [ADR-0063](0063-comfyui-3d-avatar-pipeline.md)
* Date: 2026-02-21
* Deciders: Billy
* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare-cached RustFS
## Context and Problem Statement
The companions-frontend serves VRM avatar models for its Three.js-based 3D character rendering (see [ADR-0046](0046-companions-frontend-architecture.md)). Today the avatar library is limited to three models (`Seed-san.vrm`, `Aka.vrm`, `Midori.vrm`) — only one of which actually ships in the repo — and every model must be sourced or hand-sculpted externally.
Creating custom VRM avatars is a manual, time-intensive process: open Blender, sculpt/rig a character, export to VRM, iterate. There is no integration between the AI coding workflow (VS Code / Copilot) and Blender, so context switching between the editor and the 3D tool is constant.
How do we streamline custom 3D avatar creation for companions-frontend with AI assistance, while keeping assets durable and accessible across workstations?
## Decision Drivers
* The existing avatar pipeline is manual and disconnected from the development workflow
* BlenderMCP (v1.5.5, 17k+ GitHub stars) bridges AI assistants to Blender via the Model Context Protocol — enabling prompt-driven 3D modelling, material control, scene manipulation, and code execution inside Blender
* Kasm Workspaces already run in the cluster (`productivity` namespace) and support Docker-in-Docker with volume plugins for persistent storage
* VS Code supports MCP servers natively (GitHub Copilot agent mode), meaning the same editor used for code can drive Blender scene creation
* Custom volume mounts in Kasm map `/s3` to S3-compatible storage via the rclone Docker volume plugin — providing durable, off-node persistence
* Quobyte S3-compatible endpoint with the `kasm` bucket is the existing Kasm storage backend
* VRM models must ultimately land in the companions-frontend `/assets/models/` path at build time or be served from an external URL
* Final production models and animations should live on gravenhollow (all-SSD TrueNAS, dual 10GbE) for fast local serving via NFS
* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-cached path for multi-MB VRM downloads
* Models are write-once/read-many — ideal for aggressive caching
* gravenhollow already runs RustFS (S3-compatible) — exposing it via Cloudflare Tunnel gives CDN caching without a separate storage tier
## Considered Options
1. **BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3 (`kasm` bucket)**
2. **Local Blender + BlenderMCP on a developer laptop**
3. **Hyper3D / Rodin cloud generation only (no Blender)**
4. **Manual Blender workflow (status quo)**
## Decision Outcome
Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3**, because it integrates AI-assisted modelling directly into the existing Kasm + VS Code workflow, stores assets durably in S3, and requires no additional infrastructure beyond what is already deployed.
### Positive Consequences
* AI-assisted 3D modelling — prompt-driven creation, material application, and scene manipulation inside Blender via MCP
* Zero context switching — VS Code agent mode drives Blender commands through the same editor used for code
* Persistent storage — VRM exports written to `/s3` survive session teardown and are available from any Kasm session or CI pipeline
* Existing infrastructure — Kasm agent, DinD, rclone volume plugin, Quobyte S3, gravenhollow NFS, and Cloudflare are all already deployed
* No image rebuild for new models — VRM files live on gravenhollow NFS, mounted read-only into the pod; add a model and update the allowlist
* LAN performance — all-SSD NFS with dual 10GbE delivers VRM files in <100ms
* Remote performance — RustFS exposed through Cloudflare Tunnel with CDN caching at 300+ global PoPs; no separate storage tier needed
* Poly Haven / Hyper3D integration — BlenderMCP supports downloading Poly Haven assets and generating models via Hyper3D Rodin, expanding the asset library
* VRM ecosystem — Blender VRM add-on exports directly to VRM 0.x/1.0 format consumed by `@pixiv/three-vrm` in companions-frontend
* Reproducible — Kasm workspace images are versioned; Blender + add-ons are pre-baked
### Negative Consequences
* BlenderMCP `execute_blender_code` tool runs arbitrary Python in Blender — must trust AI-generated code or review before execution
* Socket-based communication (TCP 9876) between the MCP server and Blender add-on adds a failure mode
* VRM export quality depends on correct rigging/weight painting — AI can scaffold but manual touch-up may still be needed
* Kasm Blender image must be configured with both the BlenderMCP add-on and the VRM add-on pre-installed
* Telemetry is on by default in BlenderMCP — must disable via `DISABLE_TELEMETRY=true` for privacy
* Cache misses from remote users hit gravenhollow via the tunnel — negligible with immutable files and long TTLs
## Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Developer Workstation │
│ │
│ ┌──────────────────────────────────┐ │
│ │ VS Code (local) │ │
│ │ │ │
│ │ GitHub Copilot (agent mode) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ BlenderMCP Server (MCP) │ │
│ │ (uvx blender-mcp) │ │
│ │ │ │ │
│ └─────────┼────────────────────────┘ │
│ │ TCP :9876 (JSON over socket) │
└────────────┼────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Kasm Blender Workstation (browser session) │
│ kasm.daviestechlabs.io │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Blender 4.x │ │
│ │ │ │
│ │ Add-ons: │ │
│ │ • BlenderMCP (addon.py) — socket server :9876 │ │
│ │ • VRM Add-on for Blender — import/export VRM │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────┐ │ │
│ │ │ /s3/blender-avatars/ │ │ │
│ │ │ ├── projects/ (.blend source files) │ │ │
│ │ │ ├── exports/ (.vrm exported models) │ │ │
│ │ │ └── textures/ (shared texture lib) │ │ │
│ │ └────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ rclone volume │
│ plugin (S3) │
└──────────────────────────┼──────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Quobyte S3 Endpoint │
│ Bucket: kasm │
│ │
│ kasm/blender-avatars/projects/Companion-A.blend │
│ kasm/blender-avatars/exports/Companion-A.vrm │
│ kasm/blender-avatars/textures/skin-tone-01.png │
└──────────────────────────┬──────────────────────────────────────────────┘
rclone sync (promotion)
┌─────────────────────────────────────────────────────────────────────────┐
│ gravenhollow.lab.daviestechlabs.io │
│ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │
│ │
│ NFS: /mnt/gravenhollow/kubernetes/avatar-models/ │
│ ├── Seed-san.vrm (default model) │
│ ├── Aka.vrm (Legend tier) │
│ ├── Midori.vrm (Legend tier) │
│ ├── Companion-A.vrm (custom, promoted from Kasm S3) │
│ └── animations/ (shared animation clips) │
│ │
│ S3 (RustFS): avatar-models bucket │
│ (same data as NFS dir, served via S3 API for Cloudflare Tunnel) │
└──────────┬─────────────────────────────────┬────────────────────────────┘
│ │
NFS mount (nfs-fast) S3 API (RustFS :30292)
for pod volume via Cloudflare Tunnel
│ │
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────────────────────┐
│ companions-frontend │ │ Cloudflare Tunnel + CDN │
│ (Kubernetes pod) │ │ │
│ │ │ assets.daviestechlabs.io │
│ /models/ volume mount │ │ → envoy-external │
│ (nfs-fast PVC, RO) │ │ → avatar-assets-svc (in-cluster) │
│ │ │ → gravenhollow RustFS :30292 │
│ Go FileServer: │ │ │
│ /assets/models/ → │ │ Cloudflare CDN caches at 300+ PoPs │
│ serves from PVC │ │ Cache-Control: public, max-age=31536000 │
│ │ │ (immutable, versioned filenames) │
└──────────┬───────────────┘ └──────────────────────┬───────────────────┘
│ │
LAN clients Remote clients
companions-chat.lab... companions-chat via
(envoy-internal, direct) Cloudflare Tunnel
│ │
└──────────────────┬───────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ Browser (Three.js) │
│ AvatarManager.loadModel('/assets/models/Companion-A.vrm') │
│ │
│ LAN: fetch from companions-frontend pod (NFS-backed, ~10GbE) │
│ Remote: fetch from assets.daviestechlabs.io (Cloudflare CDN-cached) │
└─────────────────────────────────────────────────────────────────────────┘
```
## Workflow
### 1. Kasm Workspace Setup
The Kasm Blender workspace image is configured with:
| Component | Version | Purpose |
|-----------|---------|---------|
| Blender | 4.x | 3D modelling and sculpting |
| BlenderMCP add-on (`addon.py`) | 1.5.5 | Socket server for MCP commands |
| VRM Add-on for Blender | latest | Import/export VRM format |
| Python | 3.10+ | Blender scripting runtime |
The Kasm storage mapping mounts `/s3` via the rclone Docker volume plugin to the Quobyte S3 endpoint (`kasm` bucket). The sub-path `blender-avatars/` is used for all 3D asset work.
### 2. VS Code MCP Configuration
Add BlenderMCP as an MCP server in VS Code (`.vscode/mcp.json` or user settings):
```json
{
"servers": {
"blender": {
"command": "uvx",
"args": ["blender-mcp"],
"env": {
"BLENDER_HOST": "localhost",
"BLENDER_PORT": "9876",
"DISABLE_TELEMETRY": "true"
}
}
}
}
```
When the Kasm session is accessed remotely, set `BLENDER_HOST` to the Kasm workstation's reachable address.
### 3. Avatar Creation Workflow
1. **Launch** the Kasm Blender workspace via `kasm.daviestechlabs.io`
2. **Enable** the BlenderMCP add-on in Blender → 3D View sidebar → "BlenderMCP" tab → "Connect to Claude"
3. **Open VS Code** with Copilot agent mode and the BlenderMCP MCP server running
4. **Prompt** the AI to create or modify avatars:
- _"Create a humanoid character with anime-style proportions, blue hair, and a fantasy outfit"_
- _"Apply a metallic gold material to the armor pieces"_
- _"Set up the lighting for a character showcase render"_
- _"Rig this character for VRM export with standard humanoid bones"_
5. **Export** the finished model to VRM via the VRM add-on, or via BlenderMCP `execute_blender_code` calling the VRM export operator (see the sketch after this list)
6. **Save** the `.vrm` to `/s3/blender-avatars/exports/` and the `.blend` source to `/s3/blender-avatars/projects/`
7. **Import** the VRM into companions-frontend — copy to `assets/models/`, update the allowlists in `internal/database/database.go` and `static/js/avatar.js`
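For step 5, the export itself reduces to one operator call. A minimal sketch of the Python that `execute_blender_code` would run inside Blender, assuming the VRM add-on is enabled and registers the `export_scene.vrm` operator (the export path is illustrative):
```python
# Runs inside Blender's Python runtime (sent via BlenderMCP's
# execute_blender_code tool); assumes the VRM add-on is enabled.
import bpy

export_path = "/s3/blender-avatars/exports/Companion-A.vrm"
bpy.ops.export_scene.vrm(filepath=export_path)
print(f"Exported {export_path}")
```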
### 4. Asset Pipeline (Kasm S3 → gravenhollow → production)
| Stage | Action |
|-------|--------|
| **Create** | AI-assisted modelling + VRM export in Kasm Blender → `/s3/blender-avatars/exports/*.vrm` |
| **Store** | rclone syncs `/s3` to Quobyte S3 `kasm` bucket automatically |
| **Promote** | `rclone copy quobyte:kasm/blender-avatars/exports/Model.vrm gravenhollow-nfs:/avatar-models/` (manual or CI) |
| **Register** | Add model path to `AllowedAvatarModels` in Go and JS allowlists, commit to repo |
| **Deploy** | Flux rolls out updated companions-frontend config; model already available on NFS PVC — no image rebuild needed |
| **CDN** | Model immediately available via `assets.daviestechlabs.io` — Cloudflare Tunnel proxies to RustFS, CDN caches at edge |
### 5. Deployment and Storage Architecture
#### Local Serving (LAN users)
Companions-frontend currently serves VRM models via `http.FileServer(http.Dir("assets"))` from the container filesystem. This bakes models into the image and requires a rebuild to add new avatars.
The new approach mounts avatar models from gravenhollow via an `nfs-fast` PVC:
```yaml
# PersistentVolumeClaim for avatar models
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: avatar-models
namespace: ai-ml
spec:
storageClassName: nfs-fast
accessModes: [ReadOnlyMany]
resources:
requests:
storage: 10Gi
```
The pod mounts this PVC at `/models` and the Go server serves it at `/assets/models/`:
```go
// Replace embedded assets with NFS-backed volume
mux.Handle("/assets/models/", http.StripPrefix("/assets/models/",
http.FileServer(http.Dir("/models"))))
```
Benefits:
- **No image rebuild** to add/update models — write to gravenhollow NFS, pod sees it immediately (with `actimeo=600` cache, within 10 minutes)
- **All-SSD + dual 10GbE** — VRM files (typically 5–30 MB) load in <100ms on LAN
- **ReadOnlyMany** — multiple replicas can share the same PVC
- Source `.blend` files and textures remain on Quobyte S3 (Kasm bucket) for the creation workflow; only promoted VRM exports land on gravenhollow
#### Remote Serving (Cloudflare-cached RustFS)
Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-external`. Rather than duplicating assets to a separate storage tier (e.g., Cloudflare R2), gravenhollow's RustFS S3 endpoint is exposed directly through the Cloudflare Tunnel with a dedicated hostname. Cloudflare's CDN automatically caches responses at edge PoPs — since VRM files are immutable with year-long TTLs, virtually all requests are served from cache.
| | |
|---|---|
| **Origin** | gravenhollow RustFS `avatar-models` bucket (`:30292`, same data as NFS dir) |
| **Public hostname** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
| **Tunnel routing** | Cloudflare Tunnel → `envoy-external` → `avatar-assets-svc` → gravenhollow RustFS |
| **CDN caching** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
| **Egress** | Cloudflare-proxied traffic has no bandwidth surcharge |
| **Auth** | Public read (models are not sensitive); RustFS write credentials stay internal |
| **No sync needed** | Single source of truth — NFS and RustFS serve the same data from gravenhollow |
##### In-Cluster Proxy Service
An ExternalName or Endpoints service proxies cluster traffic to gravenhollow's RustFS endpoint so the HTTPRoute can reference it:
```yaml
# Service pointing to gravenhollow RustFS for avatar assets
apiVersion: v1
kind: Service
metadata:
name: avatar-assets
namespace: ai-ml
spec:
type: ExternalName
externalName: gravenhollow.lab.daviestechlabs.io
ports:
- port: 30292
protocol: TCP
```
##### HTTPRoute (Cloudflare Tunnel → RustFS)
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: avatar-assets
namespace: ai-ml
annotations:
external-dns.alpha.kubernetes.io/hostname: assets.daviestechlabs.io
spec:
hostnames:
- assets.daviestechlabs.io
parentRefs:
- name: envoy-external
namespace: network
rules:
- matches:
- path:
type: PathPrefix
value: /avatar-models/
backendRefs:
- name: avatar-assets
port: 30292
filters:
- type: ResponseHeaderModifier
responseHeaderModifier:
set:
- name: Cache-Control
value: "public, max-age=31536000, immutable"
- name: Access-Control-Allow-Origin
value: "https://companions-chat.daviestechlabs.io"
```
Cloudflare Tunnel picks up `assets.daviestechlabs.io` via the existing wildcard ingress rule (`*.daviestechlabs.io → envoy-external`). The CDN caches based on the `Cache-Control` header — after the first request per PoP, all subsequent loads are served from Cloudflare's edge.
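A quick way to verify edge caching, sketched with Python's `requests` (the model URL is illustrative; `CF-Cache-Status` is the header Cloudflare adds to proxied responses, so expect `MISS` on the first fetch per PoP and `HIT` afterwards):
```python
import requests

# Illustrative model URL; repeat the request to see the cache warm up.
url = "https://assets.daviestechlabs.io/avatar-models/Companion-A.vrm"
resp = requests.get(url, timeout=30)
print(resp.status_code,
      resp.headers.get("CF-Cache-Status"),  # MISS, then HIT from the edge
      resp.headers.get("Cache-Control"))    # set by the HTTPRoute filter
```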
##### Client-Side Routing
The frontend detects whether the user is on LAN or remote and routes model fetches accordingly:
```javascript
// avatar.js — model URL resolution
function resolveModelURL(path) {
// LAN users: serve from the Go server (NFS-backed, same origin)
// Remote users: serve from Cloudflare-cached RustFS
const isLAN = location.hostname.endsWith('.lab.daviestechlabs.io');
if (isLAN) return path; // e.g. /assets/models/Companion-A.vrm
return `https://assets.daviestechlabs.io/avatar-models/${path.split('/').pop()}`;
// → https://assets.daviestechlabs.io/avatar-models/Companion-A.vrm
}
```
Alternatively, the Go server can set the model base URL via a template variable based on the `Host` header, keeping the logic server-side.
#### Versioning Strategy
VRM files are immutable once promoted — updated models get a new filename (e.g., `Companion-A-v2.vrm`) rather than overwriting. This ensures:
- Cloudflare CDN cache never serves stale content
- Rollback is trivial — point the allowlist back to the previous version
- Browser `Cache-Control: immutable` works correctly
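To make the convention concrete, a tiny helper sketch (hypothetical, not part of companions-frontend) that picks the next free versioned filename during promotion:
```python
def next_version(base: str, existing: set[str]) -> str:
    """Return the next unused versioned filename, e.g. 'Companion-A-v2.vrm'."""
    v = 1
    while f"{base}-v{v}.vrm" in existing:
        v += 1
    return f"{base}-v{v}.vrm"

# next_version("Companion-A", {"Companion-A-v1.vrm"}) -> "Companion-A-v2.vrm"
```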
#### Storage Tier Summary
| Location | Purpose | Tier | Access |
|----------|---------|------|--------|
| Quobyte S3 (`kasm` bucket) | Working files: `.blend`, textures, WIP exports | Kasm rclone volume | Kasm sessions only |
| gravenhollow NFS (`/avatar-models/`) | Production VRM models + animations | `nfs-fast` PVC (RO) | companions-frontend pod, LAN |
| gravenhollow RustFS S3 (`avatar-models`) | Same data as NFS, exposed to Cloudflare Tunnel for remote users | S3 API via HTTPRoute | Cloudflare CDN-cached, global |
## BlenderMCP Capabilities Used
| MCP Tool | Avatar Workflow Use |
|----------|-------------------|
| `get_scene_info` | Inspect current scene before modifications |
| `create_object` | Scaffold base meshes for characters |
| `modify_object` | Adjust proportions, positions, bone placement |
| `set_material` | Apply skin, hair, clothing materials |
| `execute_blender_code` | Run VRM export scripts, batch operations, custom rigging |
| `get_screenshot` | AI reviews viewport to understand current state |
| `poly_haven_download` | Fetch HDRIs, textures for environment/materials |
| `hyper3d_generate` | Generate base 3D models from text prompts via Hyper3D Rodin |
## Security Considerations
* **Code execution:** BlenderMCP's `execute_blender_code` runs arbitrary Python in Blender. The Kasm session is sandboxed (DinD container with no cluster access), limiting blast radius. Always save before executing AI-generated code.
* **Telemetry:** BlenderMCP collects anonymous telemetry by default. Disabled via `DISABLE_TELEMETRY=true` in the MCP server config.
* **Network:** The TCP socket (port 9876) between the MCP server and Blender add-on is local to the session. If accessed remotely, ensure the connection is tunnelled or restricted.
* **S3 credentials:** rclone volume plugin credentials are managed via Kasm storage mappings and the existing `kasm-agent` ExternalSecret — no new secrets required.
* **RustFS exposure:** The `avatar-models` RustFS bucket is exposed read-only through Cloudflare Tunnel. RustFS write credentials remain internal. The HTTPRoute only routes GET requests to the bucket path — no write operations are reachable externally.
* **Public assets:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. CORS restricts to `companions-chat.daviestechlabs.io` origin.
* **Model allowlist:** Even though models are served from NFS/RustFS, the server-side and client-side allowlists in companions-frontend gate which models users can actually select. Uploading a VRM to gravenhollow does not make it available without a code change.
## Pros and Cons of the Options
### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow (NFS + RustFS via Cloudflare)
* Good, because AI-assisted modelling reduces manual effort for avatar creation
* Good, because assets persist in S3 across sessions and are accessible from CI
* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, Cloudflare Tunnel are all already deployed
* Good, because VS Code MCP integration means one editor for code and 3D work
* Good, because Kasm sandboxes Blender execution away from the cluster
* Good, because NFS-fast serving decouples model assets from container images (no rebuild to add models)
* Good, because RustFS through Cloudflare Tunnel provides CDN caching with zero additional storage tiers — no R2 bucket, no sync CronJob, no extra credentials
* Good, because single source of truth — gravenhollow serves both LAN (NFS) and remote (RustFS → Cloudflare CDN) from the same data
* Good, because immutable versioned filenames enable aggressive caching and trivial rollback
* Good, because models are available to remote users immediately after promotion (no sync delay)
* Bad, because BlenderMCP is a third-party tool with arbitrary code execution
* Bad, because socket communication adds latency for remote Kasm sessions
* Bad, because VRM rigging quality may require manual adjustment after AI scaffolding
* Bad, because cache misses hit gravenhollow via the tunnel (negligible with immutable files + long TTLs)
### Option 2 — Local Blender + BlenderMCP on developer laptop
* Good, because lowest latency (everything local)
* Good, because no Kasm dependency
* Bad, because assets are local — no durable S3 storage without manual sync
* Bad, because Blender + add-ons must be installed on every dev machine
* Bad, because not reproducible across machines
### Option 3 — Hyper3D / Rodin cloud generation only
* Good, because no Blender installation needed
* Good, because fully prompt-driven model generation
* Bad, because limited control over output — no fine-tuning materials, rigging, or proportions
* Bad, because Hyper3D free tier has daily generation limits
* Bad, because generated models require post-processing for VRM compliance (humanoid rig, expressions, visemes)
* Bad, because vendor dependency for a core asset pipeline
### Option 4 — Manual Blender workflow (status quo)
* Good, because full manual control
* Good, because no new tooling
* Bad, because slow — no AI assistance for repetitive modelling tasks
* Bad, because no integration with the development workflow
* Bad, because assets stored ad-hoc with no structured pipeline to companions-frontend
## Links
* Related to [ADR-0046](0046-companions-frontend-architecture.md) (companions-frontend architecture — Three.js + VRM avatars)
* Related to [ADR-0026](0026-storage-strategy.md) (storage strategy — gravenhollow NFS-fast, Quobyte S3, rclone)
* Related to [ADR-0044](0044-dns-and-external-access.md) (DNS and external access — Cloudflare Tunnel, split-horizon)
* Related to [ADR-0049](0049-self-hosted-productivity-suite.md) (Kasm Workspaces)
* Related to [ADR-0059](0059-mac-mini-ray-worker.md) (waterdeep as local AI agent — primary 3D creation workstation with Metal GPU)
* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp)
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
* [VRM Specification](https://vrm.dev/en/)
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader used in companions-frontend)
* [Poly Haven](https://polyhaven.com/) (free 3D assets, HDRIs, textures)
* [Hyper3D Rodin](https://hyper3d.ai/) (AI 3D model generation)
* [Cloudflare Tunnel Docs](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/)
* [Cloudflare CDN Cache Rules](https://developers.cloudflare.com/cache/)


# ComfyUI Image-to-3D Avatar Pipeline with TRELLIS + UniRig
* Status: proposed
* Date: 2026-02-24
* Deciders: Billy
* Technical Story: Replace the manual BlenderMCP 3D avatar creation workflow with an automated, GPU-accelerated image-to-rigged-3D-model pipeline using ComfyUI, TRELLIS, and UniRig — running on a personal desktop (NVIDIA RTX 4070) as an on-demand Ray worker, with direct MLflow logging and rclone asset promotion
## Context and Problem Statement
The companions-frontend serves VRM avatar models for Three.js-based 3D character rendering ([ADR-0046](0046-companions-frontend-architecture.md)). The previous approach ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md)) proposed using BlenderMCP in a Kasm workstation or on waterdeep ([ADR-0059](0059-mac-mini-ray-worker.md)) for AI-assisted avatar creation. While BlenderMCP bridges VS Code to Blender, the workflow is fundamentally **interactive and manual** — an operator must prompt the AI, review each sculpting step, and hand-tune rigging and VRM export. This is slow, non-reproducible, and doesn't scale.
Meanwhile, the state of the art in image-to-3D generation has matured significantly:
- **TRELLIS** (Microsoft, CVPR'25 Spotlight, 12k+ GitHub stars) generates high-quality textured 3D meshes from a single image in seconds using Structured 3D Latents (SLAT) — with models up to 2B parameters
- **UniRig** (Tsinghua/Tripo, SIGGRAPH'25, 1.4k+ GitHub stars) automatically generates topologically valid skeletons and skinning weights for arbitrary 3D models using autoregressive transformers — the first model to rig humans, animals, and objects with a single unified framework
- **ComfyUI-3D-Pack** (3.7k+ GitHub stars) provides battle-tested ComfyUI nodes for TRELLIS, 3D Gaussian Splatting, mesh processing, and GLB/VRM export — enabling node-graph-based automation without custom code
Together, these tools enable a fully automated **image → 3D mesh → rigged model → VRM** pipeline that eliminates manual Blender work for the common case, produces reproducible results, and integrates with the existing MLflow + Ray stack.
A personal desktop (Ryzen 9 7950X, 64 GB DDR5, NVIDIA RTX 4070 12 GB VRAM) running Arch Linux is available as an **on-demand external Ray worker** — it won't be a permanent cluster member (it's not running Talos), but can join the Ray cluster via `ray start` when 3D generation workloads need to run. This adds a 5th GPU to the fleet specifically for 3D generation, without disrupting the stable inference allocations.
How do we build an automated, reproducible image-to-VRM pipeline that leverages the desktop's CUDA GPU and integrates with the existing AI/ML platform for experiment tracking and asset serving?
## Decision Drivers
* BlenderMCP workflow from ADR-0062 is interactive and non-reproducible — every avatar requires an operator in the loop
* TRELLIS generates production-quality textured meshes from a single reference image in ~30 seconds on a 12 GB GPU
* UniRig automatically rigs arbitrary 3D models with skeleton + skinning weights — no manual weight painting
* ComfyUI-3D-Pack bundles TRELLIS, mesh processing, and GLB export as composable nodes — enabling visual pipeline authoring
* The desktop's RTX 4070 (12 GB VRAM) can run TRELLIS's 1.2B image-to-3D model below the nominal 16 GB recommendation by using fp16 and attention optimizations, and comfortably exceeds UniRig's 8 GB requirement
* The desktop can join/leave the Ray cluster on demand — no permanent infrastructure commitment
* MLflow tracks generation parameters, quality metrics, and output artifacts for reproducibility — the desktop logs directly to the cluster's MLflow service over HTTP
* waterdeep (Mac Mini M4 Pro) remains available for interactive Blender touch-up on models that need manual refinement
* VRM export, asset promotion to gravenhollow, and serving architecture from ADR-0062 remain valid and are reused
## Considered Options
1. **ComfyUI + TRELLIS + UniRig on desktop Ray worker, with direct MLflow logging and rclone promotion**
2. **BlenderMCP interactive workflow** (ADR-0062, superseded)
3. **Cloud-hosted 3D generation (Hyper3D Rodin, Meshy, etc.)**
4. **Run TRELLIS + UniRig directly as Ray Serve deployments in-cluster**
## Decision Outcome
Chosen option: **Option 1 — ComfyUI + TRELLIS + UniRig on desktop Ray worker**, because it automates the entire image-to-rigged-model pipeline without operator interaction, leverages purpose-built state-of-the-art models (TRELLIS for generation, UniRig for rigging), and uses the desktop's RTX 4070 as on-demand GPU capacity without disrupting the stable inference cluster. ComfyUI's visual node graph provides the pipeline orchestration directly on the desktop — no Kubernetes-side orchestrator needed since all compute is local to one machine.
waterdeep retains its role as an interactive Blender workstation for manual refinement of auto-generated models when needed — but the expectation is that most avatars pass through the automated pipeline without manual touch-up.
### Positive Consequences
* **Fully automated pipeline** — image → textured mesh → rigged model → VRM with no operator in the loop
* **Reproducible** — same image + seed produces identical output; parameters tracked in MLflow
* **Fast** — TRELLIS generates a mesh in ~30s, UniRig rigs it in ~60s; end-to-end under 5 minutes including VRM export
* **On-demand GPU** — desktop joins Ray cluster only when needed; no standing resource cost
* **Composable** — ComfyUI node graph can be extended with additional 3D processing nodes (Hunyuan3D, TripoSG, Stable3DGen) without code changes
* **Quality** — TRELLIS (CVPR'25) and UniRig (SIGGRAPH'25) represent current state of the art
* **MLflow integration** — generation parameters, mesh quality metrics, and output artifacts are logged directly to the cluster's MLflow service over HTTP
* **Simple orchestration** — ComfyUI node graph handles the pipeline; no Kubernetes-side orchestrator needed for a single-GPU linear workflow
* **Reuses existing serving architecture** — gravenhollow NFS + RustFS CDN serving from ADR-0062 is unchanged
* **waterdeep fallback** — interactive Blender + BlenderMCP on waterdeep for models needing hand-tuning
### Negative Consequences
* Desktop must be powered on and `ray start` must be run manually to participate in the pipeline
* TRELLIS requires NVIDIA CUDA — cannot run on the existing AMD/Intel GPU fleet (khelben, drizzt, danilo)
* ComfyUI adds a Python dependency stack (PyTorch, CUDA, spconv, flash-attn) to maintain on the desktop
* RTX 4070 has 12 GB VRAM — large TRELLIS models (2B params) may require fp16 + attention optimization; the 1.2B image-to-3D model fits comfortably
* Auto-generated VRM models may still need manual expression/viseme morph targets for full companions-frontend lip-sync support
* Desktop is not managed by GitOps/Kubernetes — Ansible or manual setup
## Pros and Cons of the Options
### Option 1 — ComfyUI + TRELLIS + UniRig on desktop Ray worker
* Good, because fully automated image-to-VRM pipeline eliminates manual sculpting
* Good, because TRELLIS (CVPR'25) and UniRig (SIGGRAPH'25) are state-of-the-art, MIT-licensed
* Good, because ComfyUI-3D-Pack provides tested node implementations — no custom TRELLIS integration code
* Good, because desktop GPU is free/idle capacity with no cluster impact
* Good, because MLflow integration reuses existing experiment tracking infrastructure
* Good, because ComfyUI can queue and batch-generate multiple avatars unattended
* Bad, because desktop availability is not guaranteed (must be manually started)
* Bad, because CUDA-only — doesn't leverage the existing ROCm/Intel fleet
* Bad, because auto-rigging quality varies by model topology — some models may need manual refinement
### Option 2 — BlenderMCP interactive workflow (ADR-0062)
* Good, because maximum creative control via VS Code + Copilot
* Good, because Kasm provides browser-based access from anywhere
* Bad, because every avatar requires an operator in the loop — slow and non-reproducible
* Bad, because Blender sculpting from scratch is time-intensive even with AI assistance
* Bad, because Kasm runs Blender CPU-only (no GPU acceleration inside DinD)
* Bad, because no MLflow tracking or reproducibility
### Option 3 — Cloud-hosted 3D generation
* Good, because no local GPU required
* Good, because some services (Meshy, Hyper3D Rodin) offer API access
* Bad, because vendor dependency for a core asset pipeline
* Bad, because free tiers have daily limits; paid tiers add recurring cost
* Bad, because limited control over output quality, rigging, and VRM compliance
* Bad, because data leaves the homelab network
### Option 4 — TRELLIS + UniRig as in-cluster Ray Serve deployments
* Good, because fully integrated with existing Ray cluster
* Good, because no desktop dependency
* Bad, because TRELLIS requires NVIDIA CUDA — no CUDA GPUs in-cluster have enough VRAM (elminster has 8 GB, needs 12–16 GB)
* Bad, because would require purchasing new in-cluster NVIDIA hardware
* Bad, because 3D generation is batch/occasional, not real-time serving — Ray Serve's always-on model is wasteful
* Bad, because TRELLIS's CUDA dependencies (spconv, flash-attn, nvdiffrast, kaolin) conflict with existing Ray worker images
## Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│               Pipeline orchestration (ComfyUI on the desktop)               │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ 3d_avatar_generation_pipeline │ │
│ │ │ │
│ │ 1. prepare_reference Load/generate reference image from prompt │ │
│ │ │ (optional: use vLLM + Stable Diffusion) │ │
│ │ ▼ │ │
│ │ 2. generate_3d_mesh Submit RayJob → desktop ComfyUI worker │ │
│ │ │ TRELLIS image-large (1.2B) → GLB mesh │ │
│ │ ▼ │ │
│ │ 3. auto_rig Submit RayJob → desktop UniRig worker │ │
│ │ │ UniRig skeleton + skinning → rigged FBX/GLB │ │
│ │ ▼ │ │
│ │ 4. convert_to_vrm Blender CLI (headless) on desktop or cluster │ │
│ │ │ Import rigged GLB → configure VRM metadata │ │
│ │ ▼ → export .vrm │ │
│ │ 5. validate_vrm Check humanoid rig, expressions, visemes │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 6. promote_to_storage rclone copy → gravenhollow RustFS S3 │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 7. log_to_mlflow Parameters, metrics, artifacts → MLflow │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────┬──────────────────────────────────────┘
RayJob CR (ephemeral)
┌─────────────────────────────────────────────────────────────────────────────┐
│ desktop (Arch Linux · Ryzen 9 7950X · 64 GB DDR5 · RTX 4070 12 GB) │
│ On-demand Ray worker (ray start --address=<ray-head>:6379) │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ ComfyUI + Custom Nodes │ │
│ │ │ │
│ │ ComfyUI-3D-Pack: │ │
│ │ • TRELLIS image-large (1.2B) — image → textured GLB mesh │ │
│ │ • Mesh processing nodes — simplify, UV unwrap, texture bake │ │
│ │ • 3D preview — viewport render for quality check │ │
│ │ • GLB/OBJ/PLY export │ │
│ │ │ │
│ │ UniRig: │ │
│ │ • Skeleton prediction — autoregressive bone hierarchy │ │
│ │ • Skinning weights — bone-point cross-attention │ │
│ │ • Merge — skeleton + skin + original mesh → rigged model │ │
│ │ • Supports GLB, FBX, OBJ input/output │ │
│ │ │ │
│ │ Blender 4.x (headless CLI): │ │
│ │ • VRM Add-on for Blender — GLB → VRM conversion │ │
│ │ • Humanoid rig mapping, expression morphs, viseme config │ │
│ │ • Batch export via bpy scripting │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ GPU: NVIDIA RTX 4070 12 GB (CUDA 12.x) │
│ Ray: worker node with resource label {"nvidia_gpu": 1, "rtx4070": 1} │
│ Storage: ~/comfyui-3d/ (working dir), rclone → gravenhollow S3 │
└──────────────────────────────────┬──────────────────────────────────────────┘
rclone (S3)
┌─────────────────────────────────────────────────────────────────────────────┐
│ gravenhollow.lab.daviestechlabs.io │
│ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │
│ │
│ NFS: /mnt/gravenhollow/kubernetes/avatar-models/ │
│ ├── Seed-san.vrm (default model) │
│ ├── Generated-A-v1.vrm (auto-generated via pipeline) │
│ └── animations/ (shared animation clips) │
│ │
│ S3 (RustFS): avatar-models bucket │
│ (same data, served via Cloudflare Tunnel for remote users) │
└──────────────────────────┬──────────────────────────────────────────────────┘
┌────────────┴───────────────┐
│ │
NFS (nfs-fast PVC) Cloudflare Tunnel
│ (assets.daviestechlabs.io)
▼ │
┌──────────────────────────┐ ▼
│ companions-frontend │ ┌──────────────────────────┐
│ (Kubernetes pod) │ │ Remote users (CDN-cached │
│ LAN users │ │ via Cloudflare edge) │
└──────────────────────────┘ └──────────────────────────┘
```
### Ray Cluster Integration
The desktop joins the existing KubeRay-managed cluster as an external worker. It is **not** a Talos node and not managed by Kubernetes — it connects to the Ray head node's GCS port directly:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Ray Cluster (KubeRay RayService) │
│ │
│ Head: Ray head pod (in-cluster) │
│ GCS port: 6379 (exposed via NodePort or LoadBalancer) │
│ │
│ In-Cluster Workers (permanent, managed by KubeRay): │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ khelben │ │elminster │ │ drizzt │ │ danilo │ │
│ │Strix Halo│ │RTX 2070 │ │Radeon 680│ │Intel Arc │ │
│ │ ROCm │ │ CUDA │ │ ROCm │ │ Intel │ │
│ │ /llm │ │/whisper │ │/embeddings│ │/reranker │ │
│ │ │ │ /tts │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ External Worker (on-demand, self-managed): │
│ ┌──────────────────────────────────────────────────┐ │
│ │ desktop (Arch Linux, external) │ │
│ │ RTX 4070 12 GB · CUDA │ │
│ │ ComfyUI + TRELLIS + UniRig + Blender CLI │ │
│ │ Resource labels: {"nvidia_gpu": 1, "3d_gen": 1} │ │
│ │ Joins via: ray start --address=<head>:6379 │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
The existing inference deployments (`/llm`, `/whisper`, `/tts`, `/embeddings`, `/reranker`) are unaffected — they are pinned to their respective in-cluster GPU nodes via Ray resource labels. The desktop's `3d_gen` resource label ensures only 3D generation RayJobs get scheduled there.
### Ray Service Multiplexing
The desktop's RTX 4070 can **time-share between inference overflow and 3D generation** when idle. When no 3D generation jobs are queued, the desktop can optionally serve as overflow capacity for inference workloads:
| Mode | When | What runs on desktop |
|------|------|---------------------|
| **3D generation** | ComfyUI workflow triggered (manually or via API) | ComfyUI + TRELLIS → UniRig → Blender VRM export |
| **Inference overflow** | Manually enabled, high-traffic periods | vLLM (secondary), Whisper, or TTS replica |
| **Idle** | Desktop powered on, no jobs | Ray worker connected but idle (0 resource cost) |
Mode switching is managed by Ray's resource scheduling — 3D jobs request `{"3d_gen": 1}` and inference jobs request their specific GPU labels. When the desktop is off, all workloads continue on the existing in-cluster fleet with no impact.
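As a sketch of how that pinning looks from the submitting side (the task body and file names are placeholders; custom resources like `{"3d_gen": 1}` are standard Ray scheduling constraints):
```python
import ray

ray.init(address="auto")  # attach to the running cluster

@ray.remote(num_gpus=1, resources={"3d_gen": 1})
def generate_avatar(reference_image: str) -> str:
    # Placeholder body: on the desktop this would drive the local ComfyUI
    # workflow (TRELLIS -> UniRig -> Blender VRM export) and return the path.
    return f"exports/{reference_image.rsplit('.', 1)[0]}.vrm"

# Schedules only on a node advertising the 3d_gen resource (the desktop);
# if the desktop is offline, the task waits instead of landing on an
# inference GPU.
vrm_path = ray.get(generate_avatar.remote("Companion-A.png"))
```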
## Implementation Plan
### 1. Desktop Environment Setup
```bash
# Install NVIDIA drivers + CUDA toolkit (Arch Linux)
sudo pacman -S nvidia nvidia-utils cuda cudnn
# Install Python environment (uv per ADR-0012)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create project directory
mkdir -p ~/comfyui-3d && cd ~/comfyui-3d
# Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
uv venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements.txt
# Install ComfyUI-3D-Pack (includes TRELLIS nodes)
cd custom_nodes
git clone https://github.com/MrForExample/ComfyUI-3D-Pack.git
cd ComfyUI-3D-Pack
uv pip install -r requirements.txt
python install.py
# Install UniRig
cd ~/comfyui-3d
git clone https://github.com/VAST-AI-Research/UniRig.git
cd UniRig
uv pip install torch torchvision
uv pip install -r requirements.txt
uv pip install spconv-cu124 # Match CUDA version
uv pip install flash-attn --no-build-isolation
# Install Blender (headless CLI for VRM export)
sudo pacman -S blender
# Install VRM Add-on (addon_install must run inside Blender's Python, not the system interpreter)
blender --background --python-expr "import bpy, os; bpy.ops.preferences.addon_install(filepath=os.path.abspath('UniRig/blender/add-on-vrm-v2.20.77_modified.zip'))"
# Install rclone for asset promotion
sudo pacman -S rclone
rclone config create gravenhollow s3 \
provider=Other \
endpoint=https://gravenhollow.lab.daviestechlabs.io:30292 \
access_key_id=<key> \
secret_access_key=<secret>
# Install Ray for cluster joining
uv pip install "ray[default]"
```
### 2. Ray Worker Configuration
```bash
# Join the Ray cluster on demand
# Ray head GCS port must be exposed (NodePort 30637 or similar)
ray start \
--address=<ray-head-external-ip>:6379 \
--num-cpus=16 \
--num-gpus=1 \
--resources='{"3d_gen": 1, "rtx4070": 1}' \
--node-name=desktop
# Verify connection
ray status # Should show desktop as a connected worker
```
The Ray head's GCS port needs to be reachable from the desktop. Options:
- **NodePort**: Expose port 6379 as a NodePort (e.g., 30637) on a cluster node
- **Tailscale/WireGuard**: If the desktop is on a different network segment
- **Direct LAN**: If desktop and cluster are on the same 192.168.100.0/24 subnet
### 3. ComfyUI Workflow (Node Graph)
The ComfyUI workflow JSON defines the image-to-GLB pipeline:
```
[Load Image] → [TRELLIS Image-to-3D] → [Mesh Simplify] → [Texture Bake]
      ↓
[Save GLB]
      ↓
[UniRig Skeleton Prediction]
      ↓
[UniRig Skinning Weights]
      ↓
[UniRig Merge (rigged model)]
      ↓
[Blender VRM Export (CLI)]
      ↓
[Save VRM → ~/comfyui-3d/exports/]
```
Key TRELLIS parameters exposed:
- `sparse_structure_sampler_params.steps`: 12 (default)
- `sparse_structure_sampler_params.cfg_strength`: 7.5
- `slat_sampler_params.steps`: 12
- `slat_sampler_params.cfg_strength`: 3.0
- `simplify`: 0.95 (triangle reduction ratio)
- `texture_size`: 1024
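A minimal headless-submission sketch against ComfyUI's HTTP API, assuming the graph has been exported in API format to a hypothetical `avatar_workflow_api.json` with the parameters above filled in:
```python
import json
import requests

# Load the API-format workflow graph exported from the ComfyUI editor.
with open("avatar_workflow_api.json") as f:
    workflow = json.load(f)

# POST to the local ComfyUI queue; the response includes a prompt_id.
resp = requests.post("http://127.0.0.1:8188/prompt", json={"prompt": workflow})
print(resp.json())
```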
### 4. MLflow Experiment Tracking
The desktop logs directly to the cluster's MLflow service over HTTP. Set `MLFLOW_TRACKING_URI` in the ComfyUI environment or in a post-generation logging script:
```bash
export MLFLOW_TRACKING_URI=http://<mlflow-service>:5000
```
Each generation run logs to a dedicated MLflow experiment:
| What | MLflow Concept | Content |
|------|---------------|---------|
| Reference image | Artifact | `reference.png` |
| TRELLIS parameters | Params | seed, cfg_strength, steps, simplify, texture_size |
| UniRig parameters | Params | skeleton_seed |
| Raw mesh | Artifact | `{name}_raw.glb` (pre-rigging) |
| Rigged model | Artifact | `{name}_rigged.glb` (post-rigging) |
| Final VRM | Artifact | `{name}.vrm` |
| Mesh quality | Metrics | vertex_count, face_count, texture_resolution |
| Rig quality | Metrics | bone_count, skinning_weight_coverage |
| Pipeline duration | Metrics | trellis_time_s, unirig_time_s, total_time_s |
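A condensed logging sketch (the experiment name and all values are illustrative; the tracking URI matches the environment variable above):
```python
import mlflow

mlflow.set_tracking_uri("http://<mlflow-service>:5000")
mlflow.set_experiment("3d-avatar-generation")  # hypothetical experiment name

with mlflow.start_run(run_name="Generated-A-v1"):
    mlflow.log_params({"seed": 42, "cfg_strength": 7.5, "steps": 12,
                       "simplify": 0.95, "texture_size": 1024})
    mlflow.log_metrics({"vertex_count": 48213, "face_count": 92410,
                        "trellis_time_s": 31.2, "unirig_time_s": 58.7})
    mlflow.log_artifact("reference.png")
    mlflow.log_artifact("exports/Generated-A-v1.vrm")
```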
### 5. VRM Export Script (Blender CLI)
```python
#!/usr/bin/env python3
"""vrm_export.py — Headless Blender script for GLB→VRM conversion."""
import bpy
import sys
argv = sys.argv[sys.argv.index("--") + 1:]
input_glb = argv[0]
output_vrm = argv[1]
avatar_name = argv[2] if len(argv) > 2 else "Generated Avatar"
# Clear scene
bpy.ops.wm.read_factory_settings(use_empty=True)
# Import rigged GLB
bpy.ops.import_scene.gltf(filepath=input_glb)
# Select armature
armature = next(obj for obj in bpy.data.objects if obj.type == 'ARMATURE')
bpy.context.view_layer.objects.active = armature
# Configure VRM metadata. Note: recent versions of the VRM add-on expose
# this as a typed property group (armature.data.vrm_addon_extension) rather
# than an ID custom property; adjust to match the installed add-on version.
armature["vrm_addon_extension"] = {
    "spec_version": "0.0",  # VRM 0.x metadata lives under "vrm0"
    "vrm0": {
        "meta": {
            "title": avatar_name,
            "author": "DaviesTechLabs Pipeline",
            "allowedUserName": "Everyone",
        }
    }
}
# Export VRM
bpy.ops.export_scene.vrm(filepath=output_vrm)
print(f"Exported VRM: {output_vrm}")
```
Invoked via:
```bash
blender --background --python vrm_export.py -- input.glb output.vrm "Avatar Name"
```
### 6. Asset Promotion (Reuses ADR-0062 Architecture)
The VRM serving architecture from ADR-0062 is preserved unchanged:
| Stage | Action |
|-------|--------|
| **Generate** | Automated pipeline: image → TRELLIS → UniRig → VRM |
| **Promote** | `rclone copy ~/comfyui-3d/exports/{name}.vrm gravenhollow:avatar-models/` |
| **Register** | Add model path to `AllowedAvatarModels` in companions-frontend Go + JS allowlists |
| **Deploy** | Flux rolls out config; model already on NFS PVC — no image rebuild |
| **CDN** | Cloudflare Tunnel → RustFS → CDN cache at 300+ edge PoPs |
## Model Requirements and VRAM Budget
| Component | Model Size | VRAM Required | Notes |
|-----------|-----------|---------------|-------|
| TRELLIS image-large | 1.2B params | ~10 GB (fp16) | Image-to-3D, best quality |
| TRELLIS text-xlarge | 2.0B params | ~14 GB (fp16) | Text-to-3D, optional |
| UniRig skeleton | ~350M params | ~4 GB | Autoregressive skeleton prediction |
| UniRig skinning | ~350M params | ~4 GB | Bone-point cross-attention |
| Blender CLI | N/A | CPU only | Headless VRM export |
**RTX 4070 budget (12 GB):** Models are loaded sequentially (not concurrently) — TRELLIS runs first, output is saved to disk, then UniRig loads for rigging. Peak VRAM usage is ~10 GB during TRELLIS inference. The desktop's 64 GB system RAM provides ample buffer for model loading and mesh processing.
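A sketch of that sequential-load pattern in generic PyTorch (the load/infer callables are placeholders for the TRELLIS and UniRig stages):
```python
import gc
import torch

def run_stage(load_model, infer, payload):
    """Load one stage's model, run it, then release VRAM for the next stage."""
    model = load_model()
    try:
        return infer(model, payload)
    finally:
        del model
        gc.collect()
        torch.cuda.empty_cache()  # hand freed blocks back to the CUDA driver

# mesh = run_stage(load_trellis, trellis_infer, reference_image)
# rigged = run_stage(load_unirig, unirig_infer, mesh)
```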
## Security Considerations
* **Ray GCS port exposure**: The Ray head's port 6379 must be reachable from the desktop. Use a NodePort with network policy restricting source IPs to the desktop's address, or use a WireGuard/Tailscale tunnel.
* **No cluster credentials on desktop**: The desktop runs Ray worker processes and ComfyUI only — it has no `kubeconfig` or Kubernetes API access. Generation is triggered locally via ComfyUI's UI or API, not from the cluster.
* **Model provenance**: TRELLIS and UniRig checkpoints are downloaded from Hugging Face (Microsoft and VAST-AI orgs respectively). Pin checkpoint hashes in the setup script (see the sketch after this list).
* **ComfyUI network**: ComfyUI's web UI (port 8188) should stay bound to localhost except while actively in use. It is not exposed to the cluster.
* **rclone credentials**: gravenhollow RustFS write credentials stored in `~/.config/rclone/rclone.conf` with `600` permissions.
* **Generated content**: Auto-generated 3D models inherit no licensing restrictions (TRELLIS and UniRig are both MIT-licensed).
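For the model-provenance point above, a pinning sketch with `huggingface_hub` (assuming the checkpoint id `microsoft/TRELLIS-image-large`; the revision hash is a placeholder to replace with an audited commit):
```python
from huggingface_hub import snapshot_download

# Pin to an audited commit instead of the moving "main" branch.
TRELLIS_REV = "<audited-commit-sha>"  # placeholder
snapshot_download(repo_id="microsoft/TRELLIS-image-large",
                  revision=TRELLIS_REV)
```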
## Future Considerations
* **Kubeflow pipeline for model refinement**: When iterating on existing models (re-rigging, parameter sweeps, A/B testing generation backends), a Kubeflow pipeline can orchestrate multi-step refinement workflows with artifact lineage, caching, and retries — submitting RayJobs to the desktop worker via the existing KFP + RayJob pattern from [ADR-0058](0058-training-strategy-cpu-dgx-spark.md)
* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, could run TRELLIS + UniRig in-cluster with dedicated GPU, eliminating desktop dependency
* **Stable3DGen / Hunyuan3D alternatives**: ComfyUI-3D-Pack supports multiple generation backends — can A/B test quality via MLflow metrics
* **VRM expression morphs**: Investigate automated viseme and expression blendshape generation for full lip-sync support without manual Blender work
* **ComfyUI API mode**: ComfyUI supports headless API-only execution (`--listen 0.0.0.0 --port 8188`) — a script or future Kubeflow pipeline can submit workflows via HTTP POST to `/prompt`
* **Text-to-3D**: Use the cluster's vLLM instance to generate a character description, then Stable Diffusion (on desktop) to create a reference image, feeding into TRELLIS — fully text-to-avatar pipeline
* **Batch generation**: Schedule overnight batch runs via CronWorkflow to generate avatar libraries from curated reference images
* **In-cluster migration**: If a 16+ GB NVIDIA GPU is added to the cluster (e.g., via DGX Spark or RTX 5070), migrate TRELLIS + UniRig to a dedicated Ray Serve deployment for always-available generation
## Links
* Supersedes: [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) — BlenderMCP for 3D avatar creation (interactive workflow)
* Updates: [ADR-0059](0059-mac-mini-ray-worker.md) — waterdeep retains Blender role for manual refinement only
* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend architecture (Three.js + VRM avatars)
* Related: [ADR-0011](0011-kuberay-unified-gpu-backend.md) — KubeRay unified GPU backend
* Related: [ADR-0005](0005-multi-gpu-strategy.md) — Multi-GPU heterogeneous strategy
* Related: [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) — Training strategy (Kubeflow + RayJob pattern for future pipeline work)
* Related: [ADR-0047](0047-mlflow-experiment-tracking.md) — MLflow experiment tracking
* Related: [ADR-0026](0026-storage-strategy.md) — Storage strategy (gravenhollow NFS-fast, RustFS S3)
* [Microsoft TRELLIS](https://github.com/microsoft/TRELLIS) — Structured 3D Latents for Scalable 3D Generation (CVPR'25 Spotlight)
* [VAST-AI UniRig](https://github.com/VAST-AI-Research/UniRig) — One Model to Rig Them All (SIGGRAPH'25)
* [ComfyUI-3D-Pack](https://github.com/MrForExample/ComfyUI-3D-Pack) — Extensive 3D node suite for ComfyUI
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader in companions-frontend)