Compare commits: `fbc2ef2b3f...main` (16 commits)

Commit SHAs: 6097be571c, 6e574ffc4b, b64102d853, 4affddf9b4, e28382a765, 100ba21eba, f19fa3e969, 50b14b2a75, 32e370401f, 654b7ae774, 9fe12e0cff, defbd5b2f9, 555b70b9d9, f650b4bd22, cbd892c7c9, e57d998d9a
@@ -22,13 +22,13 @@ You are working on a **homelab Kubernetes cluster** running:

| Repo | Purpose |
|------|---------|
-| `handler-base` | Shared Python library for NATS handlers |
-| `chat-handler` | Text chat with RAG pipeline |
-| `voice-assistant` | Voice pipeline (STT → RAG → LLM → TTS) |
+| `handler-base` | Shared Go module for NATS handlers (protobuf, health, OTel, clients) |
+| `chat-handler` | Text chat with RAG pipeline (Go) |
+| `voice-assistant` | Voice pipeline: STT → RAG → LLM → TTS (Go) |
| `kuberay-images` | GPU-specific Ray worker Docker images |
-| `pipeline-bridge` | Bridge between pipelines and services |
-| `stt-module` | Speech-to-text service |
-| `tts-module` | Text-to-speech service |
+| `pipeline-bridge` | Bridge between pipelines and services (Go) |
+| `stt-module` | Speech-to-text service (Go) |
+| `tts-module` | Text-to-speech service (Go) |
| `ray-serve` | Ray Serve inference services |
| `argo` | Argo Workflows (training, batch inference) |
| `kubeflow` | Kubeflow Pipeline definitions |
@@ -48,7 +48,7 @@ You are working on a **homelab Kubernetes cluster** running:

┌─────────────────────────────────────────────────────────────────┐
│ NATS MESSAGE BUS │
│ Subjects: ai.chat.*, ai.voice.*, ai.pipeline.* │
-│ Format: MessagePack (binary) │
+│ Format: Protocol Buffers (binary, see ADR-0061) │
└───────────────────────────┬─────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
@@ -93,19 +93,23 @@ talos/

### AI/ML Services (Gitea daviestechlabs org)

```
-handler-base/            # Shared handler library
-├── handler_base/        # Core classes
-│   ├── handler.py       # Base Handler class
-│   ├── nats_client.py   # NATS wrapper
-│   └── clients/         # Service clients (STT, TTS, LLM, etc.)
+handler-base/            # Shared Go module (NATS, health, OTel, protobuf)
+├── clients/             # HTTP clients (LLM, STT, TTS, embeddings, reranker)
+├── config/              # Env-based configuration (struct tags)
+├── gen/messagespb/      # Generated protobuf stubs
+├── handler/             # Typed NATS message handler
+├── health/              # HTTP health + readiness server
+└── natsutil/            # NATS publish/request with protobuf

-chat-handler/            # RAG chat service
-├── chat_handler_v2.py   # Handler-base version
-└── Dockerfile.v2
+chat-handler/            # RAG chat service (Go)
+├── main.go
+├── main_test.go
+└── Dockerfile

-voice-assistant/         # Voice pipeline service
-├── voice_assistant_v2.py    # Handler-base version
-└── pipelines/voice_pipeline.py
+voice-assistant/         # Voice pipeline service (Go)
+├── main.go
+├── main_test.go
+└── Dockerfile

argo/                    # Argo WorkflowTemplates
├── batch-inference.yaml
```
@@ -127,8 +131,23 @@ kuberay-images/          # GPU worker images

## 🔌 Service Endpoints (Internal)

+```go
+// Copy-paste ready for Go handler services
+const (
+	NATSUrl       = "nats://nats.ai-ml.svc.cluster.local:4222"
+	VLLMUrl       = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
+	WhisperUrl    = "http://whisper-predictor.ai-ml.svc.cluster.local"
+	TTSUrl        = "http://tts-predictor.ai-ml.svc.cluster.local"
+	EmbeddingsUrl = "http://embeddings-predictor.ai-ml.svc.cluster.local"
+	RerankerUrl   = "http://reranker-predictor.ai-ml.svc.cluster.local"
+	MilvusHost    = "milvus.ai-ml.svc.cluster.local"
+	MilvusPort    = 19530
+	ValkeyUrl     = "redis://valkey.ai-ml.svc.cluster.local:6379"
+)
+```

```python
-# Copy-paste ready for Python code
+# For Python services (Ray Serve, Kubeflow pipelines, Gradio UIs)
NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
```
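Both constant blocks above are plain literals; in practice the same defaults can be made overridable from the environment so code runs identically in and out of the cluster. A minimal Python sketch (the `*_URL` environment-variable names and the `service_url` helper are illustrative, not taken from the repos):

```python
import os

# Hypothetical helper: prefer an environment override, fall back to the
# in-cluster default. The env-var naming scheme is an assumption.
def service_url(name: str, default: str) -> str:
    return os.environ.get(f"{name}_URL", default)

vllm_url = service_url("VLLM", "http://llm-draft.ai-ml.svc.cluster.local:8000/v1")
```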
@@ -175,7 +194,7 @@ f"ai.pipeline.status.{request_id}"  # Status updates

### Add a New NATS Handler

-1. Create handler repo or add to existing (use `handler-base` library)
+1. Create Go handler repo using `handler-base` module (see [ADR-0061](decisions/0061-go-handler-refactor.md))
2. Add K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
3. Push to main → Flux deploys automatically
@@ -44,7 +44,7 @@ The homelab is a production-grade Kubernetes cluster running on bare-metal hardware

│ │ • AI_PIPELINE (24h, file) - Workflow triggers │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
-│ Message Format: MessagePack (binary, not JSON) │
+│ Message Format: Protocol Buffers (binary, see ADR-0061) │
└─────────────────────────────────────────────────────────────────────────────┘
│
┌─────────────────────────┼─────────────────────────┐
@@ -312,11 +312,12 @@ Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafana

|----------|-----------|-----|
| Talos Linux | Immutable, API-driven, secure | [ADR-0002](decisions/0002-use-talos-linux.md) |
| NATS over Kafka | Simpler ops, sufficient throughput | [ADR-0003](decisions/0003-use-nats-for-messaging.md) |
-| MessagePack over JSON | Binary efficiency for audio | [ADR-0004](decisions/0004-use-messagepack-for-nats.md) |
+| Protocol Buffers over MessagePack | Type-safe, schema-driven, Go-native | [ADR-0061](decisions/0061-go-handler-refactor.md) |
| Multi-GPU heterogeneous | Cost optimization, workload matching | [ADR-0005](decisions/0005-multi-gpu-strategy.md) |
| GitOps with Flux | Declarative, auditable, secure | [ADR-0006](decisions/0006-gitops-with-flux.md) |
| KServe for inference | Standardized API, autoscaling | [ADR-0007](decisions/0007-use-kserve-for-inference.md) |
| KubeRay unified backend | Fractional GPU, single endpoint | [ADR-0011](decisions/0011-kuberay-unified-gpu-backend.md) |
+| Go handler refactor | Slim images, type-safe protobuf for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) |
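The "binary efficiency for audio" rationale behind both binary-format decisions is easy to sanity-check: JSON cannot carry raw bytes, so audio must be base64-encoded, which inflates every payload by roughly a third before any envelope overhead. A short illustration (the field name and chunk size are made up):

```python
import base64
import json
import os

audio = os.urandom(30_000)  # stand-in for a ~30 kB audio chunk
# A JSON envelope must base64-encode the bytes (4 output chars per 3 input bytes)
envelope = json.dumps({"audio": base64.b64encode(audio).decode()}).encode()
ratio = len(envelope) / len(audio)
print(f"JSON envelope is {ratio:.2f}x the raw payload size")
```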
## Related Documents
@@ -28,27 +28,29 @@ kubernetes/

### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)

```
-handler-base/            # Shared library for all handlers
-├── handler_base/
-│   ├── handler.py       # Base Handler class
-│   ├── nats_client.py   # NATS wrapper
-│   ├── config.py        # Pydantic Settings
-│   ├── health.py        # K8s probes
-│   ├── telemetry.py     # OpenTelemetry
-│   └── clients/         # Service clients
-├── tests/
-└── pyproject.toml
+handler-base/            # Shared Go module for all NATS handlers
+├── clients/             # HTTP clients (LLM, STT, TTS, embeddings, reranker)
+├── config/              # Env-based configuration (struct tags)
+├── gen/messagespb/      # Generated protobuf stubs
+├── handler/             # Typed NATS message handler with OTel + health wiring
+├── health/              # HTTP health + readiness server
+├── messages/            # Type aliases from generated protobuf stubs
+├── natsutil/            # NATS publish/request with protobuf encoding
+├── proto/messages/v1/   # .proto schema source
+├── go.mod
+└── buf.yaml             # buf protobuf toolchain config

-chat-handler/            # Text chat service
-voice-assistant/         # Voice pipeline service
-pipeline-bridge/         # Workflow engine bridge
-├── {name}.py            # Handler implementation (uses handler-base)
-├── pyproject.toml       # PEP 621 project metadata (see ADR-0012)
-├── uv.lock              # Deterministic lock file
-├── tests/
-│   ├── conftest.py
-│   └── test_{name}.py
-└── Dockerfile
+chat-handler/            # Text chat service (Go)
+voice-assistant/         # Voice pipeline service (Go)
+pipeline-bridge/         # Workflow engine bridge (Go)
+stt-module/              # Speech-to-text bridge (Go)
+tts-module/              # Text-to-speech bridge (Go)
+├── main.go              # Service entry point
+├── main_test.go         # Unit tests
+├── e2e_test.go          # End-to-end tests
+├── go.mod               # Go module (depends on handler-base)
+├── Dockerfile           # Distroless container (~20 MB)
+└── renovate.json        # Dependency update config

argo/                    # Argo WorkflowTemplates
├── {workflow-name}.yaml
```
@@ -138,7 +140,20 @@ tts_task = synthesize_speech(text=llm_task.output)  # noqa: F841

### Project Structure

+```go
+// Go handler services use handler-base shared module
+import (
+	"git.daviestechlabs.io/daviestechlabs/handler-base/clients"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/config"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/handler"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/health"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/messages"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/natsutil"
+)
+```

```python
+# Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs
# Use async/await for I/O
async def handle_message(msg: Msg) -> None:
    ...

@@ -149,10 +164,6 @@ class ChatRequest:

    user_id: str
    message: str
    enable_rag: bool = True

-# Use msgpack for NATS messages
-import msgpack
-data = msgpack.packb({"key": "value"})
```
### Naming
@@ -200,31 +211,36 @@ except Exception as e:

### NATS Message Handling

-```python
-import nats
-import msgpack
-
-async def message_handler(msg: Msg) -> None:
-    try:
-        # Decode MessagePack
-        data = msgpack.unpackb(msg.data, raw=False)
-
-        # Process
-        result = await process(data)
-
-        # Reply if request-reply pattern
-        if msg.reply:
-            await msg.respond(msgpack.packb(result))
-
-        # Acknowledge for JetStream
-        await msg.ack()
-
-    except Exception as e:
-        logger.error(f"Handler error: {e}")
-        # NAK for retry (JetStream)
-        await msg.nak()
-```
+All NATS handler services use Go with Protocol Buffers encoding (see [ADR-0061](decisions/0061-go-handler-refactor.md)):
+
+```go
+// Go NATS handler (production pattern)
+func (h *Handler) handleMessage(msg *nats.Msg) {
+	var req messages.ChatRequest
+	if err := proto.Unmarshal(msg.Data, &req); err != nil {
+		h.logger.Error("failed to unmarshal", "error", err)
+		return
+	}
+
+	// Process
+	result, err := h.process(ctx, &req)
+	if err != nil {
+		h.logger.Error("handler error", "error", err)
+		msg.Nak()
+		return
+	}
+
+	// Reply if request-reply pattern
+	if msg.Reply != "" {
+		data, _ := proto.Marshal(result)
+		msg.Respond(data)
+	}
+	msg.Ack()
+}
+```
+
+> **Python NATS** is still used in Ray Serve `runtime_env` and Kubeflow pipeline components where needed, but all dedicated NATS handler services are Go.
---
## Kubernetes Manifest Conventions
@@ -499,8 +515,9 @@ Each application should have a README with:

| Use `latest` image tags | Pin to specific versions |
| Skip health checks | Always define liveness/readiness |
| Ignore resource limits | Set appropriate requests/limits |
-| Use JSON for NATS messages | Use MessagePack (binary) |
-| Synchronous I/O in handlers | Use async/await |
+| Use JSON for NATS messages | Use Protocol Buffers (see ADR-0061) |
+| Write handler services in Python | Use Go with handler-base module (ADR-0061) |
+| Synchronous I/O in handlers | Use goroutines / async patterns |
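The "goroutines / async patterns" guidance can be sketched with stdlib `asyncio`; the handler and service names below are hypothetical, but the shape, fanning out slow I/O calls concurrently instead of awaiting them one at a time, is the point:

```python
import asyncio

async def call_service(name: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an HTTP or NATS round trip
    return f"{name}:ok"

async def handle() -> list[str]:
    # Concurrent fan-out instead of sequential awaits
    return list(await asyncio.gather(call_service("stt"), call_service("llm")))

results = asyncio.run(handle())
print(results)
```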
---
README.md
@@ -8,7 +8,7 @@

[](LICENSE)

<!-- ADR-BADGES-START -->
<!-- ADR-BADGES-END -->

## 📖 Quick Navigation
@@ -94,7 +94,7 @@ homelab-design/

| 0001 | [Record Architecture Decisions](decisions/0001-record-architecture-decisions.md) | ✅ accepted | 2025-11-30 |
| 0002 | [Use Talos Linux for Kubernetes Nodes](decisions/0002-use-talos-linux.md) | ✅ accepted | 2025-11-30 |
| 0003 | [Use NATS for AI/ML Messaging](decisions/0003-use-nats-for-messaging.md) | ✅ accepted | 2025-12-01 |
-| 0004 | [Use MessagePack for NATS Messages](decisions/0004-use-messagepack-for-nats.md) | ✅ accepted | 2025-12-01 |
+| 0004 | [Use MessagePack for NATS Messages](decisions/0004-use-messagepack-for-nats.md) | ♻️ superseded by [ADR-0061](0061-go-handler-refactor.md) (Protocol Buffers) | 2025-12-01 |
| 0005 | [Multi-GPU Heterogeneous Strategy](decisions/0005-multi-gpu-strategy.md) | ✅ accepted | 2025-12-01 |
| 0006 | [GitOps with Flux CD](decisions/0006-gitops-with-flux.md) | ✅ accepted | 2025-11-30 |
| 0007 | [Use KServe for ML Model Serving](decisions/0007-use-kserve-for-inference.md) | ♻️ superseded by [ADR-0011](0011-kuberay-unified-gpu-backend.md) | 2025-12-15 (Updated: 2026-02-02) |

@@ -109,7 +109,7 @@ homelab-design/

| 0016 | [Affine Email Verification Strategy for Authentik OIDC](decisions/0016-affine-email-verification-strategy.md) | ✅ accepted | 2026-02-04 |
| 0017 | [Secrets Management Strategy](decisions/0017-secrets-management-strategy.md) | ✅ accepted | 2026-02-04 |
| 0018 | [Security Policy Enforcement](decisions/0018-security-policy-enforcement.md) | ✅ accepted | 2026-02-04 |
-| 0019 | [Python Module Deployment Strategy](decisions/0019-handler-deployment-strategy.md) | ✅ accepted | 2026-02-02 |
+| 0019 | [Python Module Deployment Strategy](decisions/0019-handler-deployment-strategy.md) | ♻️ superseded by [ADR-0061](0061-go-handler-refactor.md) | 2026-02-02 |
| 0020 | [Internal Registry URLs for CI/CD](decisions/0020-internal-registry-for-cicd.md) | ✅ accepted | 2026-02-02 |
| 0021 | [Notification Architecture](decisions/0021-notification-architecture.md) | ✅ accepted | 2026-02-04 |
| 0022 | [ntfy-Discord Bridge Service](decisions/0022-ntfy-discord-bridge.md) | ✅ accepted | 2026-02-04 |
@@ -149,8 +149,12 @@ homelab-design/

| 0056 | [Custom Trained Voice Support in TTS Module](decisions/0056-custom-voice-support-tts-module.md) | ✅ accepted | 2026-02-13 |
| 0057 | [Per-Repository Renovate Configurations](decisions/0057-renovate-per-repo-configs.md) | ✅ accepted | 2026-02-13 |
| 0058 | [Training Strategy – Distributed CPU Now, DGX Spark Later](decisions/0058-training-strategy-cpu-dgx-spark.md) | ✅ accepted | 2026-02-14 |
-| 0059 | [Add Mac Mini M4 Pro (waterdeep) to Ray Cluster as External Worker](decisions/0059-mac-mini-ray-worker.md) | 📝 proposed | 2026-02-16 |
+| 0059 | [Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation](decisions/0059-mac-mini-ray-worker.md) | ✅ accepted | 2026-02-16 |
| 0060 | [Internal PKI with Vault and cert-manager](decisions/0060-internal-pki-vault.md) | ✅ accepted | 2026-02-16 |
+| 0061 | [Refactor NATS Handler Services from Python to Go](decisions/0061-go-handler-refactor.md) | ✅ accepted | 2026-02-19 |
+| 0062 | [BlenderMCP for 3D Avatar Creation via Kasm Workstation](decisions/0062-blender-mcp-3d-avatar-workflow.md) | ♻️ superseded by [ADR-0063](0063-comfyui-3d-avatar-pipeline.md) | 2026-02-21 |
+| 0063 | [ComfyUI Image-to-3D Avatar Pipeline with TRELLIS + UniRig](decisions/0063-comfyui-3d-avatar-pipeline.md) | 📝 proposed | 2026-02-24 |
+| 0064 | [waterdeep (Mac Mini M4 Pro) as Dedicated Coding Agent with Fine-Tuned Model](decisions/0064-waterdeep-coding-agent.md) | 📝 proposed | 2026-02-26 |
<!-- ADR-TABLE-END -->

## 🔗 Related Repositories
@@ -188,4 +192,4 @@ The former monolithic `llm-workflows` repo has been archived and decomposed into

---

-*Last updated: 2026-02-17*
+*Last updated: 2026-02-26*
@@ -117,9 +117,14 @@ All AI inference runs on a unified Ray Serve endpoint with fractional GPU allocation

| Application | Language | Framework | Purpose |
|-------------|----------|-----------|---------|
-| Companions | Go | net/http + HTMX | AI chat interface |
-| Voice WebApp | Python | Gradio | Voice assistant UI |
-| Various handlers | Python | asyncio + nats.py | NATS event handlers |
+| Companions | Go | net/http + HTMX | AI chat interface (SSR) |
+| Chat Handler | Go | handler-base | RAG + LLM text pipeline |
+| Voice Assistant | Go | handler-base | STT → RAG → LLM → TTS pipeline |
+| Pipeline Bridge | Go | handler-base | Kubeflow/Argo workflow triggers |
+| STT Module | Go | handler-base | Speech-to-text bridge |
+| TTS Module | Go | handler-base | Text-to-speech bridge |
+| Voice WebApp | Python | Gradio | Voice assistant UI (dev/testing) |
+| Ray Serve | Python | Ray Serve | GPU inference endpoints |

### Frontend
@@ -242,27 +247,41 @@ All AI inference runs on a unified Ray Serve endpoint with fractional GPU allocation

---

-## Python Dependencies (handler-base)
+## Go Dependencies (handler-base)

-Core library for all NATS handlers: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)
+Shared Go module for all NATS handler services: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)

+```go
+// go.mod (handler-base v1.0.0)
+require (
+	github.com/nats-io/nats.go          // NATS client
+	google.golang.org/protobuf          // Protocol Buffers encoding
+	github.com/zitadel/oidc/v3          // OIDC client
+	go.opentelemetry.io/otel            // OpenTelemetry traces + metrics
+	github.com/milvus-io/milvus-sdk-go  // Milvus vector search
+)
+```
+
+See [ADR-0061](decisions/0061-go-handler-refactor.md) for the full refactoring rationale.
+
+## Python Dependencies (ML/AI only)
+
+Python is retained for ML inference, pipeline orchestration, and dev tools:

```toml
-# Core
-nats-py>=2.7.0             # NATS client
-msgpack>=1.0.0             # Binary serialization
-httpx>=0.27.0              # HTTP client
+# ray-serve (GPU inference)
+ray[serve]>=2.53.0
+vllm>=0.8.0
+faster-whisper>=1.0.0
+TTS>=0.22.0
+sentence-transformers>=3.0.0

-# ML/AI
-pymilvus>=2.4.0            # Milvus client
-openai>=1.0.0              # vLLM OpenAI API
+# kubeflow (pipeline definitions)
+kfp>=2.12.1

-# Observability
-opentelemetry-api>=1.20.0
-opentelemetry-sdk>=1.20.0
-mlflow>=2.10.0             # Experiment tracking

-# Kubeflow (kubeflow repo)
-kfp>=2.12.1                # Pipeline SDK
+# mlflow (experiment tracking)
+mlflow>=3.7.0
+pymilvus>=2.4.0
```
---
@@ -1,6 +1,6 @@

# Use MessagePack for NATS Messages

-* Status: accepted
+* Status: superseded by [ADR-0061](0061-go-handler-refactor.md) (Protocol Buffers)
* Date: 2025-12-01
* Deciders: Billy Davies
* Technical Story: Selecting serialization format for NATS messages
@@ -1,10 +1,12 @@

# Python Module Deployment Strategy

-* Status: accepted
+* Status: superseded by [ADR-0061](0061-go-handler-refactor.md)
* Date: 2026-02-02
* Deciders: Billy
* Technical Story: Define how Python handler modules are packaged and deployed to Kubernetes

+> **Note (2026-02-23):** This ADR described deploying Python handlers as Ray Serve applications inside the Ray cluster. [ADR-0061](0061-go-handler-refactor.md) supersedes this approach — all five handler services (chat-handler, voice-assistant, pipeline-bridge, tts-module, stt-module) have been rewritten in Go and now deploy as standalone Kubernetes Deployments with distroless container images (~20 MB each). The Ray cluster is exclusively used for GPU inference workloads. The handler-base shared library is now a Go module published at `git.daviestechlabs.io/daviestechlabs/handler-base` using Protocol Buffers for NATS message encoding.
+
## Context

We have Python modules for AI/ML workflows that need to run on our unified GPU cluster:
@@ -14,7 +14,7 @@ How do we build a performant, maintainable frontend that integrates with the NATS

## Decision Drivers

* Real-time streaming for chat and voice (WebSocket required)
-* Direct integration with NATS JetStream (binary MessagePack protocol)
+* Direct integration with NATS JetStream (Protocol Buffers encoding, see [ADR-0061](0061-go-handler-refactor.md))
* Minimal client-side JavaScript (~20KB gzipped target)
* No frontend build step (no webpack/vite/node required)
* 3D avatar rendering for immersive experience
@@ -39,8 +39,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provides

* No npm, no webpack, no build step — assets served directly
* Server-side rendering via Go templates
* WebSocket handled natively in Go (gorilla/websocket)
-* NATS integration with MessagePack in the same binary
+* NATS integration with Protocol Buffers in the same binary
* Distroless container image for minimal attack surface
+* Type-safe NATS messages via handler-base shared Go module (protobuf stubs)
### Negative Consequences
@@ -58,8 +59,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provides

| Client state | Alpine.js 3 | Lightweight reactive UI for local state |
| 3D Avatars | Three.js + VRM | 3D character rendering with lip-sync |
| Styling | Tailwind CSS 4 + DaisyUI | Utility-first CSS with component library |
-| Messaging | NATS JetStream | Real-time pub/sub with MessagePack encoding |
+| Messaging | NATS JetStream | Real-time pub/sub with Protocol Buffers encoding |
| Auth | golang-jwt/jwt/v5 | JWT token handling for OAuth flows |
+| Shared lib | handler-base (Go module) | NATS client, protobuf messages, health, OTel, HTTP clients |
| Database | PostgreSQL (lib/pq) + SQLite | Persistent + local session storage |
| Observability | OpenTelemetry SDK | Traces, metrics via OTLP gRPC |
@@ -88,7 +90,7 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provides

│ ┌─────────┴─────────┐ │
│ │ NATS Client │ │
│ │ (JetStream + │ │
-│ │ MessagePack) │ │
+│ │ Protobuf) │ │
│ └─────────┬─────────┘ │
└────────────────────────┼────────────────────────────────────────┘
│
@@ -130,8 +132,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provides

## Links

* Related to [ADR-0003](0003-use-nats-for-messaging.md) (NATS messaging)
-* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding)
+* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding — superseded by Protocol Buffers, see [ADR-0061](0061-go-handler-refactor.md))
* Related to [ADR-0011](0011-kuberay-unified-gpu-backend.md) (Ray Serve backend)
* Related to [ADR-0028](0028-authentik-sso-strategy.md) (OAuth/OIDC)
+* Related to [ADR-0061](0061-go-handler-refactor.md) (Go handler refactor — handler-base shared module, protobuf wire format)
* [HTMX Documentation](https://htmx.org/docs/)
* [VRM Specification](https://vrm.dev/en/)
@@ -1,338 +1,287 @@

-# Add Mac Mini M4 Pro (waterdeep) to Ray Cluster as External Worker
+# Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation

-* Status: proposed
+* Status: accepted
* Date: 2026-02-16
+* Updated: 2026-02-23
* Deciders: Billy
-* Technical Story: Expand Ray cluster with Apple Silicon compute for inference and training
+* Technical Story: Use waterdeep as a dedicated local AI workstation for BlenderMCP-driven 3D avatar creation, replacing the previously proposed Ray worker role

## Context and Problem Statement

-The homelab Ray cluster currently runs entirely within Kubernetes, with GPU workers pinned to specific nodes:
-
-| Node | GPU | Memory | Workload |
-|------|-----|--------|----------|
-| khelben | Strix Halo (ROCm) | 128 GB unified | vLLM 70B (0.95 GPU) |
-| elminster | RTX 2070 (CUDA) | 8 GB VRAM | Whisper (0.5) + TTS (0.5) |
-| drizzt | Radeon 680M (ROCm) | 12 GB VRAM | Embeddings (0.8) |
-| danilo | Intel Arc (i915) | ~6 GB shared | Reranker (0.8) |
+**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory that currently serves as a development workstation (see [ADR-0037](0037-node-naming-conventions.md)). The original proposal was to add it to the Ray cluster as an external inference/training worker, but:
+
+- All Ray inference slots are already allocated and stable — adding a 5th GPU class (MPS) increases complexity without filling a gap
+- vLLM's MPS backend remains experimental — not production-ready for serving
+- The real unmet need is **3D avatar creation** for companions-frontend ([ADR-0063](0063-comfyui-3d-avatar-pipeline.md))

-All GPUs are fully allocated to inference (see [ADR-0005](0005-multi-gpu-strategy.md), [ADR-0011](0011-kuberay-unified-gpu-backend.md)). Training is currently CPU-only and distributed across cluster nodes via Ray Train ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)).
+[ADR-0063](0063-comfyui-3d-avatar-pipeline.md) describes an automated ComfyUI + TRELLIS + UniRig pipeline for image-to-VRM avatar generation, running on a personal desktop as an on-demand Ray worker. This supersedes the manual BlenderMCP Kasm workflow from [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md). waterdeep retains its role as an interactive Blender workstation for manual refinement of auto-generated models.

-**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory that currently serves as a development workstation (see [ADR-0037](0037-node-naming-conventions.md)). Its Apple Silicon GPU (MPS backend) and unified memory architecture make it a strong candidate for both inference and training workloads — but macOS cannot run Talos Linux or easily join the Kubernetes cluster as a native node.
+waterdeep's M4 Pro has a 16-core GPU with hardware-accelerated Metal rendering and 48 GB of unified memory shared between CPU and GPU. Running Blender natively on waterdeep with BlenderMCP gives a dramatically better 3D creation experience than Kasm.

-How do we integrate waterdeep's compute into the Ray cluster without disrupting the existing Kubernetes-managed infrastructure?
+How should we use waterdeep to maximise the 3D avatar creation pipeline for companions-frontend?

## Decision Drivers

-* 48 GB unified memory is sufficient for medium-large models (e.g., 7B–30B at Q4/Q8 quantisation)
-* Apple Silicon MPS backend is supported by PyTorch and vLLM (experimental)
-* macOS cannot run Talos Linux — must integrate without Kubernetes
-* Ray natively supports heterogeneous clusters with external workers
-* Must not impact existing inference serving stability
-* Training workloads ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)) would benefit from a GPU-accelerated worker
-* ARM64 architecture requires compatible Python packages and model formats
+* Blender on Kasm is CPU-rendered inside DinD — no Metal/Vulkan/CUDA GPU access, poor viewport performance
+* waterdeep has a 16-core Apple GPU with Metal support — Blender's Metal backend enables real-time viewport rendering, Cycles GPU rendering, and smooth sculpting
+* 48 GB unified memory means Blender, VS Code, and the MCP server can all run simultaneously without swapping
+* VS Code with Copilot agent mode and BlenderMCP server are installed on waterdeep — VS Code drives Blender via localhost:9876 with zero-latency socket communication
+* Exported VRM models must reach gravenhollow for production serving ([ADR-0063](0063-comfyui-3d-avatar-pipeline.md))
+* **rclone** chosen for asset promotion to gravenhollow's RustFS S3 endpoint — simpler than NFS mounts on macOS, consistent with existing Kasm rclone patterns, and avoids autofs/NFS fstab complexity
+* The automated ComfyUI pipeline from [ADR-0063](0063-comfyui-3d-avatar-pipeline.md) handles most avatar generation; waterdeep serves as the manual refinement station
+* The Ray cluster GPU fleet is fully allocated and stable — adding MPS complexity is not justified
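The original proposal's "7B–30B at Q4/Q8" sizing from the drivers above can be checked with back-of-envelope arithmetic: quantised weights take roughly bits/8 bytes per parameter, ignoring KV cache and runtime overhead. A rough sketch, not a capacity guarantee:

```python
def quantized_weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight footprint in GB, excluding KV cache and overhead."""
    # params_billions * 1e9 params * (bits/8) bytes, divided by 1e9 bytes/GB
    return params_billions * bits_per_weight / 8

for params, bits in [(7, 4), (30, 4), (30, 8)]:
    print(f"{params}B @ Q{bits}: ~{quantized_weight_gb(params, bits):.1f} GB")
```

Even a 30B model at Q8 sits around 30 GB of weights, comfortably inside 48 GB of unified memory before cache and OS overhead.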
## Considered Options
|
||||
|
||||
1. **External Ray worker on macOS** — run a Ray worker process natively on waterdeep that connects to the cluster Ray head over the network
|
||||
2. **Linux VM on Mac** — run UTM/Parallels VM with Linux, join as a Kubernetes node
|
||||
3. **K3s agent on macOS** — run K3s directly on macOS via Docker Desktop
|
||||
1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via rclone (S3)
|
||||
2. **External Ray worker on macOS** (original proposal) — join the Ray cluster for inference and training
|
||||
3. **Keep Kasm-only workflow** — rely entirely on the browser-based Kasm Blender workstation from ADR-0062
|
||||
|
||||
## Decision Outcome
|
||||
|
||||
Chosen option: **Option 1 — External Ray worker on macOS**, because Ray natively supports heterogeneous workers joining over the network. This avoids the complexity of running Kubernetes on macOS, lets waterdeep remain a development workstation, and leverages Apple Silicon MPS acceleration transparently through PyTorch.
|
||||
Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Mini's Metal GPU makes it dramatically better for 3D work than CPU-rendered Kasm, the Ray cluster doesn't need another worker, and the local workflow eliminates network latency between VS Code, the MCP server, and Blender.

### Positive Consequences

* Metal GPU acceleration — real-time Eevee viewport, GPU-accelerated Cycles rendering, smooth 60 fps sculpting
* Zero-latency MCP — the BlenderMCP socket (localhost:9876) has no network hop, so commands execute instantly
* 48 GB unified memory — large Blender scenes and multiple VRM models open simultaneously, no swap pressure
* VS Code + Copilot agent mode + BlenderMCP server installed natively — a single editor drives both code and Blender commands
* rclone for asset promotion — consistent with the Kasm rclone patterns, avoiding macOS NFS/autofs complexity
* waterdeep remains a dev workstation — avatar creation is a creative dev workflow, not a server workload
* Kasm Blender remains available as a browser-based fallback for remote/mobile access
* Simpler than the Ray worker approach — no cluster integration, no GCS port exposure, no experimental MPS backend

### Negative Consequences

* Blender, VS Code, and add-ons must be installed and maintained locally on waterdeep via Homebrew
* Assets created locally need an explicit `rclone copy` to promote them to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
* waterdeep is a single machine — no redundancy for the 3D creation workflow
* Not managed by Kubernetes or GitOps — relies on Homebrew-managed tooling
## Pros and Cons of the Options

### Option 1: Local AI agent on waterdeep

* Good, because Metal GPU acceleration makes Blender usable for real 3D work (sculpting, rendering, material preview)
* Good, because the localhost MCP socket eliminates all network latency
* Good, because 48 GB unified memory supports complex scenes without swapping
* Good, because there are no experimental backends (MPS/vLLM) — it uses Blender's mature Metal renderer
* Good, because waterdeep stays a dev workstation, aligning with its named role
* Bad, because it is local-only — no browser-based remote access (use Kasm for that)
* Bad, because it requires manual tool installation (Blender, VRM add-on, BlenderMCP, VS Code)
* Bad, because asset promotion to gravenhollow requires an explicit rclone command

### Option 2: External Ray worker on macOS (original proposal)

* Good, because it adds GPU compute to the Ray cluster
* Good, because training jobs gain MPS acceleration
* Bad, because the vLLM MPS backend is experimental — not production-ready
* Bad, because it adds a 5th GPU class (MPS) to an already complex fleet
* Bad, because Ray GCS port exposure adds security surface
* Bad, because it doesn't address the actual unmet need (3D avatar creation)
* Bad, because waterdeep becomes a server, degrading its dev-workstation role

### Option 3: Kasm-only workflow

* Good, because it is browser-based — usable from any device
* Good, because no local installation is required
* Bad, because CPU-rendered Blender inside DinD gives poor viewport performance
* Bad, because of network latency between VS Code and the Blender socket
* Bad, because memory inside the Kasm container is limited
* Bad, because there is no GPU acceleration for rendering or sculpting
## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│  waterdeep (Mac Mini M4 Pro · 48 GB unified · Metal GPU)                │
│                                                                         │
│   ┌──────────────────────────────────────────────────────┐              │
│   │  VS Code + GitHub Copilot (agent mode)               │              │
│   │                                                      │              │
│   │  BlenderMCP Server (uvx blender-mcp)                 │              │
│   │  DISABLE_TELEMETRY=true                              │              │
│   │        │                                             │              │
│   │        │ TCP localhost:9876 (zero latency)           │              │
│   │        ▼                                             │              │
│   └────────┬─────────────────────────────────────────────┘              │
│            │                                                            │
│   ┌────────▼─────────────────────────────────────────────┐              │
│   │  Blender 4.x (native macOS)                          │              │
│   │                                                      │              │
│   │  Renderer: Metal (Eevee real-time + Cycles GPU)      │              │
│   │  Add-ons:                                            │              │
│   │   • BlenderMCP (addon.py) — socket server :9876      │              │
│   │   • VRM Add-on for Blender — import/export VRM       │              │
│   │                                                      │              │
│   │  Working files: ~/blender-avatars/                   │              │
│   │   ├── projects/  (.blend source files)               │              │
│   │   ├── exports/   (.vrm exported models)              │              │
│   │   └── textures/  (shared texture library)            │              │
│   └────────┬─────────────────────────────────────────────┘              │
│            │                                                            │
│            │ rclone (S3 asset promotion)                                │
│            │ gravenhollow RustFS :30292                                 │
└────────────┼────────────────────────────────────────────────────────────┘
             ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  gravenhollow.lab.daviestechlabs.io                                     │
│  (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB)                       │
│                                                                         │
│  NFS: /mnt/gravenhollow/kubernetes/avatar-models/                       │
│   ├── Seed-san.vrm     (default model)                                  │
│   ├── Companion-A.vrm  (promoted from waterdeep)                        │
│   └── animations/      (shared animation clips)                         │
│                                                                         │
│  S3 (RustFS): avatar-models bucket                                      │
│  (same data, served via Cloudflare Tunnel for remote users)             │
└──────────────────────────┬──────────────────────────────────────────────┘
                           │
              ┌────────────┴───────────────┐
              │                            │
      NFS (nfs-fast PVC)           Cloudflare Tunnel
              │                 (assets.daviestechlabs.io)
              ▼                            │
┌──────────────────────────┐               ▼
│  companions-frontend     │   ┌──────────────────────────┐
│  (Kubernetes pod)        │   │ Remote users (CDN-cached │
│  LAN users               │   │ via Cloudflare edge)     │
└──────────────────────────┘   └──────────────────────────┘
```

## Implementation Plan

### 1. Install Blender and Add-ons

```bash
# Install Blender via Homebrew
brew install --cask blender

# Download BlenderMCP add-on
curl -LO https://raw.githubusercontent.com/ahujasid/blender-mcp/main/addon.py

# Install in Blender:
#   Edit > Preferences > Add-ons > Install... > select addon.py
#   Enable "Interface: Blender MCP"

# Install VRM Add-on for Blender:
#   Download from https://vrm-addon-for-blender.info/en/
#   Edit > Preferences > Add-ons > Install... > select VRM add-on zip
#   Enable "Import-Export: VRM"
```

### 2. VS Code MCP Configuration

```json
// .vscode/mcp.json (in companions-frontend or global settings)
{
  "servers": {
    "blender": {
      "command": "uvx",
      "args": ["blender-mcp"],
      "env": {
        "BLENDER_HOST": "localhost",
        "BLENDER_PORT": "9876",
        "DISABLE_TELEMETRY": "true"
      }
    }
  }
}
```

### 3. Python Environment for BlenderMCP

```bash
# Install uv (per ADR-0012)
curl -LsSf https://astral.sh/uv/install.sh | sh

# uvx handles the BlenderMCP server environment automatically
# Verify it works:
uvx blender-mcp --help
```

### 4. rclone for Asset Promotion

Use rclone to promote finished VRM exports to gravenhollow's RustFS S3 endpoint. This is consistent with the promotion pattern from [ADR-0063](0063-comfyui-3d-avatar-pipeline.md) and avoids macOS NFS/autofs complexity.

```bash
# Install rclone
brew install rclone

# Configure gravenhollow RustFS endpoint
rclone config create gravenhollow s3 \
  provider=Other \
  endpoint=https://gravenhollow.lab.daviestechlabs.io:30292 \
  access_key_id=<key> \
  secret_access_key=<secret>

# Promote a finished VRM
rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models/

# Sync all exports (idempotent)
rclone sync ~/blender-avatars/exports/ gravenhollow:avatar-models/ --exclude "*.blend"
```

> **Why rclone over NFS?** macOS autofs/NFS mounts are fragile across reboots and network changes. rclone is a single binary, works over HTTPS, and matches the promotion pattern already used in Kasm workflows. The explicit `rclone copy` command also serves as a deliberate promotion gate — only intentionally promoted models reach production.

### 5. Avatar Creation Workflow (waterdeep)

1. **Open Blender** on waterdeep (native, Metal-accelerated)
2. **Enable BlenderMCP** — 3D View sidebar → "BlenderMCP" tab → click "Connect"
3. **Open VS Code** with Copilot agent mode — the BlenderMCP server starts automatically
4. **Create avatars** using AI-assisted prompts:
   - _"Create an anime-style character with silver hair and a mage outfit"_
   - _"Apply metallic blue material to the staff"_
   - _"Rig this character for VRM export with standard humanoid bones"_
   - _"Export as VRM to ~/blender-avatars/exports/Silver-Mage.vrm"_
5. **Preview** in real time — the Metal GPU renders the Eevee viewport at 60 fps
6. **Promote** the finished VRM to gravenhollow via rclone:
   ```bash
   rclone copy ~/blender-avatars/exports/Silver-Mage-v1.vrm gravenhollow:avatar-models/
   ```
7. **Register** in companions-frontend — update `AllowedAvatarModels` in the Go and JS allowlists, then commit

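The registration step amounts to adding the new filename to an allowlist so the frontend refuses unreviewed models. A minimal Go sketch of that idea — the map-based shape and names here are assumptions, not the actual `AllowedAvatarModels` definition in companions-frontend:

```go
package main

import "fmt"

// Hypothetical allowlist sketch; the real companions-frontend definition
// may differ. Registering a new avatar means adding its filename here
// (and mirroring it in the JS allowlist).
var AllowedAvatarModels = map[string]bool{
	"Seed-san.vrm":       true,
	"Companion-A.vrm":    true,
	"Silver-Mage-v1.vrm": true, // newly promoted from waterdeep
}

// isAllowed rejects any model name not explicitly promoted and registered.
func isAllowed(name string) bool {
	return AllowedAvatarModels[name]
}

func main() {
	fmt.Println(isAllowed("Silver-Mage-v1.vrm")) // true
	fmt.Println(isAllowed("Unreviewed.vrm"))     // false
}
```

Because the map lookup defaults to `false` for missing keys, an unpromoted model is rejected without any extra error handling.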
### 6. Workflow Comparison: waterdeep vs Kasm

| Aspect | waterdeep (local) | Kasm (browser) |
|--------|-------------------|----------------|
| **GPU rendering** | Metal 16-core GPU — Eevee real-time, Cycles GPU | CPU-only software rendering |
| **Viewport FPS** | 60fps (Metal) | 5–15fps (CPU rasterisation) |
| **MCP latency** | localhost socket — sub-millisecond | Network hop to Kasm container |
| **Memory** | 48 GB unified, shared with GPU | Limited by Kasm container allocation |
| **Sculpting** | Smooth, hardware-accelerated | Laggy, CPU-bound |
| **Asset promotion** | rclone to gravenhollow RustFS S3 | Auto rclone to Quobyte S3 → manual promote to gravenhollow |
| **Access** | Local only (waterdeep physical/VNC) | Any browser, anywhere |
| **Setup** | Homebrew + manual add-on install | Pre-baked in Kasm image |
| **Use when** | Primary creation workflow | Remote access, quick edits, mobile |

## Security Considerations

* BlenderMCP's `execute_blender_code` runs arbitrary Python in Blender — review AI-generated code before execution, especially file I/O operations
* Telemetry is disabled via `DISABLE_TELEMETRY=true` in the MCP server config
* The BlenderMCP socket (port 9876) is bound to localhost — not exposed to the network
* NFS traffic to gravenhollow traverses the LAN — no sensitive data in VRM files
* waterdeep has no cluster access — a compromise doesn't impact Kubernetes workloads
* `.blend` source files stay local on waterdeep; only finished VRM exports are promoted to gravenhollow

## Future Considerations

* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, DGX Spark handles training; waterdeep remains the 3D creation workstation
* **Blender + MLX**: Apple's MLX framework could power local AI-generated textures or mesh deformation directly in Blender — worth evaluating as Blender add-ons mature
* **Automated promotion**: A file watcher (fswatch/launchd) could auto-run `rclone sync` when a new VRM appears in `~/blender-avatars/exports/`
* **VRM validation**: Add a pre-promotion check script that validates VRM humanoid rig completeness, expression morphs, and viseme shapes before copying to gravenhollow

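A pre-promotion VRM check could start along these lines. This is a sketch, not the eventual script: VRM files are GLB containers whose first chunk is glTF JSON, and the sketch assumes the VRM 0.x extension layout (`extensions.VRM.humanoid.humanBones` as an array) and checks only a small subset of the required bones:

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// glbJSON extracts the JSON chunk from a GLB container (VRM files are GLB).
// Layout: 12-byte header ("glTF", version, total length), then chunks of
// (length uint32, type uint32, payload). The first chunk must be "JSON".
func glbJSON(data []byte) ([]byte, error) {
	if len(data) < 20 || string(data[0:4]) != "glTF" {
		return nil, fmt.Errorf("not a GLB container")
	}
	n := binary.LittleEndian.Uint32(data[12:16])
	if string(data[16:20]) != "JSON" {
		return nil, fmt.Errorf("first chunk is not JSON")
	}
	if 20+int(n) > len(data) {
		return nil, fmt.Errorf("truncated JSON chunk")
	}
	return data[20 : 20+int(n)], nil
}

// checkHumanoid verifies that required humanoid bones are present, assuming
// the VRM 0.x layout; checking expression morphs and visemes would follow
// the same pattern against other extension fields.
func checkHumanoid(doc []byte) error {
	var g struct {
		Extensions struct {
			VRM struct {
				Humanoid struct {
					HumanBones []struct {
						Bone string `json:"bone"`
					} `json:"humanBones"`
				} `json:"humanoid"`
			} `json:"VRM"`
		} `json:"extensions"`
	}
	if err := json.Unmarshal(doc, &g); err != nil {
		return err
	}
	have := map[string]bool{}
	for _, b := range g.Extensions.VRM.Humanoid.HumanBones {
		have[b.Bone] = true
	}
	for _, req := range []string{"hips", "spine", "head"} { // subset of the full required set
		if !have[req] {
			return fmt.Errorf("missing required bone: %s", req)
		}
	}
	return nil
}

func main() {
	// Synthetic GLB with a minimal VRM 0.x JSON chunk, for illustration only.
	doc := []byte(`{"extensions":{"VRM":{"humanoid":{"humanBones":[` +
		`{"bone":"hips"},{"bone":"spine"},{"bone":"head"}]}}}}`)
	glb := append([]byte("glTF"), make([]byte, 8)...)
	binary.LittleEndian.PutUint32(glb[4:8], 2)                        // container version
	binary.LittleEndian.PutUint32(glb[8:12], uint32(12+8+len(doc)))   // total length
	glb = binary.LittleEndian.AppendUint32(glb, uint32(len(doc)))     // chunk length
	glb = append(glb, []byte("JSON")...)
	glb = append(glb, doc...)

	j, err := glbJSON(glb)
	if err != nil {
		panic(err)
	}
	fmt.Println(checkHumanoid(j)) // nil error for a complete rig
}
```

Running this as a gate before `rclone copy` would stop incomplete rigs from ever reaching the `avatar-models` bucket.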
## Links

* Related: [ADR-0063](0063-comfyui-3d-avatar-pipeline.md) — ComfyUI image-to-3D avatar pipeline (supersedes ADR-0062)
* Related: [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) — BlenderMCP 3D avatar workflow (superseded)
* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend architecture (Three.js + VRM avatars)
* Related: [ADR-0026](0026-storage-strategy.md) — Storage strategy (gravenhollow NFS-fast)
* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions (waterdeep)
* Related: [ADR-0012](0012-use-uv-for-python-development.md) — uv for Python development
* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp)
* [Blender Metal GPU Rendering](https://docs.blender.org/manual/en/latest/render/cycles/gpu_rendering.html)
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm)

# Refactor NATS Handler Services from Python to Go

* Status: accepted
* Date: 2026-02-19
* Decided: 2026-02-21
* Deciders: Billy
* Technical Story: Reduce container image sizes and resource consumption for non-ML handler services by rewriting them in Go

## Context and Problem Statement

The AI pipeline's non-inference services — `chat-handler`, `voice-assistant`, `pipeline-bridge`, `tts-module`, and the HTTP-forwarding variant of `stt-module` — are Python applications built on the `handler-base` shared library. None of these services perform local ML inference; they orchestrate calls to external Ray Serve endpoints over HTTP and route messages via NATS with MessagePack encoding.

> **Implementation note (2026-02-21):** During the Go rewrite, the wire format was upgraded from MessagePack to **Protocol Buffers** (see [ADR-0004 superseded](0004-use-messagepack-for-nats.md)). The shared Go module is published as `handler-base` v1.0.0 (not `handler-go` as originally proposed).

Despite doing only lightweight I/O orchestration, each service inherits the full Python runtime and its dependency tree through `handler-base` (which pulls in `numpy`, `pymilvus`, `redis`, `httpx`, `pydantic`, `opentelemetry-*`, `mlflow`, and `psycopg2-binary`). This results in container images of **500–700 MB each** — five services totalling **~3 GB** of registry storage — for workloads that are fundamentally HTTP/NATS glue code.

The homelab already has two production Go services (`companions-frontend` and `ntfy-discord`) that prove the NATS + MessagePack + OpenTelemetry pattern works well in Go with images under 30 MB.

How do we reduce the image footprint and resource consumption of the non-ML handler services without disrupting the ML inference layer?

## Decision Drivers

* Container images for glue services are 500–700 MB despite doing no ML work
* Go produces static binaries yielding images of ~15–30 MB (scratch/distroless base)
* Go services start in milliseconds vs. seconds for Python, improving pod scheduling
* Go's memory footprint is ~10× lower for equivalent I/O-bound workloads
* The NATS + msgpack + OTel pattern is already proven in `companions-frontend`
* Go has first-class Kubernetes client support (`client-go`) — relevant for `pipeline-bridge`
* ML inference services (Ray Serve, kuberay-images) must remain Python — only orchestration moves
* Five services share a common base (`handler-base`) — a single Go module replaces it for all

## Considered Options

1. **Rewrite handler services in Go with a shared Go module**
2. **Optimise Python images (multi-stage builds, slim deps, compiled wheels)**
3. **Keep current Python stack unchanged**

## Decision Outcome

Chosen option: **Option 1 — Rewrite handler services in Go**, because the services are pure I/O orchestration with no ML dependencies, the Go pattern is already proven in-cluster, and the image and resource savings are an order-of-magnitude improvement that Python optimisation cannot match.

### Positive Consequences

* Five container images shrink from ~3 GB total to ~100–150 MB total
* Sub-second cold start enables faster rollouts and autoscaling via KEDA
* Lower memory footprint frees cluster resources for ML workloads
* Eliminates Python runtime CVE surface area from non-ML services
* A single Go module (`handler-go`, later published as `handler-base`) provides the shared NATS, health, OTel, and client code
* `pipeline-bridge` gains `client-go` — the canonical Kubernetes client library
* Go's type system catches message schema drift at compile time

### Negative Consequences

* One-time rewrite effort across five services
* Team must maintain Go **and** Python codebases (Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs)
* The Go module needs feature parity with the Python `handler-base` for the orchestration subset (NATS client, health server, OTel, HTTP clients, Milvus client)
* Audio handling in `stt-module` (VAD) requires a Go webrtcvad binding or equivalent

## Pros and Cons of the Options

### Option 1 — Rewrite in Go

* Good, because images shrink from ~600 MB → ~20 MB per service
* Good, because memory usage drops from ~150 MB → ~15 MB per service
* Good, because startup time drops from ~3 s → <100 ms
* Good, because Go has mature libraries for every dependency (nats.go, client-go, otel-go, milvus-sdk-go)
* Good, because two existing Go services in the cluster prove the pattern
* Bad, because of the one-time engineering effort to rewrite five services
* Bad, because two language ecosystems must be maintained

### Option 2 — Optimise Python images

* Good, because no rewrite is needed
* Good, because multi-stage builds and dependency trimming can reduce images by 30–50%
* Bad, because the Python runtime + interpreter overhead remains (~200 MB floor)
* Bad, because memory and startup improvements are marginal
* Bad, because the `handler-base` dependency tree is difficult to slim without breaking shared code

### Option 3 — Keep current stack

* Good, because zero effort
* Bad, because images remain 500–700 MB for glue code
* Bad, because resource waste reduces headroom for ML workloads
* Bad, because slow cold starts limit KEDA autoscaling effectiveness

## Implementation Plan

### Phase 1: `handler-base` Go Module (COMPLETE)

Published as `git.daviestechlabs.io/daviestechlabs/handler-base` v1.0.0 with:

| Package | Purpose | Python Equivalent |
|---------|---------|-------------------|
| `natsutil/` | NATS publish/request/decode with protobuf encoding | `handler_base.nats_client` |
| `health/` | HTTP health + readiness server | `handler_base.health` |
| `telemetry/` | OTel traces + metrics setup | `handler_base.telemetry` |
| `config/` | Env-based configuration (struct tags) | `handler_base.config` (pydantic-settings) |
| `clients/` | HTTP clients for LLM, embeddings, reranker, STT, TTS | `handler_base.clients` |
| `handler/` | Typed NATS message handler with OTel + health wiring | `handler_base.handler` |
| `messages/` | Type aliases from generated protobuf stubs | `handler_base.messages` |
| `gen/messagespb/` | protoc-generated Go stubs (21 message types) | — |
| `proto/messages/v1/` | `.proto` schema source | — |

### Phase 2: Service Ports (COMPLETE)

All five services were rewritten in Go and migrated to handler-base v1.0.0 with the protobuf wire format:

| Order | Service | Status | Notes |
|-------|---------|--------|-------|
| 1 | `pipeline-bridge` | ✅ Done | NATS + HTTP + k8s API calls. Parameters changed to `map[string]string`. |
| 2 | `tts-module` | ✅ Done | NATS ↔ HTTP bridge. `[]*TTSVoiceInfo` pointer slices, `int32` casts. |
| 3 | `chat-handler` | ✅ Done | Core text pipeline. `EffectiveQuery()` standalone func, `int32(TopK)`. |
| 4 | `voice-assistant` | ✅ Done | Same pattern with `[]*DocumentSource` pointer slices. |
| 5 | `stt-module` | ✅ Done | HTTP-forwarding variant. `SessionId`/`SpeakerId` field renames, `int32(Sequence)`. |

`companions-frontend` was also migrated: 129 lines of duplicated type definitions were replaced with type aliases from handler-base/messages.

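That migration leans on Go type aliases (`type A = B`), which make the generated protobuf type and the local name the same type, so no conversions are needed at call sites. A self-contained illustration — `chatRequestPB` is a hypothetical stand-in for a generated stub, not the real one:

```go
package main

import "fmt"

// Stand-in for a protoc-generated struct (hypothetical; the real stubs
// live in handler-base's gen/messagespb package).
type chatRequestPB struct {
	Query string
	TopK  int32
}

// A type alias (note the "="): ChatRequest IS chatRequestPB, not a new
// named type — values flow between the two names with no conversion.
type ChatRequest = chatRequestPB

func main() {
	req := ChatRequest{Query: "hello", TopK: int32(5)}
	var same chatRequestPB = req // legal without a cast: identical type
	fmt.Println(same.Query, same.TopK)
}
```

Had the frontend kept duplicate struct definitions instead, every NATS boundary would need field-by-field copying; the alias removes that entirely while keeping the frontend's import surface small.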
### Phase 3: Cleanup (COMPLETE)

* ~~Archive Python versions of ported services~~ — Python handler-base remains for Ray Serve/Kubeflow
* CI pipelines use `golangci-lint` v2 with errcheck, govet, staticcheck, misspell, bodyclose, nilerr
* All repos pass `golangci-lint run ./...` and `go test ./...`
* Wire format upgraded from MessagePack to Protocol Buffers (ADR-0004 superseded)
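
The linter set above can be expressed as a `.golangci.yml` along these lines (a sketch assuming golangci-lint v2's config schema; errcheck, govet, and staticcheck are among v2's defaults, so only the extras need enabling):

```yaml
# Sketch of a golangci-lint v2 config matching the linter set listed above
version: "2"
linters:
  enable:
    - misspell
    - bodyclose
    - nilerr
  # errcheck, govet, and staticcheck are enabled by default in v2
```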
### What Stays in Python

| Repository | Reason |
|------------|--------|
| `ray-serve` | PyTorch, vLLM, sentence-transformers — core ML inference |
| `kuberay-images` | GPU runtime Docker images (ROCm, CUDA, IPEX) |
| `gradio-ui` | Gradio is Python-only; dev/testing tool, not production |
| `kubeflow/` | Kubeflow Pipelines SDK is Python-only |
| `mlflow/` | MLflow SDK integration (tracking + model registry) |
| `stt-module` (local Whisper variant) | PyTorch + openai-whisper on GPU |
| `spark-analytics-jobs` | PySpark (being replaced by Flink anyway) |

||||
## Links

* Related: [ADR-0003](0003-use-nats-for-messaging.md) — NATS as messaging backbone
* Related: [ADR-0004](0004-use-messagepack-for-nats.md) — MessagePack binary encoding
* Related: [ADR-0011](0011-kuberay-unified-gpu-backend.md) — KubeRay unified GPU backend
* Related: [ADR-0013](0013-gitea-actions-for-ci.md) — Gitea Actions CI
* Related: [ADR-0014](0014-docker-build-best-practices.md) — Docker build best practices
* Related: [ADR-0019](0019-handler-deployment-strategy.md) — Handler deployment strategy
* Related: [ADR-0024](0024-ray-repository-structure.md) — Ray repository structure
* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend (Go reference)
* Related: [ADR-0051](0051-keda-event-driven-autoscaling.md) — KEDA autoscaling

448 decisions/0062-blender-mcp-3d-avatar-workflow.md Normal file
@@ -0,0 +1,448 @@

# BlenderMCP for 3D Avatar Creation via Kasm Workstation

* Status: superseded by [ADR-0063](0063-comfyui-3d-avatar-pipeline.md)
* Date: 2026-02-21
* Deciders: Billy
* Technical Story: Enable AI-assisted 3D avatar creation for companions-frontend using BlenderMCP in a Kasm Blender workstation with VS Code, storing assets in S3, serving locally from gravenhollow NFS and remotely via Cloudflare-cached RustFS

## Context and Problem Statement

The companions-frontend serves VRM avatar models for its Three.js-based 3D character rendering (see [ADR-0046](0046-companions-frontend-architecture.md)). Today the avatar library is limited to three models (`Seed-san.vrm`, `Aka.vrm`, `Midori.vrm`) — only one of which actually ships in the repo — and every model must be sourced or hand-sculpted externally.

Creating custom VRM avatars is a manual, time-intensive process: open Blender, sculpt/rig a character, export to VRM, iterate. There is no integration between the AI coding workflow (VS Code / Copilot) and Blender, so context switching between the editor and the 3D tool is constant.

How do we streamline custom 3D avatar creation for companions-frontend with AI assistance, while keeping assets durable and accessible across workstations?

## Decision Drivers

* The existing avatar pipeline is manual and disconnected from the development workflow
* BlenderMCP (v1.5.5, 17k+ GitHub stars) bridges AI assistants to Blender via the Model Context Protocol — enabling prompt-driven 3D modelling, material control, scene manipulation, and code execution inside Blender
* Kasm Workspaces already run in the cluster (`productivity` namespace) and support Docker-in-Docker with volume plugins for persistent storage
* VS Code supports MCP servers natively (GitHub Copilot agent mode), meaning the same editor used for code can drive Blender scene creation
* Custom volume mounts in Kasm map `/s3` to S3-compatible storage via the rclone Docker volume plugin — providing durable, off-node persistence
* Quobyte S3-compatible endpoint with the `kasm` bucket is the existing Kasm storage backend
* VRM models must ultimately land in the companions-frontend `/assets/models/` path at build time or be served from an external URL
* Final production models and animations should live on gravenhollow (all-SSD TrueNAS, dual 10GbE) for fast local serving via NFS
* Remote users accessing companions-chat through Cloudflare Tunnel need a CDN-cached path for multi-MB VRM downloads
* Models are write-once/read-many — ideal for aggressive caching
* gravenhollow already runs RustFS (S3-compatible) — exposing it via Cloudflare Tunnel gives CDN caching without a separate storage tier

## Considered Options

1. **BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3 (`kasm` bucket)**
2. **Local Blender + BlenderMCP on a developer laptop**
3. **Hyper3D / Rodin cloud generation only (no Blender)**
4. **Manual Blender workflow (status quo)**

## Decision Outcome

Chosen option: **Option 1 — BlenderMCP in Kasm Blender workstation + VS Code MCP client, assets in Quobyte S3**, because it integrates AI-assisted modelling directly into the existing Kasm + VS Code workflow, stores assets durably in S3, and requires no additional infrastructure beyond what is already deployed.

### Positive Consequences

* AI-assisted 3D modelling — prompt-driven creation, material application, and scene manipulation inside Blender via MCP
* Zero context switching — VS Code agent mode drives Blender commands through the same editor used for code
* Persistent storage — VRM exports written to `/s3` survive session teardown and are available from any Kasm session or CI pipeline
* Existing infrastructure — Kasm agent, DinD, rclone volume plugin, Quobyte S3, gravenhollow NFS, and Cloudflare are all already deployed
* No image rebuild for new models — VRM files live on gravenhollow NFS, mounted read-only into the pod; add a model and update the allowlist
* LAN performance — all-SSD NFS with dual 10GbE delivers VRM files in <100ms
* Remote performance — RustFS exposed through Cloudflare Tunnel with CDN caching at 300+ global PoPs; no separate storage tier needed
* Poly Haven / Hyper3D integration — BlenderMCP supports downloading Poly Haven assets and generating models via Hyper3D Rodin, expanding the asset library
* VRM ecosystem — Blender VRM add-on exports directly to VRM 0.x/1.0 format consumed by `@pixiv/three-vrm` in companions-frontend
* Reproducible — Kasm workspace images are versioned; Blender + add-ons are pre-baked

### Negative Consequences

* BlenderMCP's `execute_blender_code` tool runs arbitrary Python in Blender — must trust AI-generated code or review before execution
* Socket-based communication (TCP 9876) between the MCP server and Blender add-on adds a failure mode
* VRM export quality depends on correct rigging/weight painting — AI can scaffold but manual touch-up may still be needed
* Kasm Blender image must be configured with both the BlenderMCP add-on and the VRM add-on pre-installed
* Telemetry is on by default in BlenderMCP — must disable via `DISABLE_TELEMETRY=true` for privacy
* Cache misses from remote users hit gravenhollow via the tunnel — negligible with immutable files and long TTLs

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                         Developer Workstation                           │
│                                                                         │
│   ┌──────────────────────────────────┐                                  │
│   │  VS Code (local)                 │                                  │
│   │                                  │                                  │
│   │  GitHub Copilot (agent mode)     │                                  │
│   │          │                       │                                  │
│   │          ▼                       │                                  │
│   │  BlenderMCP Server (MCP)         │                                  │
│   │  (uvx blender-mcp)               │                                  │
│   │          │                       │                                  │
│   └──────────┼───────────────────────┘                                  │
│              │ TCP :9876 (JSON over socket)                             │
└──────────────┼──────────────────────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│              Kasm Blender Workstation (browser session)                 │
│                      kasm.daviestechlabs.io                             │
│                                                                         │
│   ┌──────────────────────────────────────────────────────┐              │
│   │  Blender 4.x                                         │              │
│   │                                                      │              │
│   │  Add-ons:                                            │              │
│   │   • BlenderMCP (addon.py) — socket server :9876      │              │
│   │   • VRM Add-on for Blender — import/export VRM       │              │
│   │                                                      │              │
│   │  ┌────────────────────────────────────────────────┐  │              │
│   │  │  /s3/blender-avatars/                          │  │              │
│   │  │  ├── projects/  (.blend source files)          │  │              │
│   │  │  ├── exports/   (.vrm exported models)         │  │              │
│   │  │  └── textures/  (shared texture lib)           │  │              │
│   │  └────────────────────────────────────────────────┘  │              │
│   └──────────────────────────────────────────────────────┘              │
│                          │                                              │
│                          │ rclone volume                                │
│                          │ plugin (S3)                                  │
└──────────────────────────┼──────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         Quobyte S3 Endpoint                             │
│                           Bucket: kasm                                  │
│                                                                         │
│   kasm/blender-avatars/projects/Companion-A.blend                       │
│   kasm/blender-avatars/exports/Companion-A.vrm                          │
│   kasm/blender-avatars/textures/skin-tone-01.png                        │
└──────────────────────────┬──────────────────────────────────────────────┘
                           │
                  rclone sync (promotion)
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                 gravenhollow.lab.daviestechlabs.io                      │
│           (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB)              │
│                                                                         │
│   NFS: /mnt/gravenhollow/kubernetes/avatar-models/                      │
│        ├── Seed-san.vrm       (default model)                           │
│        ├── Aka.vrm            (Legend tier)                             │
│        ├── Midori.vrm         (Legend tier)                             │
│        ├── Companion-A.vrm    (custom, promoted from Kasm S3)           │
│        └── animations/        (shared animation clips)                  │
│                                                                         │
│   S3 (RustFS): avatar-models bucket                                     │
│   (same data as NFS dir, served via S3 API for Cloudflare Tunnel)       │
└──────────┬─────────────────────────────────┬────────────────────────────┘
           │                                 │
   NFS mount (nfs-fast)              S3 API (RustFS :30292)
     for pod volume                  via Cloudflare Tunnel
           │                                 │
           ▼                                 ▼
┌──────────────────────────┐  ┌──────────────────────────────────────────┐
│   companions-frontend    │  │        Cloudflare Tunnel + CDN           │
│   (Kubernetes pod)       │  │                                          │
│                          │  │  assets.daviestechlabs.io                │
│   /models/ volume mount  │  │    → envoy-external                      │
│   (nfs-fast PVC, RO)     │  │    → avatar-assets-svc (in-cluster)     │
│                          │  │    → gravenhollow RustFS :30292          │
│   Go FileServer:         │  │                                          │
│   /assets/models/ →      │  │  Cloudflare CDN caches at 300+ PoPs      │
│   serves from PVC        │  │  Cache-Control: public, max-age=31536000 │
│                          │  │  (immutable, versioned filenames)        │
└──────────┬───────────────┘  └──────────────────────┬───────────────────┘
           │                                         │
      LAN clients                              Remote clients
  companions-chat.lab...                     companions-chat via
  (envoy-internal, direct)                    Cloudflare Tunnel
           │                                         │
           └──────────────────┬──────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         Browser (Three.js)                              │
│   AvatarManager.loadModel('/assets/models/Companion-A.vrm')             │
│                                                                         │
│   LAN:    fetch from companions-frontend pod (NFS-backed, ~10GbE)       │
│   Remote: fetch from assets.daviestechlabs.io (Cloudflare CDN-cached)   │
└─────────────────────────────────────────────────────────────────────────┘
```

## Workflow
### 1. Kasm Workspace Setup

The Kasm Blender workspace image is configured with:

| Component | Version | Purpose |
|-----------|---------|---------|
| Blender | 4.x | 3D modelling and sculpting |
| BlenderMCP add-on (`addon.py`) | 1.5.5 | Socket server for MCP commands |
| VRM Add-on for Blender | latest | Import/export VRM format |
| Python | 3.10+ | Blender scripting runtime |

The Kasm storage mapping mounts `/s3` via the rclone Docker volume plugin to the Quobyte S3 endpoint (`kasm` bucket). The sub-path `blender-avatars/` is used for all 3D asset work.
### 2. VS Code MCP Configuration

Add BlenderMCP as an MCP server in VS Code (`.vscode/mcp.json` or user settings):

```json
{
  "servers": {
    "blender": {
      "command": "uvx",
      "args": ["blender-mcp"],
      "env": {
        "BLENDER_HOST": "localhost",
        "BLENDER_PORT": "9876",
        "DISABLE_TELEMETRY": "true"
      }
    }
  }
}
```

When the Kasm session is accessed remotely, set `BLENDER_HOST` to the Kasm workstation's reachable address.
### 3. Avatar Creation Workflow

1. **Launch** the Kasm Blender workspace via `kasm.daviestechlabs.io`
2. **Enable** the BlenderMCP add-on in Blender → 3D View sidebar → "BlenderMCP" tab → "Connect to Claude"
3. **Open VS Code** with Copilot agent mode and the BlenderMCP MCP server running
4. **Prompt** the AI to create or modify avatars:
   - _"Create a humanoid character with anime-style proportions, blue hair, and a fantasy outfit"_
   - _"Apply a metallic gold material to the armor pieces"_
   - _"Set up the lighting for a character showcase render"_
   - _"Rig this character for VRM export with standard humanoid bones"_
5. **Export** the finished model to VRM via the VRM add-on (or via BlenderMCP `execute_blender_code` calling the VRM export operator)
6. **Save** the `.vrm` to `/s3/blender-avatars/exports/` and the `.blend` source to `/s3/blender-avatars/projects/`
7. **Import** the VRM into companions-frontend — copy to `assets/models/`, update the allowlists in `internal/database/database.go` and `static/js/avatar.js`

### 4. Asset Pipeline (Kasm S3 → gravenhollow → production)

| Stage | Action |
|-------|--------|
| **Create** | AI-assisted modelling + VRM export in Kasm Blender → `/s3/blender-avatars/exports/*.vrm` |
| **Store** | rclone syncs `/s3` to Quobyte S3 `kasm` bucket automatically |
| **Promote** | `rclone copy quobyte:kasm/blender-avatars/exports/Model.vrm gravenhollow-nfs:/avatar-models/` (manual or CI) |
| **Register** | Add model path to `AllowedAvatarModels` in Go and JS allowlists, commit to repo |
| **Deploy** | Flux rolls out updated companions-frontend config; model already available on NFS PVC — no image rebuild needed |
| **CDN** | Model immediately available via `assets.daviestechlabs.io` — Cloudflare Tunnel proxies to RustFS, CDN caches at edge |

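The promote stage can be scripted. A minimal sketch that assembles the rclone invocation used above (the `promote_cmd`/`promote` helpers are hypothetical; the remote names `quobyte:` and `gravenhollow-nfs:` are the ones shown in the table and must exist in the local rclone config):

```python
import subprocess

def promote_cmd(model: str) -> list[str]:
    """Build the rclone command that promotes one exported VRM
    from the Quobyte kasm bucket to gravenhollow's avatar-models dir."""
    src = f"quobyte:kasm/blender-avatars/exports/{model}"
    dst = "gravenhollow-nfs:/avatar-models/"
    return ["rclone", "copy", src, dst]

def promote(model: str) -> None:
    # Run the copy; requires rclone and both remotes configured locally.
    subprocess.run(promote_cmd(model), check=True)
```

The same helper can back a CI job that promotes every new file in `exports/`.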
### 5. Deployment and Storage Architecture
#### Local Serving (LAN users)

Companions-frontend currently serves VRM models via `http.FileServer(http.Dir("assets"))` from the container filesystem. This bakes models into the image and requires a rebuild to add new avatars.

The new approach mounts avatar models from gravenhollow via an `nfs-fast` PVC:

```yaml
# PersistentVolumeClaim for avatar models
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: avatar-models
  namespace: ai-ml
spec:
  storageClassName: nfs-fast
  accessModes: [ReadOnlyMany]
  resources:
    requests:
      storage: 10Gi
```

The pod mounts this PVC at `/models` and the Go server serves it at `/assets/models/`:

```go
// Replace embedded assets with NFS-backed volume
mux.Handle("/assets/models/", http.StripPrefix("/assets/models/",
    http.FileServer(http.Dir("/models"))))
```

Benefits:

- **No image rebuild** to add/update models — write to gravenhollow NFS, pod sees it immediately (with `actimeo=600` cache, within 10 minutes)
- **All-SSD + dual 10GbE** — VRM files (typically 5–30 MB) load in <100ms on LAN
- **ReadOnlyMany** — multiple replicas can share the same PVC
- Source `.blend` files and textures remain on Quobyte S3 (Kasm bucket) for the creation workflow; only promoted VRM exports land on gravenhollow

#### Remote Serving (Cloudflare-cached RustFS)

Companions-chat is accessed externally via Cloudflare Tunnel → `envoy-internal`. Rather than duplicating assets to a separate storage tier (e.g., Cloudflare R2), gravenhollow's RustFS S3 endpoint is exposed directly through the Cloudflare Tunnel with a dedicated hostname. Cloudflare's CDN automatically caches responses at edge PoPs — since VRM files are immutable with year-long TTLs, virtually all requests are served from cache.

| Aspect | Detail |
|--------|--------|
| **Origin** | gravenhollow RustFS `avatar-models` bucket (`:30292`, same data as NFS dir) |
| **Public hostname** | `assets.daviestechlabs.io` (Cloudflare DNS, orange-clouded) |
| **Tunnel routing** | Cloudflare Tunnel → `envoy-external` → `avatar-assets-svc` → gravenhollow RustFS |
| **CDN caching** | Cloudflare CDN caches at 300+ global PoPs; `Cache-Control: public, max-age=31536000, immutable` |
| **Egress** | Cloudflare-proxied traffic has no bandwidth surcharge |
| **Auth** | Public read (models are not sensitive); RustFS write credentials stay internal |
| **No sync needed** | Single source of truth — NFS and RustFS serve the same data from gravenhollow |

##### In-Cluster Proxy Service

An ExternalName or Endpoints service proxies cluster traffic to gravenhollow's RustFS endpoint so the HTTPRoute can reference it:

```yaml
# Service pointing to gravenhollow RustFS for avatar assets
apiVersion: v1
kind: Service
metadata:
  name: avatar-assets
  namespace: ai-ml
spec:
  type: ExternalName
  externalName: gravenhollow.lab.daviestechlabs.io
  ports:
    - port: 30292
      protocol: TCP
```

##### HTTPRoute (Cloudflare Tunnel → RustFS)

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: avatar-assets
  namespace: ai-ml
  annotations:
    external-dns.alpha.kubernetes.io/hostname: assets.daviestechlabs.io
spec:
  hostnames:
    - assets.daviestechlabs.io
  parentRefs:
    - name: envoy-external
      namespace: network
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /avatar-models/
      backendRefs:
        - name: avatar-assets
          port: 30292
      filters:
        - type: ResponseHeaderModifier
          responseHeaderModifier:
            set:
              - name: Cache-Control
                value: "public, max-age=31536000, immutable"
              - name: Access-Control-Allow-Origin
                value: "https://companions-chat.daviestechlabs.io"
```

Cloudflare Tunnel picks up `assets.daviestechlabs.io` via the existing wildcard ingress rule (`*.daviestechlabs.io → envoy-external`). The CDN caches based on the `Cache-Control` header — after the first request per PoP, all subsequent loads are served from Cloudflare's edge.
##### Client-Side Routing

The frontend detects whether the user is on LAN or remote and routes model fetches accordingly:

```javascript
// avatar.js — model URL resolution
function resolveModelURL(path) {
  // LAN users: serve from the Go server (NFS-backed, same origin)
  // Remote users: serve from Cloudflare-cached RustFS
  const isLAN = location.hostname.endsWith('.lab.daviestechlabs.io');
  if (isLAN) return path; // e.g. /assets/models/Companion-A.vrm
  return `https://assets.daviestechlabs.io/avatar-models/${path.split('/').pop()}`;
  // → https://assets.daviestechlabs.io/avatar-models/Companion-A.vrm
}
```

Alternatively, the Go server can set the model base URL via a template variable based on the `Host` header, keeping the logic server-side.
#### Versioning Strategy

VRM files are immutable once promoted — updated models get a new filename (e.g., `Companion-A-v2.vrm`) rather than overwriting. This ensures:

- Cloudflare CDN cache never serves stale content
- Rollback is trivial — point the allowlist back to the previous version
- Browser `Cache-Control: immutable` works correctly

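The rename-don't-overwrite rule is mechanical enough to capture in a small helper. A sketch (the `next_version` helper is hypothetical; the `Name-vN.vrm` scheme follows the `Companion-A-v2.vrm` example above):

```python
import re

def next_version(filename: str) -> str:
    """Return the next immutable filename for an updated model:
    Companion-A.vrm -> Companion-A-v2.vrm -> Companion-A-v3.vrm.
    Existing files are never overwritten, so CDN caches stay valid."""
    m = re.fullmatch(r"(.+)-v(\d+)\.vrm", filename)
    if m:
        return f"{m.group(1)}-v{int(m.group(2)) + 1}.vrm"
    stem = filename.removesuffix(".vrm")
    return f"{stem}-v2.vrm"
```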
#### Storage Tier Summary

| Location | Purpose | Tier | Access |
|----------|---------|------|--------|
| Quobyte S3 (`kasm` bucket) | Working files: `.blend`, textures, WIP exports | Kasm rclone volume | Kasm sessions only |
| gravenhollow NFS (`/avatar-models/`) | Production VRM models + animations | `nfs-fast` PVC (RO) | companions-frontend pod, LAN |
| gravenhollow RustFS S3 (`avatar-models`) | Same data as NFS, exposed to Cloudflare Tunnel for remote users | S3 API via HTTPRoute | Cloudflare CDN-cached, global |

## BlenderMCP Capabilities Used

| MCP Tool | Avatar Workflow Use |
|----------|-------------------|
| `get_scene_info` | Inspect current scene before modifications |
| `create_object` | Scaffold base meshes for characters |
| `modify_object` | Adjust proportions, positions, bone placement |
| `set_material` | Apply skin, hair, clothing materials |
| `execute_blender_code` | Run VRM export scripts, batch operations, custom rigging |
| `get_screenshot` | AI reviews viewport to understand current state |
| `poly_haven_download` | Fetch HDRIs, textures for environment/materials |
| `hyper3d_generate` | Generate base 3D models from text prompts via Hyper3D Rodin |

## Security Considerations

* **Code execution:** BlenderMCP's `execute_blender_code` runs arbitrary Python in Blender. The Kasm session is sandboxed (DinD container with no cluster access), limiting blast radius. Always save before executing AI-generated code.
* **Telemetry:** BlenderMCP collects anonymous telemetry by default. Disabled via `DISABLE_TELEMETRY=true` in the MCP server config.
* **Network:** The TCP socket (port 9876) between the MCP server and Blender add-on is local to the session. If accessed remotely, ensure the connection is tunnelled or restricted.
* **S3 credentials:** rclone volume plugin credentials are managed via Kasm storage mappings and the existing `kasm-agent` ExternalSecret — no new secrets required.
* **RustFS exposure:** The `avatar-models` RustFS bucket is exposed read-only through Cloudflare Tunnel. RustFS write credentials remain internal. The HTTPRoute only routes GET requests to the bucket path — no write operations are reachable externally.
* **Public assets:** Avatar models are public assets (served to any authenticated companions-chat user). No sensitive data in VRM files. CORS restricts to `companions-chat.daviestechlabs.io` origin.
* **Model allowlist:** Even though models are served from NFS/RustFS, the server-side and client-side allowlists in companions-frontend gate which models users can actually select. Uploading a VRM to gravenhollow does not make it available without a code change.

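The allowlist gate amounts to a set-membership check before any model path reaches the loader. A sketch of the idea (names are illustrative; the real lists live in `internal/database/database.go` and `static/js/avatar.js`):

```python
# Illustrative mirror of the companions-frontend model allowlist.
# A VRM merely present on NFS is NOT selectable without a code change.
ALLOWED_MODELS = {
    "Seed-san.vrm",
    "Aka.vrm",
    "Midori.vrm",
    "Companion-A.vrm",
}

DEFAULT_MODEL = "Seed-san.vrm"

def select_model(requested: str) -> str:
    """Return the requested model only if allowlisted, else the default."""
    return requested if requested in ALLOWED_MODELS else DEFAULT_MODEL
```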
## Pros and Cons of the Options

### Option 1 — BlenderMCP in Kasm + VS Code + Quobyte S3 + gravenhollow (NFS + RustFS via Cloudflare)

* Good, because AI-assisted modelling reduces manual effort for avatar creation
* Good, because assets persist in S3 across sessions and are accessible from CI
* Good, because no new infrastructure — Kasm, rclone, Quobyte, gravenhollow, Cloudflare Tunnel are all already deployed
* Good, because VS Code MCP integration means one editor for code and 3D work
* Good, because Kasm sandboxes Blender execution away from the cluster
* Good, because NFS-fast serving decouples model assets from container images (no rebuild to add models)
* Good, because RustFS through Cloudflare Tunnel provides CDN caching with zero additional storage tiers — no R2 bucket, no sync CronJob, no extra credentials
* Good, because single source of truth — gravenhollow serves both LAN (NFS) and remote (RustFS → Cloudflare CDN) from the same data
* Good, because immutable versioned filenames enable aggressive caching and trivial rollback
* Good, because models are available to remote users immediately after promotion (no sync delay)
* Bad, because BlenderMCP is a third-party tool with arbitrary code execution
* Bad, because socket communication adds latency for remote Kasm sessions
* Bad, because VRM rigging quality may require manual adjustment after AI scaffolding
* Bad, because cache misses hit gravenhollow via the tunnel (negligible with immutable files + long TTLs)

### Option 2 — Local Blender + BlenderMCP on developer laptop

* Good, because lowest latency (everything local)
* Good, because no Kasm dependency
* Bad, because assets are local — no durable S3 storage without manual sync
* Bad, because Blender + add-ons must be installed on every dev machine
* Bad, because not reproducible across machines

### Option 3 — Hyper3D / Rodin cloud generation only

* Good, because no Blender installation needed
* Good, because fully prompt-driven model generation
* Bad, because limited control over output — no fine-tuning of materials, rigging, or proportions
* Bad, because the Hyper3D free tier has daily generation limits
* Bad, because generated models require post-processing for VRM compliance (humanoid rig, expressions, visemes)
* Bad, because it adds a vendor dependency for a core asset pipeline

### Option 4 — Manual Blender workflow (status quo)

* Good, because full manual control
* Good, because no new tooling
* Bad, because slow — no AI assistance for repetitive modelling tasks
* Bad, because no integration with the development workflow
* Bad, because assets stored ad-hoc with no structured pipeline to companions-frontend

## Links

* Related to [ADR-0046](0046-companions-frontend-architecture.md) (companions-frontend architecture — Three.js + VRM avatars)
* Related to [ADR-0026](0026-storage-strategy.md) (storage strategy — gravenhollow NFS-fast, Quobyte S3, rclone)
* Related to [ADR-0044](0044-dns-and-external-access.md) (DNS and external access — Cloudflare Tunnel, split-horizon)
* Related to [ADR-0049](0049-self-hosted-productivity-suite.md) (Kasm Workspaces)
* Related to [ADR-0059](0059-mac-mini-ray-worker.md) (waterdeep as local AI agent — primary 3D creation workstation with Metal GPU)
* [BlenderMCP GitHub](https://github.com/ahujasid/blender-mcp)
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
* [VRM Specification](https://vrm.dev/en/)
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader used in companions-frontend)
* [Poly Haven](https://polyhaven.com/) (free 3D assets, HDRIs, textures)
* [Hyper3D Rodin](https://hyper3d.ai/) (AI 3D model generation)
* [Cloudflare Tunnel Docs](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/)
* [Cloudflare CDN Cache Rules](https://developers.cloudflare.com/cache/)

483 decisions/0063-comfyui-3d-avatar-pipeline.md Normal file
@@ -0,0 +1,483 @@

# ComfyUI Image-to-3D Avatar Pipeline with TRELLIS + UniRig

* Status: proposed
* Date: 2026-02-24
* Deciders: Billy
* Technical Story: Replace the manual BlenderMCP 3D avatar creation workflow with an automated, GPU-accelerated image-to-rigged-3D-model pipeline using ComfyUI, TRELLIS 2-4B, and UniRig — running on a personal desktop (NVIDIA RTX 4070) as an on-demand Ray worker, with direct MLflow logging and rclone asset promotion

## Context and Problem Statement

The companions-frontend serves VRM avatar models for Three.js-based 3D character rendering ([ADR-0046](0046-companions-frontend-architecture.md)). The previous approach ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md)) proposed using BlenderMCP in a Kasm workstation or on waterdeep ([ADR-0059](0059-mac-mini-ray-worker.md)) for AI-assisted avatar creation. While BlenderMCP bridges VS Code to Blender, the workflow is fundamentally **interactive and manual** — an operator must prompt the AI, review each sculpting step, and hand-tune rigging and VRM export. This is slow, non-reproducible, and doesn't scale.

Meanwhile, the state of the art in image-to-3D generation has matured significantly:

- **TRELLIS** (Microsoft, CVPR'25 Spotlight, 12k+ GitHub stars) generates high-quality textured 3D meshes from a single image in seconds using Structured 3D Latents (SLAT) — with models up to 2B parameters
- **UniRig** (Tsinghua/Tripo, SIGGRAPH'25, 1.4k+ GitHub stars) automatically generates topologically valid skeletons and skinning weights for arbitrary 3D models using autoregressive transformers — the first model to rig humans, animals, and objects with a single unified framework
- **ComfyUI-3D-Pack** (3.7k+ GitHub stars) provides battle-tested ComfyUI nodes for TRELLIS, 3D Gaussian Splatting, mesh processing, and GLB/VRM export — enabling node-graph-based automation without custom code

Together, these tools enable a fully automated **image → 3D mesh → rigged model → VRM** pipeline that eliminates manual Blender work for the common case, produces reproducible results, and integrates with the existing MLflow + Ray stack.

A personal desktop (Ryzen 9 7950X, 64 GB DDR5, NVIDIA RTX 4070 12 GB VRAM) running Arch Linux is available as an **on-demand external Ray worker** — it won't be a permanent cluster member (it's not running Talos), but can join the Ray cluster via `ray start` when 3D generation workloads need to run. This adds a 5th GPU to the fleet specifically for 3D generation, without disrupting the stable inference allocations.

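Joining and leaving the cluster is just the `ray start` / `ray stop` CLI. A sketch that builds the join command the desktop would run (the head address is an assumption for illustration; the `ray_join_cmd` helper is hypothetical):

```python
def ray_join_cmd(head_address: str, num_gpus: int = 1) -> list[str]:
    """Build the command a non-Talos desktop runs to join the Ray
    cluster as an on-demand worker, advertising its GPU.
    Leaving again is simply: ray stop"""
    return [
        "ray", "start",
        f"--address={head_address}",
        f"--num-gpus={num_gpus}",
    ]
```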
How do we build an automated, reproducible image-to-VRM pipeline that leverages the desktop's CUDA GPU and integrates with the existing AI/ML platform for experiment tracking and asset serving?
## Decision Drivers

* BlenderMCP workflow from ADR-0062 is interactive and non-reproducible — every avatar requires an operator in the loop
* TRELLIS generates production-quality textured meshes from a single reference image in ~30 seconds on a 12 GB GPU
* UniRig automatically rigs arbitrary 3D models with skeleton + skinning weights — no manual weight painting
* ComfyUI-3D-Pack bundles TRELLIS, mesh processing, and GLB export as composable nodes — enabling visual pipeline authoring
* The desktop's RTX 4070 (12 GB VRAM) falls below TRELLIS's recommended 16 GB but is workable with fp16/attention optimizations, and exceeds UniRig's 8 GB requirement
* The desktop can join/leave the Ray cluster on demand — no permanent infrastructure commitment
* MLflow tracks generation parameters, quality metrics, and output artifacts for reproducibility — the desktop logs directly to the cluster's MLflow service over HTTP
* waterdeep (Mac Mini M4 Pro) remains available for interactive Blender touch-up on models that need manual refinement
* VRM export, asset promotion to gravenhollow, and serving architecture from ADR-0062 remain valid and are reused

## Considered Options
|
||||
|
||||
1. **ComfyUI + TRELLIS + UniRig on desktop Ray worker, with direct MLflow logging and rclone promotion**
|
||||
2. **BlenderMCP interactive workflow** (ADR-0062, superseded)
|
||||
3. **Cloud-hosted 3D generation (Hyper3D Rodin, Meshy, etc.)**
|
||||
4. **Run TRELLIS + UniRig directly as Ray Serve deployments in-cluster**
|
||||
|
||||
## Decision Outcome
|
||||
|
||||
Chosen option: **Option 1 — ComfyUI + TRELLIS + UniRig on desktop Ray worker**, because it automates the entire image-to-rigged-model pipeline without operator interaction, leverages purpose-built state-of-the-art models (TRELLIS for generation, UniRig for rigging), and uses the desktop's RTX 4070 as on-demand GPU capacity without disrupting the stable inference cluster. ComfyUI's visual node graph provides the pipeline orchestration directly on the desktop — no Kubernetes-side orchestrator needed since all compute is local to one machine.
|
||||
|
||||
waterdeep retains its role as an interactive Blender workstation for manual refinement of auto-generated models when needed — but the expectation is that most avatars pass through the automated pipeline without manual touch-up.
|
||||
|
||||
### Positive Consequences
|
||||
|
||||
* **Fully automated pipeline** — image → textured mesh → rigged model → VRM with no operator in the loop
|
||||
* **Reproducible** — same image + seed produces identical output; parameters tracked in MLflow
|
||||
* **Fast** — TRELLIS generates a mesh in ~30s, UniRig rigs it in ~60s; end-to-end under 5 minutes including VRM export
|
||||
* **On-demand GPU** — desktop joins Ray cluster only when needed; no standing resource cost
|
||||
* **Composable** — ComfyUI node graph can be extended with additional 3D processing nodes (Hunyuan3D, TripoSG, Stable3DGen) without code changes
|
||||
* **Quality** — TRELLIS (CVPR'25) and UniRig (SIGGRAPH'25) represent current state of the art
|
||||
* **MLflow integration** — generation parameters, mesh quality metrics, and output artifacts are logged directly to the cluster's MLflow service over HTTP
|
||||
* **Simple orchestration** — ComfyUI node graph handles the pipeline; no Kubernetes-side orchestrator needed for a single-GPU linear workflow
|
||||
* **Reuses existing serving architecture** — gravenhollow NFS + RustFS CDN serving from ADR-0062 is unchanged
|
||||
* **waterdeep fallback** — interactive Blender + BlenderMCP on waterdeep for models needing hand-tuning
|
||||
|
||||
### Negative Consequences
|
||||
|
||||
* Desktop must be powered on and `ray start` must be run manually to participate in the pipeline
|
||||
* TRELLIS requires NVIDIA CUDA — cannot run on the existing AMD/Intel GPU fleet (khelben, drizzt, danilo)
|
||||
* ComfyUI adds a Python dependency stack (PyTorch, CUDA, spconv, flash-attn) to maintain on the desktop
|
||||
* RTX 4070 has 12 GB VRAM — large TRELLIS models (2B params) may require fp16 + attention optimization; the 1.2B image-to-3D model fits comfortably
|
||||
* Auto-generated VRM models may still need manual expression/viseme morph targets for full companions-frontend lip-sync support
|
||||
* Desktop is not managed by GitOps/Kubernetes — Ansible or manual setup
|
||||
|
||||
## Pros and Cons of the Options
|
||||
|
||||
### Option 1 — ComfyUI + TRELLIS + UniRig on desktop Ray worker
|
||||
|
||||
* Good, because fully automated image-to-VRM pipeline eliminates manual sculpting
|
||||
* Good, because TRELLIS (CVPR'25) and UniRig (SIGGRAPH'25) are state-of-the-art, MIT-licensed
|
||||
* Good, because ComfyUI-3D-Pack provides tested node implementations — no custom TRELLIS integration code
|
||||
* Good, because desktop GPU is free/idle capacity with no cluster impact
|
||||
* Good, because MLflow integration reuses existing experiment tracking infrastructure
|
||||
* Good, because ComfyUI can queue and batch-generate multiple avatars unattended
|
||||
* Bad, because desktop availability is not guaranteed (must be manually started)
|
||||
* Bad, because CUDA-only — doesn't leverage the existing ROCm/Intel fleet
|
||||
* Bad, because auto-rigging quality varies by model topology — some models may need manual refinement
|
||||
|
||||
### Option 2 — BlenderMCP interactive workflow (ADR-0062)
|
||||
|
||||
* Good, because maximum creative control via VS Code + Copilot
|
||||
* Good, because Kasm provides browser-based access from anywhere
|
||||
* Bad, because every avatar requires an operator in the loop — slow and non-reproducible
|
||||
* Bad, because Blender sculpting from scratch is time-intensive even with AI assistance
|
||||
* Bad, because Kasm runs Blender CPU-only (no GPU acceleration inside DinD)
|
||||
* Bad, because no MLflow tracking or reproducibility
|
||||
|
||||
### Option 3 — Cloud-hosted 3D generation
|
||||
|
||||
* Good, because no local GPU required
|
||||
* Good, because some services (Meshy, Hyper3D Rodin) offer API access
|
||||
* Bad, because vendor dependency for a core asset pipeline
|
||||
* Bad, because free tiers have daily limits; paid tiers add recurring cost
|
||||
* Bad, because limited control over output quality, rigging, and VRM compliance
|
||||
* Bad, because data leaves the homelab network
|
||||
|
||||
### Option 4 — TRELLIS + UniRig as in-cluster Ray Serve deployments
|
||||
|
||||
* Good, because fully integrated with existing Ray cluster
|
||||
* Good, because no desktop dependency
|
||||
* Bad, because TRELLIS requires NVIDIA CUDA — no CUDA GPUs in-cluster have enough VRAM (elminster has 8 GB, needs 12–16 GB)
|
||||
* Bad, because would require purchasing new in-cluster NVIDIA hardware
|
||||
* Bad, because 3D generation is batch/occasional, not real-time serving — Ray Serve's always-on model is wasteful
|
||||
* Bad, because TRELLIS's CUDA dependencies (spconv, flash-attn, nvdiffrast, kaolin) conflict with existing Ray worker images
## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Kubeflow Pipelines (namespace: kubeflow) │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ 3d_avatar_generation_pipeline │ │
│ │ │ │
│ │ 1. prepare_reference Load/generate reference image from prompt │ │
│ │ │ (optional: use vLLM + Stable Diffusion) │ │
│ │ ▼ │ │
│ │ 2. generate_3d_mesh Submit RayJob → desktop ComfyUI worker │ │
│ │ │ TRELLIS image-large (1.2B) → GLB mesh │ │
│ │ ▼ │ │
│ │ 3. auto_rig Submit RayJob → desktop UniRig worker │ │
│ │ │ UniRig skeleton + skinning → rigged FBX/GLB │ │
│ │ ▼ │ │
│ │ 4. convert_to_vrm Blender CLI (headless) on desktop or cluster │ │
│ │ │ Import rigged GLB → configure VRM metadata │ │
│ │ ▼ → export .vrm │ │
│ │ 5. validate_vrm Check humanoid rig, expressions, visemes │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 6. promote_to_storage rclone copy → gravenhollow RustFS S3 │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ 7. log_to_mlflow Parameters, metrics, artifacts → MLflow │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────┬──────────────────────────────────────┘
│
RayJob CR (ephemeral)
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ desktop (Arch Linux · Ryzen 9 7950X · 64 GB DDR5 · RTX 4070 12 GB) │
│ On-demand Ray worker (ray start --address=<ray-head>:6379) │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ ComfyUI + Custom Nodes │ │
│ │ │ │
│ │ ComfyUI-3D-Pack: │ │
│ │ • TRELLIS image-large (1.2B) — image → textured GLB mesh │ │
│ │ • Mesh processing nodes — simplify, UV unwrap, texture bake │ │
│ │ • 3D preview — viewport render for quality check │ │
│ │ • GLB/OBJ/PLY export │ │
│ │ │ │
│ │ UniRig: │ │
│ │ • Skeleton prediction — autoregressive bone hierarchy │ │
│ │ • Skinning weights — bone-point cross-attention │ │
│ │ • Merge — skeleton + skin + original mesh → rigged model │ │
│ │ • Supports GLB, FBX, OBJ input/output │ │
│ │ │ │
│ │ Blender 4.x (headless CLI): │ │
│ │ • VRM Add-on for Blender — GLB → VRM conversion │ │
│ │ • Humanoid rig mapping, expression morphs, viseme config │ │
│ │ • Batch export via bpy scripting │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ GPU: NVIDIA RTX 4070 12 GB (CUDA 12.x) │
│ Ray: worker node with resource label {"nvidia_gpu": 1, "rtx4070": 1} │
│ Storage: ~/comfyui-3d/ (working dir), rclone → gravenhollow S3 │
└──────────────────────────────────┬──────────────────────────────────────────┘
│
rclone (S3)
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ gravenhollow.lab.daviestechlabs.io │
│ (TrueNAS Scale · All-SSD · Dual 10GbE · 12.2 TB) │
│ │
│ NFS: /mnt/gravenhollow/kubernetes/avatar-models/ │
│ ├── Seed-san.vrm (default model) │
│ ├── Generated-A-v1.vrm (auto-generated via pipeline) │
│ └── animations/ (shared animation clips) │
│ │
│ S3 (RustFS): avatar-models bucket │
│ (same data, served via Cloudflare Tunnel for remote users) │
└──────────────────────────┬──────────────────────────────────────────────────┘
│
┌────────────┴───────────────┐
│ │
NFS (nfs-fast PVC) Cloudflare Tunnel
│ (assets.daviestechlabs.io)
▼ │
┌──────────────────────────┐ ▼
│ companions-frontend │ ┌──────────────────────────┐
│ (Kubernetes pod) │ │ Remote users (CDN-cached │
│ LAN users │ │ via Cloudflare edge) │
└──────────────────────────┘ └──────────────────────────┘
```
### Ray Cluster Integration

The desktop joins the existing KubeRay-managed cluster as an external worker. It is **not** a Talos node and not managed by Kubernetes — it connects to the Ray head node's GCS port directly:

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Ray Cluster (KubeRay RayService) │
│ │
│ Head: Ray head pod (in-cluster) │
│ GCS port: 6379 (exposed via NodePort or LoadBalancer) │
│ │
│ In-Cluster Workers (permanent, managed by KubeRay): │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ khelben │ │elminster │ │ drizzt │ │ danilo │ │
│ │Strix Halo│ │RTX 2070 │ │Radeon 680│ │Intel Arc │ │
│ │ ROCm │ │ CUDA │ │ ROCm │ │ Intel │ │
│ │ /llm │ │/whisper │ │/embeddings│ │/reranker │ │
│ │ │ │ /tts │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ External Worker (on-demand, self-managed): │
│ ┌──────────────────────────────────────────────────┐ │
│ │ desktop (Arch Linux, external) │ │
│ │ RTX 4070 12 GB · CUDA │ │
│ │ ComfyUI + TRELLIS + UniRig + Blender CLI │ │
│ │ Resource labels: {"nvidia_gpu": 1, "3d_gen": 1} │ │
│ │ Joins via: ray start --address=<head>:6379 │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
The existing inference deployments (`/llm`, `/whisper`, `/tts`, `/embeddings`, `/reranker`) are unaffected — they are pinned to their respective in-cluster GPU nodes via Ray resource labels. The desktop's `3d_gen` resource label ensures only 3D generation RayJobs get scheduled there.
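Because the desktop advertises a plain Ray custom resource, any submission that requests `{"3d_gen": 1}` is pinned to it. A minimal sketch against the Ray Jobs HTTP API (the dashboard address and entrypoint script are assumptions, not real cluster values):

```python
"""Sketch: submit a 3D-generation job pinned to the desktop worker.

Assumptions: the Ray dashboard (Jobs API) address and the entrypoint
script name are illustrative, not real cluster values.
"""
import json
import urllib.request

RAY_JOBS_API = "http://ray-head.lab.daviestechlabs.io:8265/api/jobs/"  # assumed

def build_job_payload(entrypoint: str) -> dict:
    # entrypoint_resources makes Ray place the job only on a node that
    # advertises the custom "3d_gen" resource, i.e. the desktop worker.
    return {
        "entrypoint": entrypoint,
        "entrypoint_resources": {"3d_gen": 1},
    }

def submit(payload: dict) -> bytes:
    # Called from a pipeline driver; not invoked here.
    req = urllib.request.Request(
        RAY_JOBS_API,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

payload = build_job_payload("python generate_avatar.py --image ref.png")
```

The same effect is available in-process with `@ray.remote(resources={"3d_gen": 1})` on a task or actor.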
### Ray Service Multiplexing

The desktop's RTX 4070 can **time-share between inference overflow and 3D generation** when idle. When no 3D generation jobs are queued, the desktop can optionally serve as overflow capacity for inference workloads:

| Mode | When | What runs on desktop |
|------|------|---------------------|
| **3D generation** | ComfyUI workflow triggered (manually or via API) | ComfyUI + TRELLIS → UniRig → Blender VRM export |
| **Inference overflow** | Manually enabled, high-traffic periods | vLLM (secondary), Whisper, or TTS replica |
| **Idle** | Desktop powered on, no jobs | Ray worker connected but idle (0 resource cost) |

Mode switching is managed by Ray's resource scheduling — 3D jobs request `{"3d_gen": 1}` and inference jobs request their specific GPU labels. When the desktop is off, all workloads continue on the existing in-cluster fleet with no impact.
## Implementation Plan

### 1. Desktop Environment Setup

```bash
# Install NVIDIA drivers + CUDA toolkit (Arch Linux)
sudo pacman -S nvidia nvidia-utils cuda cudnn

# Install Python environment (uv per ADR-0012)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create project directory
mkdir -p ~/comfyui-3d && cd ~/comfyui-3d

# Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
uv venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements.txt

# Install ComfyUI-3D-Pack (includes TRELLIS nodes)
cd custom_nodes
git clone https://github.com/MrForExample/ComfyUI-3D-Pack.git
cd ComfyUI-3D-Pack
uv pip install -r requirements.txt
python install.py

# Install UniRig
cd ~/comfyui-3d
git clone https://github.com/VAST-AI-Research/UniRig.git
cd UniRig
uv pip install torch torchvision
uv pip install -r requirements.txt
uv pip install spconv-cu124 # Match CUDA version
uv pip install flash-attn --no-build-isolation

# Install Blender (headless CLI for VRM export)
sudo pacman -S blender
# Install VRM Add-on (must run under Blender's bundled Python — a plain
# system `python -c "import bpy"` fails outside Blender)
blender --background --python-expr "import bpy, os; bpy.ops.preferences.addon_install(filepath=os.path.abspath('UniRig/blender/add-on-vrm-v2.20.77_modified.zip'))"

# Install rclone for asset promotion
sudo pacman -S rclone
rclone config create gravenhollow s3 \
provider=Other \
endpoint=https://gravenhollow.lab.daviestechlabs.io:30292 \
access_key_id=<key> \
secret_access_key=<secret>

# Install Ray for cluster joining
uv pip install "ray[default]"
```
### 2. Ray Worker Configuration

```bash
# Join the Ray cluster on demand
# Ray head GCS port must be exposed (NodePort 30637 or similar)
ray start \
--address=<ray-head-external-ip>:6379 \
--num-cpus=16 \
--num-gpus=1 \
--resources='{"3d_gen": 1, "rtx4070": 1}' \
--node-name=desktop

# Verify connection
ray status # Should show desktop as a connected worker
```
The Ray head's GCS port needs to be reachable from the desktop. Options:

- **NodePort**: Expose port 6379 as a NodePort (e.g., 30637) on a cluster node
- **Tailscale/WireGuard**: If the desktop is on a different network segment
- **Direct LAN**: If desktop and cluster are on the same 192.168.100.0/24 subnet
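Whichever route is used, a quick preflight from the desktop confirms it actually reaches the GCS port before running `ray start`; a minimal sketch (the hostname in the example comment is an assumption):

```python
"""Sketch: preflight check that the Ray head's GCS port is reachable
from the desktop before running `ray start`."""
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    # A plain TCP connect tells us whether the NodePort, tunnel, or
    # direct-LAN route to the GCS port is working.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: port_open("ray-head.lab.daviestechlabs.io", 6379)
```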
### 3. ComfyUI Workflow (Node Graph)

The ComfyUI workflow JSON defines the image-to-GLB pipeline:

```
[Load Image] → [TRELLIS Image-to-3D] → [Mesh Simplify] → [Texture Bake]
│
▼
[Save GLB]
│
▼
[UniRig Skeleton Prediction]
│
▼
[UniRig Skinning Weights]
│
▼
[UniRig Merge (rigged model)]
│
▼
[Blender VRM Export (CLI)]
│
▼
[Save VRM → ~/comfyui-3d/exports/]
```
Key TRELLIS parameters exposed:
- `sparse_structure_sampler_params.steps`: 12 (default)
- `sparse_structure_sampler_params.cfg_strength`: 7.5
- `slat_sampler_params.steps`: 12
- `slat_sampler_params.cfg_strength`: 3.0
- `simplify`: 0.95 (triangle reduction ratio)
- `texture_size`: 1024
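The same workflow can also be queued headlessly, since ComfyUI accepts workflow graphs over HTTP. A minimal sketch posting an API-format workflow export to the `/prompt` endpoint (the file name is an assumption):

```python
"""Sketch: queue an exported ComfyUI workflow through the HTTP API.

Assumptions: ComfyUI listens on its default port; the workflow JSON was
exported in ComfyUI's API format; the file name is illustrative."""
import json
import urllib.request
import uuid

COMFYUI = "http://127.0.0.1:8188"

def build_request(workflow: dict, client_id: str) -> dict:
    # ComfyUI's /prompt endpoint expects the API-format node graph
    # under the "prompt" key.
    return {"prompt": workflow, "client_id": client_id}

def queue_workflow(path: str) -> bytes:
    # Called by a batch driver; not invoked here.
    with open(path) as f:
        workflow = json.load(f)
    body = json.dumps(build_request(workflow, uuid.uuid4().hex)).encode()
    req = urllib.request.Request(
        f"{COMFYUI}/prompt", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # response JSON includes the queued prompt_id
```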
### 4. MLflow Experiment Tracking

The desktop logs directly to the cluster's MLflow service over HTTP. Set `MLFLOW_TRACKING_URI` in the ComfyUI environment or in a post-generation logging script:

```bash
export MLFLOW_TRACKING_URI=http://<mlflow-service>:5000
```
Each generation run logs to a dedicated MLflow experiment:

| What | MLflow Concept | Content |
|------|---------------|---------|
| Reference image | Artifact | `reference.png` |
| TRELLIS parameters | Params | seed, cfg_strength, steps, simplify, texture_size |
| UniRig parameters | Params | skeleton_seed |
| Raw mesh | Artifact | `{name}_raw.glb` (pre-rigging) |
| Rigged model | Artifact | `{name}_rigged.glb` (post-rigging) |
| Final VRM | Artifact | `{name}.vrm` |
| Mesh quality | Metrics | vertex_count, face_count, texture_resolution |
| Rig quality | Metrics | bone_count, skinning_weight_coverage |
| Pipeline duration | Metrics | trellis_time_s, unirig_time_s, total_time_s |
### 5. VRM Export Script (Blender CLI)

```python
#!/usr/bin/env python3
"""vrm_export.py — Headless Blender script for GLB→VRM conversion."""
import bpy
import sys

argv = sys.argv[sys.argv.index("--") + 1:]
input_glb = argv[0]
output_vrm = argv[1]
avatar_name = argv[2] if len(argv) > 2 else "Generated Avatar"

# Clear scene
bpy.ops.wm.read_factory_settings(use_empty=True)

# Import rigged GLB
bpy.ops.import_scene.gltf(filepath=input_glb)

# Select armature
armature = next(obj for obj in bpy.data.objects if obj.type == 'ARMATURE')
bpy.context.view_layer.objects.active = armature

# Configure VRM metadata
armature["vrm_addon_extension"] = {
"spec_version": "1.0",
"vrm0": {
"meta": {
"title": avatar_name,
"author": "DaviesTechLabs Pipeline",
"allowedUserName": "Everyone",
}
}
}

# Export VRM
bpy.ops.export_scene.vrm(filepath=output_vrm)
print(f"Exported VRM: {output_vrm}")
```
Invoked via:
```bash
blender --background --python vrm_export.py -- input.glb output.vrm "Avatar Name"
```
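In the pipeline driver, that invocation is typically wrapped with `subprocess`; a minimal sketch (script and file names are illustrative):

```python
"""Sketch: wrap the headless Blender invocation in a pipeline driver.
File and script names are illustrative."""
import subprocess

def build_export_cmd(input_glb, output_vrm, name):
    # Everything after "--" is forwarded to vrm_export.py as argv.
    return [
        "blender", "--background", "--python", "vrm_export.py",
        "--", input_glb, output_vrm, name,
    ]

def export_vrm(input_glb, output_vrm, name):
    # check=True raises if Blender exits non-zero.
    subprocess.run(build_export_cmd(input_glb, output_vrm, name), check=True)
```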
### 6. Asset Promotion (Reuses ADR-0062 Architecture)

The VRM serving architecture from ADR-0062 is preserved unchanged:

| Stage | Action |
|-------|--------|
| **Generate** | Automated pipeline: image → TRELLIS → UniRig → VRM |
| **Promote** | `rclone copy ~/comfyui-3d/exports/{name}.vrm gravenhollow:avatar-models/` |
| **Register** | Add model path to `AllowedAvatarModels` in companions-frontend Go + JS allowlists |
| **Deploy** | Flux rolls out config; model already on NFS PVC — no image rebuild |
| **CDN** | Cloudflare Tunnel → RustFS → CDN cache at 300+ edge PoPs |
## Model Requirements and VRAM Budget

| Component | Model Size | VRAM Required | Notes |
|-----------|-----------|---------------|-------|
| TRELLIS image-large | 1.2B params | ~10 GB (fp16) | Image-to-3D, best quality |
| TRELLIS text-xlarge | 2.0B params | ~14 GB (fp16) | Text-to-3D, optional |
| UniRig skeleton | ~350M params | ~4 GB | Autoregressive skeleton prediction |
| UniRig skinning | ~350M params | ~4 GB | Bone-point cross-attention |
| Blender CLI | N/A | CPU only | Headless VRM export |

**RTX 4070 budget (12 GB):** Models are loaded sequentially (not concurrently) — TRELLIS runs first, output is saved to disk, then UniRig loads for rigging. Peak VRAM usage is ~10 GB during TRELLIS inference. The desktop's 64 GB system RAM provides ample buffer for model loading and mesh processing.
## Security Considerations

* **Ray GCS port exposure**: The Ray head's port 6379 must be reachable from the desktop. Use a NodePort with network policy restricting source IPs to the desktop's address, or use a WireGuard/Tailscale tunnel.
* **No cluster credentials on desktop**: The desktop runs Ray worker processes and ComfyUI only — it has no `kubeconfig` or Kubernetes API access. Generation is triggered locally via ComfyUI's UI or API, not from the cluster.
* **Model provenance**: TRELLIS and UniRig checkpoints are downloaded from Hugging Face (Microsoft and VAST-AI orgs respectively). Pin checkpoint hashes in the setup script.
* **ComfyUI network**: ComfyUI's web UI (port 8188) should be bound to localhost only when not in use. It is not exposed to the cluster.
* **rclone credentials**: gravenhollow RustFS write credentials stored in `~/.config/rclone/rclone.conf` with `600` permissions.
* **Generated content**: Auto-generated 3D models inherit no licensing restrictions (TRELLIS and UniRig are both MIT-licensed).
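The checkpoint-pinning point can be enforced mechanically; a minimal sketch (the file name and digest are placeholders, not real checkpoint values):

```python
"""Sketch: verify downloaded checkpoints against pinned SHA-256 digests.
The file name and digest below are placeholders, not real values."""
import hashlib
from pathlib import Path

PINNED = {
    # Record the real digest when the checkpoint is first downloaded.
    "trellis-image-large.safetensors": "<sha256-recorded-at-first-download>",
}

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path, expected: str) -> bool:
    return sha256_of(path) == expected
```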
## Future Considerations

* **Kubeflow pipeline for model refinement**: When iterating on existing models (re-rigging, parameter sweeps, A/B testing generation backends), a Kubeflow pipeline can orchestrate multi-step refinement workflows with artifact lineage, caching, and retries — submitting RayJobs to the desktop worker via the existing KFP + RayJob pattern from [ADR-0058](0058-training-strategy-cpu-dgx-spark.md)
* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, could run TRELLIS + UniRig in-cluster with dedicated GPU, eliminating desktop dependency
* **Stable3DGen / Hunyuan3D alternatives**: ComfyUI-3D-Pack supports multiple generation backends — can A/B test quality via MLflow metrics
* **VRM expression morphs**: Investigate automated viseme and expression blendshape generation for full lip-sync support without manual Blender work
* **ComfyUI API mode**: ComfyUI supports headless API-only execution (`--listen 0.0.0.0 --port 8188`) — a script or future Kubeflow pipeline can submit workflows via HTTP POST to `/prompt`
* **Text-to-3D**: Use the cluster's vLLM instance to generate a character description, then Stable Diffusion (on desktop) to create a reference image, feeding into TRELLIS — fully text-to-avatar pipeline
* **Batch generation**: Schedule overnight batch runs via CronWorkflow to generate avatar libraries from curated reference images
* **In-cluster migration**: If a 16+ GB NVIDIA GPU is added to the cluster (e.g., via DGX Spark or RTX 5070), migrate TRELLIS + UniRig to a dedicated Ray Serve deployment for always-available generation
## Links

* Supersedes: [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) — BlenderMCP for 3D avatar creation (interactive workflow)
* Updates: [ADR-0059](0059-mac-mini-ray-worker.md) — waterdeep retains Blender role for manual refinement only
* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend architecture (Three.js + VRM avatars)
* Related: [ADR-0011](0011-kuberay-unified-gpu-backend.md) — KubeRay unified GPU backend
* Related: [ADR-0005](0005-multi-gpu-strategy.md) — Multi-GPU heterogeneous strategy
* Related: [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) — Training strategy (Kubeflow + RayJob pattern for future pipeline work)
* Related: [ADR-0047](0047-mlflow-experiment-tracking.md) — MLflow experiment tracking
* Related: [ADR-0026](0026-storage-strategy.md) — Storage strategy (gravenhollow NFS-fast, RustFS S3)
* [Microsoft TRELLIS](https://github.com/microsoft/TRELLIS) — Structured 3D Latents for Scalable 3D Generation (CVPR'25 Spotlight)
* [VAST-AI UniRig](https://github.com/VAST-AI-Research/UniRig) — One Model to Rig Them All (SIGGRAPH'25)
* [ComfyUI-3D-Pack](https://github.com/MrForExample/ComfyUI-3D-Pack) — Extensive 3D node suite for ComfyUI
* [VRM Add-on for Blender](https://vrm-addon-for-blender.info/en/)
* [@pixiv/three-vrm](https://github.com/pixiv/three-vrm) (runtime loader in companions-frontend)
445
decisions/0064-waterdeep-coding-agent.md
Normal file
@@ -0,0 +1,445 @@
# waterdeep (Mac Mini M4 Pro) as Dedicated Coding Agent with Fine-Tuned Model

* Status: proposed
* Date: 2026-02-26
* Deciders: Billy
* Technical Story: Repurpose waterdeep as a dedicated local coding agent serving a fine-tuned code-completion model for OpenCode, Copilot Chat, and other AI coding tools, with a pipeline for continually tuning the model on the homelab codebase

## Context and Problem Statement
**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory ([ADR-0059](0059-mac-mini-ray-worker.md)). Its current role as a 3D avatar creation workstation ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md)) is being superseded by the automated ComfyUI pipeline ([ADR-0063](0063-comfyui-3d-avatar-pipeline.md)), which handles avatar generation on a personal desktop as an on-demand Ray worker. This frees waterdeep for a higher-value use case.
GitHub Copilot and cloud-hosted coding assistants work well for general code, but they have no knowledge of DaviesTechLabs-specific patterns: the handler-base module API, NATS protobuf message conventions, Kubeflow pipeline structure, Ray Serve deployment patterns, Flux/Kustomize layout, or the Go handler lifecycle used across chat-handler, voice-assistant, pipeline-bridge, stt-module, and tts-module. A model fine-tuned on the homelab codebase would produce completions that follow project conventions out of the box.

With 48 GB of unified memory and no other workloads, waterdeep can serve **Qwen 2.5 Coder 32B Instruct** at Q8_0 quantisation (~34 GB) via MLX with ample headroom for KV cache, leaving the machine responsive for the inference server and macOS overhead. This is the largest purpose-built coding model that fits at high quantisation on this hardware, and it consistently outperforms general-purpose 70B models at Q4 on coding benchmarks.

How should we configure waterdeep as a dedicated coding agent and build a pipeline for fine-tuning the model on our codebase?

## Decision Drivers

* waterdeep's 48 GB unified memory is fully available — no competing workloads after ComfyUI pipeline takeover
* Qwen 2.5 Coder 32B Instruct is the highest-quality open-source coding model that fits at Q8_0 (~34 GB weights + ~10 GB KV cache headroom)
* MLX on Apple Silicon provides native Metal-accelerated inference with no framework overhead — purpose-built for M-series chips
* OpenCode and VS Code Copilot Chat both support OpenAI-compatible API endpoints — a local server is a drop-in replacement
* The homelab codebase has strong conventions (handler-base, protobuf messages, Kubeflow pipelines, Ray Serve apps, Flux GitOps) that a general model doesn't know
* Existing training infrastructure ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)) provides Kubeflow Pipelines + MLflow + S3 data flow for fine-tuning orchestration
* LoRA adapters are small (~50–200 MB) and can be merged into the base model or hot-swapped in mlx-lm-server
* The cluster's CPU training capacity (126 cores, 378 GB RAM across 14 nodes) can prepare training datasets; waterdeep itself can run the LoRA fine-tune on its Metal GPU
## Considered Options

1. **Qwen 2.5 Coder 32B Instruct (Q8_0) via mlx-lm-server on waterdeep** — fine-tuned with LoRA on the homelab codebase using MLX
2. **Llama 3.1 70B Instruct (Q4_K_M) via llama.cpp on waterdeep** — larger general-purpose model at aggressive quantisation
3. **DeepSeek Coder V2 Lite 16B via MLX on waterdeep** — smaller coding model, lower resource usage
4. **Keep using cloud Copilot only** — no local model, no fine-tuning

## Decision Outcome

Chosen option: **Option 1 — Qwen 2.5 Coder 32B Instruct (Q8_0) via mlx-lm-server**, because it is the best-in-class open-source coding model at a quantisation level that preserves near-full quality, fits comfortably within the 48 GB memory budget with room for KV cache, and MLX provides the optimal inference stack for Apple Silicon. Fine-tuning with LoRA on the homelab codebase will specialise the model to project conventions.

### Positive Consequences

* Purpose-built coding model — Qwen 2.5 Coder 32B tops open-source coding benchmarks (HumanEval, MBPP, BigCodeBench)
* Q8_0 quantisation preserves >99% of full-precision quality — minimal degradation vs Q4
* ~34 GB model weights + ~10 GB KV cache headroom = comfortable fit in 48 GB unified memory
* MLX inference leverages Metal GPU for token generation — fast enough for interactive coding assistance
* OpenAI-compatible API via mlx-lm-server — works with OpenCode, VS Code Copilot Chat (custom endpoint), Continue.dev, and any OpenAI SDK client
* Fine-tuned LoRA adapter teaches project-specific patterns: handler-base API, NATS message conventions, Kubeflow pipeline structure, Flux layout
* LoRA fine-tuning runs directly on waterdeep using mlx-lm — no cluster resources needed for training
* Adapter files are small (~50–200 MB) — easy to version in Gitea and track in MLflow
* Fully offline — no cloud dependency, no data leaves the network
* Frees Copilot quota for non-coding tasks — local model handles bulk code completion
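Since the server is OpenAI-compatible, any HTTP client can exercise it without an SDK; a minimal stdlib sketch (the endpoint mirrors the architecture diagram, while the served model name is an assumption):

```python
"""Sketch: call the local model through its OpenAI-compatible API.
The endpoint and served model name are assumptions."""
import json
import urllib.request

ENDPOINT = "http://waterdeep.lab.daviestechlabs.io:8080/v1"

def build_chat_request(prompt, model="qwen2.5-coder-32b-instruct"):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.2,  # low temperature suits code completion
    }

def complete(prompt: str) -> str:
    # Called by editor integrations or scripts; not invoked here.
    body = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{ENDPOINT}/chat/completions", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```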
|
||||
|
||||
### Negative Consequences
|
||||
|
||||
* waterdeep is dedicated to this role — cannot simultaneously serve other workloads (Blender, etc.)
|
||||
* Model updates require manual download and conversion to MLX format
|
||||
* LoRA fine-tuning quality depends on training data curation — garbage in, garbage out
|
||||
* 32B model is slower than cloud Copilot for very long completions — acceptable for interactive use
|
||||
* Single point of failure — if waterdeep is down, fall back to cloud Copilot
|
||||
|

## Pros and Cons of the Options

### Option 1: Qwen 2.5 Coder 32B Instruct (Q8_0) via MLX

* Good, because purpose-built for code — trained on 5.5T tokens of code data
* Good, because 32B at Q8_0 (~34 GB) fits in 48 GB with KV cache headroom
* Good, because Q8_0 preserves near-full quality (vs Q4 which drops noticeably on coding tasks)
* Good, because MLX is Apple's native framework — zero-copy unified memory, Metal GPU kernels
* Good, because mlx-lm supports LoRA fine-tuning natively — train and serve on the same machine
* Good, because OpenAI-compatible API (mlx-lm-server) — drop-in for any coding tool
* Bad, because 32B generates ~15–25 tokens/sec on M4 Pro — adequate but not instant for long outputs
* Bad, because MLX model format requires conversion from HuggingFace (one-time, scripted)

### Option 2: Llama 3.1 70B Instruct (Q4_K_M) via llama.cpp

* Good, because 70B is a larger, more capable general model
* Good, because llama.cpp is mature and well-supported on macOS
* Bad, because Q4_K_M quantisation loses meaningful quality — especially on code tasks where precision matters
* Bad, because ~42 GB weights leaves only ~6 GB for KV cache — tight, risks OOM on long contexts
* Bad, because general-purpose model — not trained specifically for code, underperforms Qwen 2.5 Coder 32B on coding benchmarks despite being 2× larger
* Bad, because slower token generation (~8–12 tok/s) due to larger model size
* Bad, because llama.cpp doesn't natively support LoRA fine-tuning — need a separate training framework

### Option 3: DeepSeek Coder V2 Lite 16B via MLX

* Good, because smaller model — faster inference (~30–40 tok/s), lighter memory footprint
* Good, because still a capable coding model
* Bad, because significantly less capable than Qwen 2.5 Coder 32B on benchmarks
* Bad, because leaves 30+ GB of unified memory unused — not maximising the hardware
* Bad, because fewer parameters mean less capacity to absorb fine-tuning knowledge

### Option 4: Cloud Copilot only

* Good, because zero local infrastructure to maintain
* Good, because always up-to-date with latest model improvements
* Bad, because no knowledge of homelab-specific conventions — completions require heavy editing
* Bad, because cloud latency for every completion
* Bad, because data (code context) leaves the network
* Bad, because wastes waterdeep's 48 GB of unified memory sitting idle
|
||||
## Architecture
|
||||
|
||||
### Inference Server
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────────────────┐
|
||||
│ waterdeep (Mac Mini M4 Pro · 48 GB unified · Metal GPU · dedicated) │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ mlx-lm-server (launchd-managed) │ │
|
||||
│ │ │ │
|
||||
│ │ Model: Qwen2.5-Coder-32B-Instruct (Q8_0, MLX format) │ │
|
||||
│ │ LoRA: ~/.mlx-models/adapters/homelab-coder/latest/ │ │
|
||||
│ │ │ │
|
||||
│ │ Endpoint: http://waterdeep.lab.daviestechlabs.io:8080/v1 │ │
|
||||
│ │ ├── /v1/completions (code completion, FIM) │ │
|
||||
│ │ ├── /v1/chat/completions (chat / instruct) │ │
|
||||
│ │ └── /v1/models (model listing) │ │
|
||||
│ │ │ │
|
||||
│ │ Memory: ~34 GB model + ~10 GB KV cache = ~44 GB │ │
|
||||
│ └────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────┐ ┌──────────────────────────────────────┐ │
|
||||
│ │ macOS overhead ~3 GB │ │ Training (on-demand, same GPU) │ │
|
||||
│ │ (kernel, WindowServer, │ │ mlx-lm LoRA fine-tune │ │
|
||||
│ │ mDNSResponder, etc.) │ │ (server stopped during training) │ │
|
||||
│ └─────────────────────────┘ └──────────────────────────────────────┘ │
|
||||
└──────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
│ HTTP :8080 (OpenAI-compatible API)
|
||||
│
|
||||
┌────┴──────────────────────────────────────────────────────┐
|
||||
│ │
|
||||
▼ ▼
|
||||
┌─────────────────────────────┐ ┌─────────────────────────────────────┐
|
||||
│ VS Code (any machine) │ │ OpenCode (terminal, any machine) │
|
||||
│ │ │ │
|
||||
│ Copilot Chat / Continue.dev │ │ OPENCODE_MODEL_PROVIDER=openai │
|
||||
│ Custom endpoint → │ │ OPENAI_API_BASE= │
|
||||
│ waterdeep:8080/v1 │ │ http://waterdeep:8080/v1 │
|
||||
└─────────────────────────────┘ └─────────────────────────────────────┘
|
||||
```
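
The endpoint in the diagram is plain OpenAI-style HTTP, so any client can probe it without an SDK. A minimal Python sketch (the `chat_payload` and `smoke_test` helper names are hypothetical, not part of mlx-lm; only the URL and request schema come from the diagram above):

```python
import json
import urllib.request

BASE_URL = "http://waterdeep.lab.daviestechlabs.io:8080/v1"


def chat_payload(prompt: str, model: str = "qwen2.5-coder-32b",
                 max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def smoke_test(base_url: str = BASE_URL) -> str:
    """Send one completion request; raises if the server is unreachable."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_payload("Write a hello-world Go program")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `smoke_test()` from a cron job or a deploy script gives a cheap liveness check in front of both VS Code and OpenCode clients.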

### Fine-Tuning Pipeline

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                     Fine-Tuning Pipeline (Kubeflow)                         │
│                                                                             │
│  Trigger: weekly cron or manual (after significant codebase changes)        │
│                                                                             │
│  ┌──────────────┐    ┌──────────────────┐    ┌────────────────────────┐     │
│  │ 1. Clone repos│   │ 2. Build training │   │ 3. Upload dataset to   │     │
│  │    from Gitea │──▶│    dataset        │──▶│    S3                  │     │
│  │   (all repos) │   │   (instruction    │   │    training-data/      │     │
│  │               │   │    pairs + FIM)   │   │    code-finetune/      │     │
│  └──────────────┘    └──────────────────┘    └──────────┬─────────────┘     │
│                                                         │                   │
│  ┌──────────────────────────────────────────────────────┐│                  │
│  │ 4. Trigger LoRA fine-tune on waterdeep               ││                  │
│  │    (SSH or webhook → mlx-lm lora on Metal GPU)       │◀                  │
│  │                                                      │                   │
│  │    Base:   Qwen2.5-Coder-32B-Instruct (MLX Q8_0)     │                   │
│  │    Method: LoRA (r=16, alpha=32)                     │                   │
│  │    Data:   instruction pairs + fill-in-middle samples│                   │
│  │    Epochs: 3–5                                       │                   │
│  │    Output: adapter weights (~50–200 MB)              │                   │
│  └──────────────────────┬───────────────────────────────┘                   │
│                         │                                                   │
│  ┌──────────────────────▼───────────────────────────────┐                   │
│  │ 5. Evaluate adapter                                  │                   │
│  │    • HumanEval pass@1 (baseline vs fine-tuned)       │                   │
│  │    • Project-specific eval (handler-base patterns,   │                   │
│  │      Kubeflow pipeline templates, Flux manifests)    │                   │
│  └──────────────────────┬───────────────────────────────┘                   │
│                         │                                                   │
│  ┌──────────────────────▼───┐  ┌────────────────────────────────────────┐   │
│  │ 6. Push adapter to Gitea │  │ 7. Log metrics to MLflow               │   │
│  │    code-lora-adapters    │  │    experiment: waterdeep-coder-finetune│   │
│  │    repo (versioned)      │  │    metrics: eval_loss, humaneval,      │   │
│  └──────────────────────────┘  │    project_specific_score              │   │
│                                └────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │ 8. Deploy adapter on waterdeep                                      │    │
│  │    • Pull latest adapter from Gitea                                 │    │
│  │    • Restart mlx-lm-server with --adapter-path pointing to new ver  │    │
│  │    • Smoke test: send test completion requests                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Training Data Preparation

The training dataset is built from all DaviesTechLabs repositories:

| Source | Format | Purpose |
|--------|--------|---------|
| Go handlers (chat-handler, voice-assistant, etc.) | Instruction pairs | Teach handler-base API patterns, NATS message handling, protobuf encoding |
| Kubeflow pipelines (kubeflow/*.py) | Instruction pairs | Teach pipeline structure, KFP component patterns, S3 data flow |
| Ray Serve apps (ray-serve/) | Instruction pairs | Teach Ray Serve deployment, vLLM config, model serving patterns |
| Flux manifests (homelab-k8s2/) | Instruction pairs | Teach HelmRelease, Kustomization, namespace layout |
| Argo workflows (argo/*.yaml) | Instruction pairs | Teach WorkflowTemplate patterns, NATS triggers |
| ADRs (homelab-design/decisions/) | Instruction pairs | Teach architecture rationale and decision format |
| All source files | Fill-in-middle (FIM) | Teach code completion with project-specific context |

**Instruction pair example (Go handler):**

```json
{
  "instruction": "Create a new NATS handler module that bridges to an external gRPC service, following the handler-base pattern used in chat-handler and voice-assistant.",
  "output": "package main\n\nimport (\n\t\"context\"\n\t\"os\"\n\t\"os/signal\"\n\t\"syscall\"\n\n\t\"git.daviestechlabs.io/daviestechlabs/handler-base/config\"\n\t\"git.daviestechlabs.io/daviestechlabs/handler-base/handler\"\n\t\"git.daviestechlabs.io/daviestechlabs/handler-base/health\"\n\t..."
}
```

**Fill-in-middle example:**

```json
{
  "prefix": "func (h *Handler) HandleMessage(ctx context.Context, msg *messages.UserMessage) (*messages.AssistantMessage, error) {\n\t",
  "suffix": "\n\treturn response, nil\n}",
  "middle": "response, err := h.client.Complete(ctx, msg.Content)\n\tif err != nil {\n\t\treturn nil, fmt.Errorf(\"completion failed: %w\", err)\n\t}"
}
```
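
FIM samples of this shape can be generated mechanically from any source file. A minimal sketch, assuming a simple random-span policy (the helper name and the 1–5 line middle span are illustrative assumptions, not the actual pipeline code):

```python
import random


def make_fim_sample(source: str, rng: random.Random) -> dict:
    """Split one source file into prefix/middle/suffix for FIM training.

    Picks a short contiguous run of lines as the 'middle', mirroring the
    JSONL shape shown above.
    """
    lines = source.splitlines(keepends=True)
    if len(lines) < 4:
        raise ValueError("file too short for a FIM split")
    # Middle starts somewhere strictly inside the file...
    start = rng.randrange(1, len(lines) - 1)
    # ...and spans 1-5 lines, leaving at least one line of suffix.
    end = min(len(lines) - 1, start + rng.randint(1, 5))
    return {
        "prefix": "".join(lines[:start]),
        "middle": "".join(lines[start:end]),
        "suffix": "".join(lines[end:]),
    }
```

A useful property to test: concatenating the three fields always reproduces the original file byte-for-byte.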

## Implementation Plan

### 1. Model Setup

```bash
# Install MLX and mlx-lm via uv (per ADR-0012)
uv tool install mlx-lm

# Download and convert Qwen 2.5 Coder 32B Instruct to MLX Q8_0 format
mlx_lm.convert \
  --hf-path Qwen/Qwen2.5-Coder-32B-Instruct \
  --mlx-path ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --quantize \
  --q-bits 8

# Verify model loads and generates
mlx_lm.generate \
  --model ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --prompt "def fibonacci(n: int) -> int:"
```

### 2. Inference Server (launchd)

```bash
# Start the server manually first to verify
mlx_lm.server \
  --model ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --adapter-path ~/.mlx-models/adapters/homelab-coder/latest \
  --host 0.0.0.0 \
  --port 8080

# Verify OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-32b",
    "messages": [{"role": "user", "content": "Write a Go handler using handler-base that processes NATS messages"}],
    "max_tokens": 512
  }'
```

**launchd plist** (`~/Library/LaunchAgents/io.daviestechlabs.mlx-coder.plist`):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>io.daviestechlabs.mlx-coder</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/billy/.local/bin/mlx_lm.server</string>
    <string>--model</string>
    <string>/Users/billy/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8</string>
    <string>--adapter-path</string>
    <string>/Users/billy/.mlx-models/adapters/homelab-coder/latest</string>
    <string>--host</string>
    <string>0.0.0.0</string>
    <string>--port</string>
    <string>8080</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/Users/billy/.mlx-models/logs/server.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/billy/.mlx-models/logs/server.err</string>
</dict>
</plist>
```

```bash
# Load the service
launchctl load ~/Library/LaunchAgents/io.daviestechlabs.mlx-coder.plist

# Verify it's running
launchctl list | grep mlx-coder
curl http://waterdeep.lab.daviestechlabs.io:8080/v1/models
```

### 3. Client Configuration

**OpenCode** (`~/.config/opencode/config.json` on any dev machine):

```json
{
  "provider": "openai",
  "model": "qwen2.5-coder-32b",
  "baseURL": "http://waterdeep.lab.daviestechlabs.io:8080/v1"
}
```

**VS Code** (settings.json — Continue.dev extension):

```json
{
  "continue.models": [
    {
      "title": "waterdeep-coder",
      "provider": "openai",
      "model": "qwen2.5-coder-32b",
      "apiBase": "http://waterdeep.lab.daviestechlabs.io:8080/v1",
      "apiKey": "not-needed"
    }
  ]
}
```

### 4. Fine-Tuning on waterdeep (MLX LoRA)

```bash
# Prepare training data (run on cluster via Kubeflow, or locally)
# Output: train.jsonl and valid.jsonl in chat/instruction format

# Fine-tune with LoRA using mlx-lm
mlx_lm.lora \
  --model ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --train \
  --data ~/.mlx-models/training-data/homelab-coder/ \
  --adapter-path ~/.mlx-models/adapters/homelab-coder/$(date +%Y%m%d)/ \
  --lora-layers 16 \
  --batch-size 1 \
  --iters 1000 \
  --learning-rate 1e-5 \
  --val-batches 25 \
  --save-every 100

# Evaluate the adapter
mlx_lm.generate \
  --model ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --adapter-path ~/.mlx-models/adapters/homelab-coder/$(date +%Y%m%d)/ \
  --prompt "Create a new Go NATS handler using handler-base that..."

# Update the 'latest' symlink
ln -sfn ~/.mlx-models/adapters/homelab-coder/$(date +%Y%m%d) \
  ~/.mlx-models/adapters/homelab-coder/latest

# Restart the server to pick up new adapter
launchctl kickstart -k gui/$(id -u)/io.daviestechlabs.mlx-coder
```
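
The `latest` symlink convention above implies picking the newest date-named adapter directory; a deploy script could resolve it with a helper along these lines (hypothetical, not part of mlx-lm; assumes the `YYYYMMDD` naming used in the commands above):

```python
from pathlib import Path
from typing import Optional


def latest_adapter(root: Path) -> Optional[Path]:
    """Return the newest YYYYMMDD-named adapter directory under root.

    Lexicographic order equals chronological order for zero-padded dates,
    so a plain sort suffices. Returns None if no dated adapters exist yet.
    """
    dated = sorted(
        d for d in root.iterdir()
        if d.is_dir() and d.name.isdigit() and len(d.name) == 8
    )
    return dated[-1] if dated else None
```

This skips the `latest` symlink itself (not an eight-digit name), so it can safely be pointed at the same directory the symlink lives in.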

### 5. Training Data Pipeline (Kubeflow)

A new `code_finetune_pipeline.py` orchestrates dataset preparation on the cluster:

```
code_finetune_pipeline.yaml
│
├── 1. clone_repos        Clone all DaviesTechLabs repos from Gitea
├── 2. extract_patterns   Parse Go, Python, YAML files into instruction pairs
├── 3. generate_fim       Create fill-in-middle samples from source files
├── 4. deduplicate        Remove near-duplicate samples (MinHash)
├── 5. format_dataset     Convert to mlx-lm JSONL format (train + validation split)
├── 6. upload_to_s3       Push dataset to s3://training-data/code-finetune/{run_id}/
└── 7. log_to_mlflow      Log dataset stats (num_samples, token_count, repo_coverage)
```
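
Step 4 above names MinHash; as an illustration of the underlying idea, here is a greedy near-duplicate filter using exact Jaccard similarity over character shingles (hypothetical helper names; the O(n²) pairwise loop is precisely what MinHash + LSH replaces at scale):

```python
def shingles(text: str, k: int = 5) -> set:
    """Character k-gram shingle set used for similarity estimation."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def deduplicate(samples: list, threshold: float = 0.8) -> list:
    """Greedily keep a sample only if it is not too similar to any
    already-kept sample. Exact pairwise Jaccard, so O(n^2): fine for a
    sketch, replaced by MinHash signatures + LSH buckets in the pipeline.
    """
    kept, kept_shingles = [], []
    for s in samples:
        sh = shingles(s)
        if all(jaccard(sh, other) < threshold for other in kept_shingles):
            kept.append(s)
            kept_shingles.append(sh)
    return kept
```

The threshold is a tuning knob: too low and legitimate boilerplate-heavy Go handlers get dropped, too high and the model over-trains on repeated scaffolding.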

The actual LoRA fine-tune runs on waterdeep (not the cluster) because:

- mlx-lm LoRA leverages the M4 Pro's Metal GPU — significantly faster than CPU training
- The model is already loaded on waterdeep — no need to transfer 34 GB to/from the cluster
- Training a 32B model with LoRA requires ~40 GB — only waterdeep and khelben have enough memory

### 6. Memory Budget

| Component | Memory |
|-----------|--------|
| macOS + system services | ~3 GB |
| Qwen 2.5 Coder 32B (Q8_0 weights) | ~34 GB |
| KV cache (8192 context) | ~6 GB |
| mlx-lm-server overhead | ~1 GB |
| **Total (inference)** | **~44 GB** |
| **Headroom** | **~4 GB** |

During LoRA fine-tuning (server stopped):

| Component | Memory |
|-----------|--------|
| macOS + system services | ~3 GB |
| Model weights (frozen, Q8_0) | ~34 GB |
| LoRA adapter gradients + optimizer | ~4 GB |
| Training batch + activations | ~5 GB |
| **Total (training)** | **~46 GB** |
| **Headroom** | **~2 GB** |

Both workloads fit within the 48 GB budget. Inference and training are mutually exclusive — the server is stopped during fine-tuning runs to reclaim KV cache memory for training.
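
As a quick arithmetic check, the two budgets sum as the tables state (values in GB, copied from the tables above; all are estimates, not measurements):

```python
# 48 GB unified memory on the M4 Pro is the hard ceiling.
CAPACITY = 48

inference = {
    "macOS + system services": 3,
    "model weights (Q8_0)": 34,
    "KV cache (8192 ctx)": 6,
    "mlx-lm-server overhead": 1,
}
training = {
    "macOS + system services": 3,
    "model weights (frozen)": 34,
    "LoRA gradients + optimizer": 4,
    "batch + activations": 5,
}

for name, budget in (("inference", inference), ("training", training)):
    total = sum(budget.values())
    print(f"{name}: {total} GB used, {CAPACITY - total} GB headroom")
```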

## Security Considerations

* mlx-lm-server has no authentication — bind to LAN only; waterdeep's firewall blocks external access
* No code leaves the network — all inference and training is local
* Training data is sourced exclusively from Gitea (internal repos) — no external data contamination
* Adapter weights are versioned in Gitea — auditable lineage from training data to deployed model
* Consider adding a simple API key check via a reverse proxy (Caddy/nginx) if the LAN is not fully trusted
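
The key check in that last bullet is small; a sketch of the comparison a Caddy/nginx sidecar (or a tiny shim in front of the server) would perform — `authorized` is a hypothetical helper, and `hmac.compare_digest` is used to avoid timing leaks:

```python
import hmac


def authorized(headers: dict, expected_key: str) -> bool:
    """Constant-time bearer-token check for requests hitting the proxy.

    Rejects missing or malformed Authorization headers; comparison time
    does not depend on how many characters match.
    """
    auth = headers.get("Authorization", "")
    supplied = auth.removeprefix("Bearer ").strip()
    return hmac.compare_digest(supplied, expected_key)
```

Clients like Continue.dev already send an `apiKey` value, so the same config works whether or not the proxy enforces it.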

## Future Considerations

* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): If acquired, DGX Spark could fine-tune larger coding models (70B+) or run full fine-tunes instead of LoRA. waterdeep would remain the serving endpoint unless the DGX Spark also serves inference.
* **Adapter hot-swap**: mlx-lm supports loading adapters at request time — could serve multiple fine-tuned adapters (e.g., Go-specific, Python-specific, YAML-specific) from a single base model
* **RAG augmentation**: Combine the fine-tuned model with a RAG pipeline that retrieves relevant code snippets from Milvus ([ADR-0008](0008-use-milvus-for-vectors.md)) for even better context-aware completions
* **Continuous fine-tuning**: Trigger the pipeline automatically on Gitea push events via NATS — the model stays current with codebase changes
* **Evaluation suite**: Build a project-specific eval set (handler-base patterns, pipeline templates, Flux manifests) to measure fine-tuning quality beyond generic benchmarks
* **Newer models**: As new coding models are released (Qwen 3 Coder, DeepSeek Coder V3, etc.), re-evaluate which model maximises quality within the 48 GB budget

## Links

* Updates: [ADR-0059](0059-mac-mini-ray-worker.md) — waterdeep repurposed from 3D avatar workstation to dedicated coding agent
* Related: [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) — Training strategy (distributed CPU + DGX Spark path)
* Related: [ADR-0047](0047-mlflow-experiment-tracking.md) — MLflow experiment tracking
* Related: [ADR-0054](0054-kubeflow-pipeline-cicd.md) — Kubeflow Pipeline CI/CD
* Related: [ADR-0012](0012-use-uv-for-python-development.md) — uv for Python development
* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions (waterdeep)
* Related: [ADR-0060](0060-internal-pki-vault.md) — Internal PKI (TLS for waterdeep endpoint)
* [Qwen 2.5 Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) — Model card
* [MLX LM](https://github.com/ml-explore/mlx-examples/tree/main/llms/mlx_lm) — Apple MLX language model framework
* [OpenCode](https://opencode.ai) — Terminal-based AI coding assistant
* [Continue.dev](https://continue.dev) — VS Code AI coding extension with custom model support