Update ADRs and docs to reflect the Go handler refactor.
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 1m2s
@@ -22,13 +22,13 @@ You are working on a **homelab Kubernetes cluster** running:
 | Repo | Purpose |
 |------|---------|
-| `handler-base` | Shared Python library for NATS handlers |
-| `chat-handler` | Text chat with RAG pipeline |
-| `voice-assistant` | Voice pipeline (STT → RAG → LLM → TTS) |
+| `handler-base` | Shared Go module for NATS handlers (protobuf, health, OTel, clients) |
+| `chat-handler` | Text chat with RAG pipeline (Go) |
+| `voice-assistant` | Voice pipeline: STT → RAG → LLM → TTS (Go) |
 | `kuberay-images` | GPU-specific Ray worker Docker images |
-| `pipeline-bridge` | Bridge between pipelines and services |
-| `stt-module` | Speech-to-text service |
-| `tts-module` | Text-to-speech service |
+| `pipeline-bridge` | Bridge between pipelines and services (Go) |
+| `stt-module` | Speech-to-text service (Go) |
+| `tts-module` | Text-to-speech service (Go) |
 | `ray-serve` | Ray Serve inference services |
 | `argo` | Argo Workflows (training, batch inference) |
 | `kubeflow` | Kubeflow Pipeline definitions |
@@ -48,7 +48,7 @@ You are working on a **homelab Kubernetes cluster** running:
 ┌─────────────────────────────────────────────────────────────────┐
 │                       NATS MESSAGE BUS                          │
 │        Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*           │
-│        Format: MessagePack (binary)                             │
+│        Format: Protocol Buffers (binary, see ADR-0061)          │
 └───────────────────────────┬─────────────────────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
@@ -93,19 +93,23 @@ talos/
 ### AI/ML Services (Gitea daviestechlabs org)

 ```
-handler-base/          # Shared handler library
-├── handler_base/      # Core classes
-│   ├── handler.py     # Base Handler class
-│   ├── nats_client.py # NATS wrapper
-│   └── clients/       # Service clients (STT, TTS, LLM, etc.)
+handler-base/          # Shared Go module (NATS, health, OTel, protobuf)
+├── clients/           # HTTP clients (LLM, STT, TTS, embeddings, reranker)
+├── config/            # Env-based configuration (struct tags)
+├── gen/messagespb/    # Generated protobuf stubs
+├── handler/           # Typed NATS message handler
+├── health/            # HTTP health + readiness server
+└── natsutil/          # NATS publish/request with protobuf

-chat-handler/          # RAG chat service
-├── chat_handler_v2.py # Handler-base version
-└── Dockerfile.v2
+chat-handler/          # RAG chat service (Go)
+├── main.go
+├── main_test.go
+└── Dockerfile

-voice-assistant/       # Voice pipeline service
-├── voice_assistant_v2.py        # Handler-base version
-└── pipelines/voice_pipeline.py
+voice-assistant/       # Voice pipeline service (Go)
+├── main.go
+├── main_test.go
+└── Dockerfile

 argo/                  # Argo WorkflowTemplates
 ├── batch-inference.yaml
@@ -127,8 +131,23 @@ kuberay-images/ # GPU worker images
 ## 🔌 Service Endpoints (Internal)

+```go
+// Copy-paste ready for Go handler services
+const (
+	NATSURL       = "nats://nats.ai-ml.svc.cluster.local:4222"
+	VLLMURL       = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
+	WhisperURL    = "http://whisper-predictor.ai-ml.svc.cluster.local"
+	TTSURL        = "http://tts-predictor.ai-ml.svc.cluster.local"
+	EmbeddingsURL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
+	RerankerURL   = "http://reranker-predictor.ai-ml.svc.cluster.local"
+	MilvusHost    = "milvus.ai-ml.svc.cluster.local"
+	MilvusPort    = 19530
+	ValkeyURL     = "redis://valkey.ai-ml.svc.cluster.local:6379"
+)
+```
 ```python
-# Copy-paste ready for Python code
+# For Python services (Ray Serve, Kubeflow pipelines, Gradio UIs)
 NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
 VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
 WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
@@ -175,7 +194,7 @@ f"ai.pipeline.status.{request_id}" # Status updates
 ### Add a New NATS Handler

-1. Create handler repo or add to existing (use `handler-base` library)
+1. Create Go handler repo using `handler-base` module (see [ADR-0061](decisions/0061-go-handler-refactor.md))
 2. Add K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
 3. Push to main → Flux deploys automatically

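Subject names like `ai.pipeline.status.<request_id>` (from the hunk header above) are plain string compositions in Go; a small sketch, where the helper name is an assumption:

```go
package main

import "fmt"

// statusSubject builds the per-request status subject
// following the documented pattern ai.pipeline.status.<request_id>.
func statusSubject(requestID string) string {
	return fmt.Sprintf("ai.pipeline.status.%s", requestID)
}

func main() {
	fmt.Println(statusSubject("req-123")) // → ai.pipeline.status.req-123
}
```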
@@ -44,7 +44,7 @@ The homelab is a production-grade Kubernetes cluster running on bare-metal hardw
 │ │ • AI_PIPELINE (24h, file) - Workflow triggers                       │ │
 │ └─────────────────────────────────────────────────────────────────────┘ │
 │                                                                         │
-│ Message Format: MessagePack (binary, not JSON)                          │
+│ Message Format: Protocol Buffers (binary, see ADR-0061)                 │
 └─────────────────────────────────────────────────────────────────────────────┘
                                     │
           ┌─────────────────────────┼─────────────────────────┐
@@ -312,12 +312,12 @@ Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafan
 |----------|-----------|-----|
 | Talos Linux | Immutable, API-driven, secure | [ADR-0002](decisions/0002-use-talos-linux.md) |
 | NATS over Kafka | Simpler ops, sufficient throughput | [ADR-0003](decisions/0003-use-nats-for-messaging.md) |
-| MessagePack over JSON | Binary efficiency for audio | [ADR-0004](decisions/0004-use-messagepack-for-nats.md) |
+| Protocol Buffers over MessagePack | Type-safe, schema-driven, Go-native | [ADR-0061](decisions/0061-go-handler-refactor.md) |
 | Multi-GPU heterogeneous | Cost optimization, workload matching | [ADR-0005](decisions/0005-multi-gpu-strategy.md) |
 | GitOps with Flux | Declarative, auditable, secure | [ADR-0006](decisions/0006-gitops-with-flux.md) |
 | KServe for inference | Standardized API, autoscaling | [ADR-0007](decisions/0007-use-kserve-for-inference.md) |
 | KubeRay unified backend | Fractional GPU, single endpoint | [ADR-0011](decisions/0011-kuberay-unified-gpu-backend.md) |
-| Go handler refactor | Slim images for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) |
+| Go handler refactor | Slim images, type-safe protobuf for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) |

 ## Related Documents

@@ -28,27 +28,29 @@ kubernetes/
 ### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)

 ```
-handler-base/              # Shared library for all handlers
-├── handler_base/
-│   ├── handler.py         # Base Handler class
-│   ├── nats_client.py     # NATS wrapper
-│   ├── config.py          # Pydantic Settings
-│   ├── health.py          # K8s probes
-│   ├── telemetry.py       # OpenTelemetry
-│   └── clients/           # Service clients
-├── tests/
-└── pyproject.toml
+handler-base/              # Shared Go module for all NATS handlers
+├── clients/               # HTTP clients (LLM, STT, TTS, embeddings, reranker)
+├── config/                # Env-based configuration (struct tags)
+├── gen/messagespb/        # Generated protobuf stubs
+├── handler/               # Typed NATS message handler with OTel + health wiring
+├── health/                # HTTP health + readiness server
+├── messages/              # Type aliases from generated protobuf stubs
+├── natsutil/              # NATS publish/request with protobuf encoding
+├── proto/messages/v1/     # .proto schema source
+├── go.mod
+└── buf.yaml               # buf protobuf toolchain config

-chat-handler/              # Text chat service
-voice-assistant/           # Voice pipeline service
-pipeline-bridge/           # Workflow engine bridge
-├── {name}.py              # Handler implementation (uses handler-base)
-├── pyproject.toml         # PEP 621 project metadata (see ADR-0012)
-├── uv.lock                # Deterministic lock file
-├── tests/
-│   ├── conftest.py
-│   └── test_{name}.py
-└── Dockerfile
+chat-handler/              # Text chat service (Go)
+voice-assistant/           # Voice pipeline service (Go)
+pipeline-bridge/           # Workflow engine bridge (Go)
+stt-module/                # Speech-to-text bridge (Go)
+tts-module/                # Text-to-speech bridge (Go)
+├── main.go                # Service entry point
+├── main_test.go           # Unit tests
+├── e2e_test.go            # End-to-end tests
+├── go.mod                 # Go module (depends on handler-base)
+├── Dockerfile             # Distroless container (~20 MB)
+└── renovate.json          # Dependency update config

 argo/                      # Argo WorkflowTemplates
 ├── {workflow-name}.yaml
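The per-service Dockerfile noted above (distroless, ~20 MB) is typically a two-stage build; a hedged sketch — Go version and base image tags are assumptions:

```dockerfile
# Stage 1: static build (CGO off so the binary runs on distroless/static)
FROM golang:1.23 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /handler .

# Stage 2: distroless runtime — no shell, no package manager
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /handler /handler
ENTRYPOINT ["/handler"]
```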
@@ -138,7 +140,20 @@ tts_task = synthesize_speech(text=llm_task.output) # noqa: F841
 ### Project Structure

+```go
+// Go handler services use handler-base shared module
+import (
+	"git.daviestechlabs.io/daviestechlabs/handler-base/clients"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/config"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/handler"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/health"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/messages"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/natsutil"
+)
+```
 ```python
+# Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs
 # Use async/await for I/O
 async def handle_message(msg: Msg) -> None:
     ...
@@ -149,10 +164,6 @@ class ChatRequest:
     user_id: str
     message: str
     enable_rag: bool = True

-# Use msgpack for NATS messages
-import msgpack
-data = msgpack.packb({"key": "value"})
 ```
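Under ADR-0061 the same request shape moves into `proto/messages/v1/`; the old `ChatRequest` dataclass maps roughly to a message like this — a sketch, since the actual field names and numbers in the repo may differ:

```proto
syntax = "proto3";

package messages.v1;

// Hypothetical mapping of the former ChatRequest dataclass.
message ChatRequest {
  string user_id    = 1;
  string message    = 2;
  bool   enable_rag = 3;
}
```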

 ### Naming
@@ -200,31 +211,36 @@ except Exception as e:
 ### NATS Message Handling

-```python
-import nats
-import msgpack
+All NATS handler services use Go with Protocol Buffers encoding (see [ADR-0061](decisions/0061-go-handler-refactor.md)):

-async def message_handler(msg: Msg) -> None:
-    try:
-        # Decode MessagePack
-        data = msgpack.unpackb(msg.data, raw=False)
-
-        # Process
-        result = await process(data)
-
-        # Reply if request-reply pattern
-        if msg.reply:
-            await msg.respond(msgpack.packb(result))
-
-        # Acknowledge for JetStream
-        await msg.ack()
-
-    except Exception as e:
-        logger.error(f"Handler error: {e}")
-        # NAK for retry (JetStream)
-        await msg.nak()
+```go
+// Go NATS handler (production pattern)
+func (h *Handler) handleMessage(msg *nats.Msg) {
+	// NATS message callbacks carry no context; derive one per message.
+	ctx := context.Background()
+
+	var req messages.ChatRequest
+	if err := proto.Unmarshal(msg.Data, &req); err != nil {
+		h.logger.Error("failed to unmarshal", "error", err)
+		return
+	}
+
+	// Process
+	result, err := h.process(ctx, &req)
+	if err != nil {
+		h.logger.Error("handler error", "error", err)
+		msg.Nak()
+		return
+	}
+
+	// Reply if request-reply pattern
+	if msg.Reply != "" {
+		data, err := proto.Marshal(result)
+		if err != nil {
+			h.logger.Error("failed to marshal reply", "error", err)
+			msg.Nak()
+			return
+		}
+		msg.Respond(data)
+	}
+	msg.Ack()
+}
+```

 > **Python NATS** is still used in Ray Serve `runtime_env` and Kubeflow pipeline components where needed, but all dedicated NATS handler services are Go.

 ---
|
||||
## Kubernetes Manifest Conventions
|
||||
@@ -499,8 +515,9 @@ Each application should have a README with:
 | Use `latest` image tags | Pin to specific versions |
 | Skip health checks | Always define liveness/readiness |
 | Ignore resource limits | Set appropriate requests/limits |
-| Use JSON for NATS messages | Use MessagePack (binary) |
-| Synchronous I/O in handlers | Use async/await |
+| Use JSON for NATS messages | Use Protocol Buffers (see ADR-0061) |
+| Write handler services in Python | Use Go with handler-base module (ADR-0061) |
+| Synchronous I/O in handlers | Use goroutines / async patterns |

 ---

@@ -117,9 +117,14 @@ All AI inference runs on a unified Ray Serve endpoint with fractional GPU alloca
 | Application | Language | Framework | Purpose |
 |-------------|----------|-----------|---------|
-| Companions | Go | net/http + HTMX | AI chat interface |
-| Voice WebApp | Python | Gradio | Voice assistant UI |
-| Various handlers | Python | asyncio + nats.py | NATS event handlers |
+| Companions | Go | net/http + HTMX | AI chat interface (SSR) |
+| Chat Handler | Go | handler-base | RAG + LLM text pipeline |
+| Voice Assistant | Go | handler-base | STT → RAG → LLM → TTS pipeline |
+| Pipeline Bridge | Go | handler-base | Kubeflow/Argo workflow triggers |
+| STT Module | Go | handler-base | Speech-to-text bridge |
+| TTS Module | Go | handler-base | Text-to-speech bridge |
+| Voice WebApp | Python | Gradio | Voice assistant UI (dev/testing) |
 | Ray Serve | Python | Ray Serve | GPU inference endpoints |

 ### Frontend

@@ -242,27 +247,41 @@ All AI inference runs on a unified Ray Serve endpoint with fractional GPU alloca
 ---

-## Python Dependencies (handler-base)
+## Go Dependencies (handler-base)

-Core library for all NATS handlers: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)
+Shared Go module for all NATS handler services: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)

+```go
+// go.mod (handler-base v1.0.0)
+require (
+	github.com/nats-io/nats.go         // NATS client
+	google.golang.org/protobuf         // Protocol Buffers encoding
+	github.com/zitadel/oidc/v3         // OIDC client
+	go.opentelemetry.io/otel           // OpenTelemetry traces + metrics
+	github.com/milvus-io/milvus-sdk-go // Milvus vector search
+)
+```
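handler-base generates its protobuf stubs with the buf toolchain (the repo tree lists `buf.yaml` and `gen/messagespb/`). A plausible `buf.gen.yaml` — the plugin choice and options are assumptions, not copied from the repo:

```yaml
# Hypothetical buf.gen.yaml — emits Go stubs under gen/ (e.g. gen/messagespb/)
version: v2
plugins:
  - remote: buf.build/protocolbuffers/go
    out: gen
    opt: paths=source_relative
```

Running `buf generate` against `proto/messages/v1/` would then refresh the generated stubs.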

+See [ADR-0061](decisions/0061-go-handler-refactor.md) for the full refactoring rationale.
+
+## Python Dependencies (ML/AI only)
+
+Python is retained for ML inference, pipeline orchestration, and dev tools:

 ```toml
-# Core
-nats-py>=2.7.0             # NATS client
-msgpack>=1.0.0             # Binary serialization
-httpx>=0.27.0              # HTTP client
+# ray-serve (GPU inference)
+ray[serve]>=2.53.0
+vllm>=0.8.0
+faster-whisper>=1.0.0
+TTS>=0.22.0
+sentence-transformers>=3.0.0

-# ML/AI
-pymilvus>=2.4.0            # Milvus client
-openai>=1.0.0              # vLLM OpenAI API
+# kubeflow (pipeline definitions)
+kfp>=2.12.1

-# Observability
-opentelemetry-api>=1.20.0
-opentelemetry-sdk>=1.20.0
-mlflow>=2.10.0             # Experiment tracking

-# Kubeflow (kubeflow repo)
-kfp>=2.12.1                # Pipeline SDK
+# mlflow (experiment tracking)
+mlflow>=3.7.0
+pymilvus>=2.4.0
 ```

 ---

@@ -1,10 +1,12 @@
 # Python Module Deployment Strategy

-* Status: accepted
+* Status: superseded by [ADR-0061](0061-go-handler-refactor.md)
 * Date: 2026-02-02
 * Deciders: Billy
 * Technical Story: Define how Python handler modules are packaged and deployed to Kubernetes

+> **Note (2026-02-23):** This ADR described deploying Python handlers as Ray Serve applications inside the Ray cluster. [ADR-0061](0061-go-handler-refactor.md) supersedes this approach — all five handler services (chat-handler, voice-assistant, pipeline-bridge, tts-module, stt-module) have been rewritten in Go and now deploy as standalone Kubernetes Deployments with distroless container images (~20 MB each). The Ray cluster is exclusively used for GPU inference workloads. The handler-base shared library is now a Go module published at `git.daviestechlabs.io/daviestechlabs/handler-base` using Protocol Buffers for NATS message encoding.

 ## Context

 We have Python modules for AI/ML workflows that need to run on our unified GPU cluster:

@@ -14,7 +14,7 @@ How do we build a performant, maintainable frontend that integrates with the NAT
 ## Decision Drivers

 * Real-time streaming for chat and voice (WebSocket required)
-* Direct integration with NATS JetStream (binary MessagePack protocol)
+* Direct integration with NATS JetStream (Protocol Buffers encoding, see [ADR-0061](0061-go-handler-refactor.md))
 * Minimal client-side JavaScript (~20KB gzipped target)
 * No frontend build step (no webpack/vite/node required)
 * 3D avatar rendering for immersive experience
@@ -39,8 +39,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
 * No npm, no webpack, no build step — assets served directly
 * Server-side rendering via Go templates
 * WebSocket handled natively in Go (gorilla/websocket)
-* NATS integration with MessagePack in the same binary
+* NATS integration with Protocol Buffers in the same binary
 * Distroless container image for minimal attack surface
+* Type-safe NATS messages via handler-base shared Go module (protobuf stubs)

 ### Negative Consequences

@@ -58,8 +59,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
 | Client state | Alpine.js 3 | Lightweight reactive UI for local state |
 | 3D Avatars | Three.js + VRM | 3D character rendering with lip-sync |
 | Styling | Tailwind CSS 4 + DaisyUI | Utility-first CSS with component library |
-| Messaging | NATS JetStream | Real-time pub/sub with MessagePack encoding |
+| Messaging | NATS JetStream | Real-time pub/sub with Protocol Buffers encoding |
 | Auth | golang-jwt/jwt/v5 | JWT token handling for OAuth flows |
+| Shared lib | handler-base (Go module) | NATS client, protobuf messages, health, OTel, HTTP clients |
 | Database | PostgreSQL (lib/pq) + SQLite | Persistent + local session storage |
 | Observability | OpenTelemetry SDK | Traces, metrics via OTLP gRPC |

@@ -88,7 +90,7 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
 │               ┌─────────┴─────────┐                              │
 │               │    NATS Client    │                              │
 │               │   (JetStream +    │                              │
-│               │   MessagePack)    │                              │
+│               │    Protobuf)      │                              │
 │               └─────────┬─────────┘                              │
 └────────────────────────┼────────────────────────────────────────┘
                          │
@@ -130,8 +132,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
 ## Links

 * Related to [ADR-0003](0003-use-nats-for-messaging.md) (NATS messaging)
-* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding)
+* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding — superseded by Protocol Buffers, see [ADR-0061](0061-go-handler-refactor.md))
 * Related to [ADR-0011](0011-kuberay-unified-gpu-backend.md) (Ray Serve backend)
 * Related to [ADR-0028](0028-authentik-sso-strategy.md) (OAuth/OIDC)
+* Related to [ADR-0061](0061-go-handler-refactor.md) (Go handler refactor — handler-base shared module, protobuf wire format)
 * [HTMX Documentation](https://htmx.org/docs/)
 * [VRM Specification](https://vrm.dev/en/)

@@ -1,8 +1,8 @@
 # Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation

-* Status: proposed
+* Status: accepted
 * Date: 2026-02-16
-* Updated: 2026-02-21
+* Updated: 2026-02-23
 * Deciders: Billy
 * Technical Story: Use waterdeep as a dedicated local AI workstation for BlenderMCP-driven 3D avatar creation, replacing the previously proposed Ray worker role

@@ -25,14 +25,15 @@ How should we use waterdeep to maximise the 3D avatar creation pipeline for comp
 * Blender on Kasm is CPU-rendered inside DinD — no Metal/Vulkan/CUDA GPU access, poor viewport performance
 * waterdeep has a 16-core Apple GPU with Metal support — Blender's Metal backend enables real-time viewport rendering, Cycles GPU rendering, and smooth sculpting
 * 48 GB unified memory means Blender, VS Code, and the MCP server can all run simultaneously without swapping
-* VS Code with Copilot agent mode can drive BlenderMCP locally with zero-latency socket communication (localhost:9876)
+* VS Code with Copilot agent mode and BlenderMCP server are installed on waterdeep — VS Code drives Blender via localhost:9876 with zero-latency socket communication
 * Exported VRM models must reach gravenhollow for production serving ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md))
+* **rclone** chosen for asset promotion to gravenhollow's RustFS S3 endpoint — simpler than NFS mounts on macOS, consistent with existing Kasm rclone patterns, and avoids autofs/NFS fstab complexity
 * The Kasm Blender workflow from ADR-0062 remains available as a fallback (browser-based, no local install required)
 * The Ray cluster GPU fleet is fully allocated and stable — adding MPS complexity is not justified

 ## Considered Options

-1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via NFS/rclone
+1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via rclone (S3)
 2. **External Ray worker on macOS** (original proposal) — join the Ray cluster for inference and training
 3. **Keep Kasm-only workflow** — rely entirely on the browser-based Kasm Blender workstation from ADR-0062

@@ -45,17 +46,18 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min
 * Metal GPU acceleration — real-time Eevee viewport, GPU-accelerated Cycles rendering, smooth 60fps sculpting
 * Zero-latency MCP — BlenderMCP socket (localhost:9876) has no network hop, instant command execution
 * 48 GB unified memory — large Blender scenes, multiple VRM models open simultaneously, no swap pressure
-* VS Code + Copilot agent mode runs natively with full local context for both code and Blender commands
+* VS Code + Copilot agent mode + BlenderMCP server installed natively — single editor drives both code and Blender commands
+* rclone for asset promotion — consistent with Kasm rclone patterns, avoids macOS NFS/autofs complexity
 * Remaining a dev workstation — avatar creation is a creative dev workflow, not a server workload
 * Kasm Blender remains available as a browser-based fallback for remote/mobile access
 * Simpler than the Ray worker approach — no cluster integration, no GCS port exposure, no experimental MPS backend

 ### Negative Consequences

-* Blender + add-ons must be installed and maintained locally on waterdeep
-* Assets created locally need explicit promotion to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
+* Blender, VS Code, and add-ons must be installed and maintained locally on waterdeep via Homebrew
+* Assets created locally need an explicit `rclone copy` to promote them to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
 * waterdeep is a single machine — no redundancy for the 3D creation workflow
-* Not managed by Kubernetes or GitOps — relies on manual or Homebrew-managed tooling
+* Not managed by Kubernetes or GitOps — relies on Homebrew-managed tooling

@@ -67,8 +69,8 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min
 * Good, because no experimental backends (MPS/vLLM) — using Blender's mature Metal renderer
 * Good, because waterdeep stays a dev workstation, aligning with its named role
 * Bad, because local-only — no browser-based remote access (use Kasm for that)
-* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP)
-* Bad, because asset promotion to gravenhollow requires explicit action
+* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP, VS Code)
+* Bad, because asset promotion to gravenhollow requires an explicit rclone command

 ### Option 2: External Ray worker on macOS (original proposal)

@@ -119,8 +121,8 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min
 │ │ └── textures/ (shared texture library)              │ │
 │ └──────────────────────────────────────────────────────┘ │
 │                          │                               │
-│                NFS mount or rclone                       │
-│                 (asset promotion)                        │
+│            rclone (S3 asset promotion)                   │
+│            gravenhollow RustFS :30292                    │
 └──────────────────────────┼──────────────────────────────────────────────┘
                            │
                            ▼
@@ -200,24 +202,9 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
 uvx blender-mcp --help
 ```

-### 4. NFS Mount for Asset Promotion
+### 4. rclone for Asset Promotion

-Mount gravenhollow's avatar-models directory for direct promotion of finished VRM exports:
-
-```bash
-# Create mount point
-sudo mkdir -p /Volumes/avatar-models
-
-# Mount gravenhollow NFS (all-SSD, dual 10GbE)
-sudo mount -t nfs \
-  gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models \
-  /Volumes/avatar-models
-
-# Add to /etc/auto_master for persistent mount (macOS autofs)
-# /Volumes/avatar-models -fstype=nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models
-```
-
-Alternatively, use rclone for S3-based promotion:
+Use rclone to promote finished VRM exports to gravenhollow's RustFS S3 endpoint. This is consistent with the Kasm rclone volume plugin pattern from [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) and avoids macOS NFS/autofs complexity.

 ```bash
 # Install rclone
@@ -232,8 +219,13 @@ rclone config create gravenhollow s3 \
 # Promote a finished VRM
 rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models/
+
+# Sync all exports (idempotent)
+rclone sync ~/blender-avatars/exports/ gravenhollow:avatar-models/ --exclude "*.blend"
 ```

+> **Why rclone over NFS?** macOS autofs/NFS mounts are fragile across reboots and network changes. rclone is a single binary, works over HTTPS, and matches the promotion pattern already used in Kasm workflows. The explicit `rclone copy` command also serves as a deliberate promotion gate — only intentionally promoted models reach production.

 ### 5. Avatar Creation Workflow (waterdeep)

 1. **Open Blender** on waterdeep (native Metal-accelerated)
@@ -245,9 +237,9 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models
    - _"Rig this character for VRM export with standard humanoid bones"_
    - _"Export as VRM to ~/blender-avatars/exports/Silver-Mage.vrm"_
 5. **Preview** in real-time — Metal GPU renders Eevee viewport at 60fps
-6. **Promote** the finished VRM to gravenhollow:
+6. **Promote** the finished VRM to gravenhollow via rclone:
    ```bash
-   cp ~/blender-avatars/exports/Silver-Mage-v1.vrm /Volumes/avatar-models/
+   rclone copy ~/blender-avatars/exports/Silver-Mage-v1.vrm gravenhollow:avatar-models/
    ```
 7. **Register** in companions-frontend — update `AllowedAvatarModels` in Go and JS allowlists, commit

@@ -260,7 +252,7 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models
 | **MCP latency** | localhost socket — sub-millisecond | Network hop to Kasm container |
 | **Memory** | 48 GB unified, shared with GPU | Limited by Kasm container allocation |
 | **Sculpting** | Smooth, hardware-accelerated | Laggy, CPU-bound |
-| **Asset promotion** | NFS mount or rclone to gravenhollow | Auto rclone to Quobyte S3 → manual promote to gravenhollow |
+| **Asset promotion** | rclone to gravenhollow RustFS S3 | Auto rclone to Quobyte S3 → manual promote to gravenhollow |
 | **Access** | Local only (waterdeep physical/VNC) | Any browser, anywhere |
 | **Setup** | Homebrew + manual add-on install | Pre-baked in Kasm image |
 | **Use when** | Primary creation workflow | Remote access, quick edits, mobile |
@@ -278,7 +270,7 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models
 * **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, DGX Spark handles training; waterdeep remains the 3D creation workstation
 * **Blender + MLX**: Apple's MLX framework could power local AI-generated textures or mesh deformation directly in Blender — worth evaluating as Blender add-ons mature
-* **Automated promotion**: A file watcher (fswatch/launchd) could auto-promote VRM exports from `~/blender-avatars/exports/` to gravenhollow when a new file appears
+* **Automated promotion**: A file watcher (fswatch/launchd) could auto-run `rclone sync` when a new VRM appears in `~/blender-avatars/exports/`
 * **VRM validation**: Add a pre-promotion check script that validates VRM humanoid rig completeness, expression morphs, and viseme shapes before copying to gravenhollow
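One possible shape for that automated-promotion watcher — a sketch only, assuming `brew install fswatch` and the `gravenhollow` rclone remote configured in section 4:

```shell
# Definition only — run manually on waterdeep; loops until interrupted.
watch_exports() {
  local dir="$HOME/blender-avatars/exports"
  # fswatch -o emits one line per change batch; each batch triggers a sync.
  fswatch -o "$dir" | while read -r _; do
    rclone sync "$dir" gravenhollow:avatar-models/ --exclude "*.blend"
  done
}
```

A launchd agent wrapping this function would make it survive reboots, at the cost of losing the deliberate promotion gate noted earlier.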

 ## Links
