From 100ba21eba2f3ed88a5b44c192875f7b1b040537 Mon Sep 17 00:00:00 2001
From: "Billy D."
Date: Mon, 23 Feb 2026 06:14:23 -0500
Subject: [PATCH] Update ADRs and onboarding docs to reflect the Go handler
 refactor

---
 AGENT-ONBOARDING.md                           |  59 +++++----
 ARCHITECTURE.md                               |   6 +-
 CODING-CONVENTIONS.md                         | 113 ++++++++++--------
 TECH-STACK.md                                 |  57 ++++++---
 decisions/0019-handler-deployment-strategy.md |   4 +-
 .../0046-companions-frontend-architecture.md  |  13 +-
 decisions/0059-mac-mini-ray-worker.md         |  58 ++++-----
 7 files changed, 181 insertions(+), 129 deletions(-)

diff --git a/AGENT-ONBOARDING.md b/AGENT-ONBOARDING.md
index 25a71ce..a6ea1e6 100644
--- a/AGENT-ONBOARDING.md
+++ b/AGENT-ONBOARDING.md
@@ -22,13 +22,13 @@ You are working on a **homelab Kubernetes cluster** running:

 | Repo | Purpose |
 |------|---------|
-| `handler-base` | Shared Python library for NATS handlers |
-| `chat-handler` | Text chat with RAG pipeline |
-| `voice-assistant` | Voice pipeline (STT → RAG → LLM → TTS) |
+| `handler-base` | Shared Go module for NATS handlers (protobuf, health, OTel, clients) |
+| `chat-handler` | Text chat with RAG pipeline (Go) |
+| `voice-assistant` | Voice pipeline: STT → RAG → LLM → TTS (Go) |
 | `kuberay-images` | GPU-specific Ray worker Docker images |
-| `pipeline-bridge` | Bridge between pipelines and services |
-| `stt-module` | Speech-to-text service |
-| `tts-module` | Text-to-speech service |
+| `pipeline-bridge` | Bridge between pipelines and services (Go) |
+| `stt-module` | Speech-to-text service (Go) |
+| `tts-module` | Text-to-speech service (Go) |
 | `ray-serve` | Ray Serve inference services |
 | `argo` | Argo Workflows (training, batch inference) |
 | `kubeflow` | Kubeflow Pipeline definitions |
@@ -48,7 +48,7 @@
 ┌─────────────────────────────────────────────────────────────────┐
 │                      NATS MESSAGE BUS                           │
 │       Subjects: ai.chat.*, ai.voice.*, ai.pipeline.*            │
-│       Format: MessagePack (binary)                              │
+│       Format: Protocol Buffers (binary, see ADR-0061)           │
 └───────────────────────────┬─────────────────────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
@@ -93,19 +93,23 @@ talos/

 ### AI/ML Services (Gitea daviestechlabs org)

 ```
-handler-base/            # Shared handler library
-├── handler_base/        # Core classes
-│   ├── handler.py       # Base Handler class
-│   ├── nats_client.py   # NATS wrapper
-│   └── clients/         # Service clients (STT, TTS, LLM, etc.)
+handler-base/ # Shared Go module (NATS, health, OTel, protobuf) +├── clients/ # HTTP clients (LLM, STT, TTS, embeddings, reranker) +├── config/ # Env-based configuration (struct tags) +├── gen/messagespb/ # Generated protobuf stubs +├── handler/ # Typed NATS message handler +├── health/ # HTTP health + readiness server +└── natsutil/ # NATS publish/request with protobuf -chat-handler/ # RAG chat service -├── chat_handler_v2.py # Handler-base version -└── Dockerfile.v2 +chat-handler/ # RAG chat service (Go) +├── main.go +├── main_test.go +└── Dockerfile -voice-assistant/ # Voice pipeline service -├── voice_assistant_v2.py # Handler-base version -└── pipelines/voice_pipeline.py +voice-assistant/ # Voice pipeline service (Go) +├── main.go +├── main_test.go +└── Dockerfile argo/ # Argo WorkflowTemplates ├── batch-inference.yaml @@ -127,8 +131,23 @@ kuberay-images/ # GPU worker images ## 🔌 Service Endpoints (Internal) +```go +// Copy-paste ready for Go handler services +const ( + NATSUrl = "nats://nats.ai-ml.svc.cluster.local:4222" + VLLMUrl = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1" + WhisperUrl = "http://whisper-predictor.ai-ml.svc.cluster.local" + TTSUrl = "http://tts-predictor.ai-ml.svc.cluster.local" + EmbeddingsUrl = "http://embeddings-predictor.ai-ml.svc.cluster.local" + RerankerUrl = "http://reranker-predictor.ai-ml.svc.cluster.local" + MilvusHost = "milvus.ai-ml.svc.cluster.local" + MilvusPort = 19530 + ValkeyUrl = "redis://valkey.ai-ml.svc.cluster.local:6379" +) +``` + ```python -# Copy-paste ready for Python code +# For Python services (Ray Serve, Kubeflow pipelines, Gradio UIs) NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222" VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1" WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local" @@ -175,7 +194,7 @@ f"ai.pipeline.status.{request_id}" # Status updates ### Add a New NATS Handler -1. Create handler repo or add to existing (use `handler-base` library) +1. Create Go handler repo using `handler-base` module (see [ADR-0061](decisions/0061-go-handler-refactor.md)) 2. Add K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/` 3. 
Push to main → Flux deploys automatically diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index c50a303..9daa286 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -44,7 +44,7 @@ The homelab is a production-grade Kubernetes cluster running on bare-metal hardw │ │ • AI_PIPELINE (24h, file) - Workflow triggers │ │ │ └─────────────────────────────────────────────────────────────────────┘ │ │ │ -│ Message Format: MessagePack (binary, not JSON) │ +│ Message Format: Protocol Buffers (binary, see ADR-0061) │ └─────────────────────────────────────────────────────────────────────────────┘ │ ┌─────────────────────────┼─────────────────────────┐ @@ -312,12 +312,12 @@ Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafan |----------|-----------|-----| | Talos Linux | Immutable, API-driven, secure | [ADR-0002](decisions/0002-use-talos-linux.md) | | NATS over Kafka | Simpler ops, sufficient throughput | [ADR-0003](decisions/0003-use-nats-for-messaging.md) | -| MessagePack over JSON | Binary efficiency for audio | [ADR-0004](decisions/0004-use-messagepack-for-nats.md) | +| Protocol Buffers over MessagePack | Type-safe, schema-driven, Go-native | [ADR-0061](decisions/0061-go-handler-refactor.md) | | Multi-GPU heterogeneous | Cost optimization, workload matching | [ADR-0005](decisions/0005-multi-gpu-strategy.md) | | GitOps with Flux | Declarative, auditable, secure | [ADR-0006](decisions/0006-gitops-with-flux.md) | | KServe for inference | Standardized API, autoscaling | [ADR-0007](decisions/0007-use-kserve-for-inference.md) | | KubeRay unified backend | Fractional GPU, single endpoint | [ADR-0011](decisions/0011-kuberay-unified-gpu-backend.md) | -| Go handler refactor | Slim images for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) | +| Go handler refactor | Slim images, type-safe protobuf for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) | ## Related Documents diff --git a/CODING-CONVENTIONS.md b/CODING-CONVENTIONS.md index 67812f1..c600481 100644 --- a/CODING-CONVENTIONS.md +++ b/CODING-CONVENTIONS.md @@ -28,27 +28,29 @@ kubernetes/ ### AI/ML Repos (git.daviestechlabs.io/daviestechlabs) ``` -handler-base/ # Shared library for all handlers -├── handler_base/ -│ ├── handler.py # Base Handler class -│ ├── nats_client.py # NATS wrapper -│ ├── config.py # Pydantic Settings -│ ├── health.py # K8s probes -│ ├── telemetry.py # OpenTelemetry -│ └── clients/ # Service clients -├── tests/ -└── pyproject.toml +handler-base/ # Shared Go module for all NATS handlers +├── clients/ # HTTP clients (LLM, STT, TTS, embeddings, reranker) +├── config/ # Env-based configuration (struct tags) +├── gen/messagespb/ # Generated protobuf stubs +├── handler/ # Typed NATS message handler with OTel + health wiring +├── health/ # HTTP health + readiness server +├── messages/ # Type aliases from generated protobuf stubs +├── natsutil/ # NATS publish/request with protobuf encoding +├── proto/messages/v1/ # .proto schema source +├── go.mod +└── buf.yaml # buf protobuf toolchain config -chat-handler/ # Text chat service -voice-assistant/ # Voice pipeline service -pipeline-bridge/ # Workflow engine bridge -├── {name}.py # Handler implementation (uses handler-base) -├── pyproject.toml # PEP 621 project metadata (see ADR-0012) -├── uv.lock # Deterministic lock file -├── tests/ -│ ├── conftest.py -│ └── test_{name}.py -└── Dockerfile +chat-handler/ # Text chat service (Go) +voice-assistant/ # Voice pipeline service (Go) +pipeline-bridge/ # Workflow engine bridge (Go) +stt-module/ # 
Speech-to-text bridge (Go)
+tts-module/           # Text-to-speech bridge (Go)
+├── main.go           # Service entry point
+├── main_test.go      # Unit tests
+├── e2e_test.go       # End-to-end tests
+├── go.mod            # Go module (depends on handler-base)
+├── Dockerfile        # Distroless container (~20 MB)
+└── renovate.json     # Dependency update config

 argo/                  # Argo WorkflowTemplates
 ├── {workflow-name}.yaml
@@ -138,7 +140,20 @@ tts_task = synthesize_speech(text=llm_task.output)  # noqa: F841

 ### Project Structure

+```go
+// Go handler services use handler-base shared module
+import (
+	"git.daviestechlabs.io/daviestechlabs/handler-base/clients"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/config"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/handler"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/health"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/messages"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/natsutil"
+)
+```
+
 ```python
+# Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs
 # Use async/await for I/O
 async def handle_message(msg: Msg) -> None: ...

@@ -149,10 +164,6 @@ class ChatRequest:
     user_id: str
     message: str
     enable_rag: bool = True
-
-# Use msgpack for NATS messages
-import msgpack
-data = msgpack.packb({"key": "value"})
 ```

 ### Naming

@@ -200,31 +211,36 @@ except Exception as e:

 ### NATS Message Handling

-```python
-import nats
-import msgpack

+All NATS handler services use Go with Protocol Buffers encoding (see [ADR-0061](decisions/0061-go-handler-refactor.md)):

-async def message_handler(msg: Msg) -> None:
-    try:
-        # Decode MessagePack
-        data = msgpack.unpackb(msg.data, raw=False)
-
-        # Process
-        result = await process(data)
-
-        # Reply if request-reply pattern
-        if msg.reply:
-            await msg.respond(msgpack.packb(result))
-
-        # Acknowledge for JetStream
-        await msg.ack()
-
-    except Exception as e:
-        logger.error(f"Handler error: {e}")
-        # NAK for retry (JetStream)
-        await msg.nak()
+```go
+// Go NATS handler (production pattern)
+func (h *Handler) handleMessage(msg *nats.Msg) {
+	var req messages.ChatRequest
+	if err := proto.Unmarshal(msg.Data, &req); err != nil {
+		h.logger.Error("failed to unmarshal", "error", err)
+		return // malformed payload: do not retry
+	}
+
+	// Derive a context for downstream calls (production handlers carry
+	// one on the Handler or extract it from OTel propagation)
+	ctx := context.Background()
+
+	// Process
+	result, err := h.process(ctx, &req)
+	if err != nil {
+		h.logger.Error("handler error", "error", err)
+		msg.Nak() // NAK for retry (JetStream)
+		return
+	}
+
+	// Reply if request-reply pattern
+	if msg.Reply != "" {
+		data, err := proto.Marshal(result)
+		if err != nil {
+			h.logger.Error("failed to marshal reply", "error", err)
+			msg.Nak()
+			return
+		}
+		msg.Respond(data)
+	}
+	msg.Ack() // acknowledge for JetStream
+}
 ```

+> **Python NATS** is still used in Ray Serve `runtime_env` and Kubeflow pipeline components where needed, but all dedicated NATS handler services are Go.
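+
+For orientation, a minimal subscription-wiring sketch in plain `nats.go` (handler-base's `handler` package wraps this plumbing, but its exact API is not reproduced in this document; `h` is the `*Handler` from the example above):
+
+```go
+// Hedged wiring sketch: attach handleMessage to a JetStream queue subscription.
+nc, err := nats.Connect(os.Getenv("NATS_URL"))
+if err != nil {
+	log.Fatal(err)
+}
+js, err := nc.JetStream()
+if err != nil {
+	log.Fatal(err)
+}
+// Queue group spreads messages across handler replicas; ManualAck matches
+// the explicit Ack/Nak calls in handleMessage above.
+if _, err := js.QueueSubscribe("ai.chat.request", "chat-handler", h.handleMessage, nats.ManualAck()); err != nil {
+	log.Fatal(err)
+}
+```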
+

 ---

 ## Kubernetes Manifest Conventions

@@ -499,8 +515,9 @@ Each application should have a README with:

 | Use `latest` image tags | Pin to specific versions |
 | Skip health checks | Always define liveness/readiness |
 | Ignore resource limits | Set appropriate requests/limits |
-| Use JSON for NATS messages | Use MessagePack (binary) |
-| Synchronous I/O in handlers | Use async/await |
+| Use JSON for NATS messages | Use Protocol Buffers (see ADR-0061) |
+| Write handler services in Python | Use Go with handler-base module (ADR-0061) |
+| Synchronous I/O in handlers | Use goroutines / async patterns |

 ---

diff --git a/TECH-STACK.md b/TECH-STACK.md
index a140c7a..49b1cc7 100644
--- a/TECH-STACK.md
+++ b/TECH-STACK.md
@@ -117,9 +117,14 @@ All AI inference runs on a unified Ray Serve endpoint with fractional GPU alloca

 | Application | Language | Framework | Purpose |
 |-------------|----------|-----------|---------|
-| Companions | Go | net/http + HTMX | AI chat interface |
-| Voice WebApp | Python | Gradio | Voice assistant UI |
-| Various handlers | Python | asyncio + nats.py | NATS event handlers |
+| Companions | Go | net/http + HTMX | AI chat interface (SSR) |
+| Chat Handler | Go | handler-base | RAG + LLM text pipeline |
+| Voice Assistant | Go | handler-base | STT → RAG → LLM → TTS pipeline |
+| Pipeline Bridge | Go | handler-base | Kubeflow/Argo workflow triggers |
+| STT Module | Go | handler-base | Speech-to-text bridge |
+| TTS Module | Go | handler-base | Text-to-speech bridge |
+| Voice WebApp | Python | Gradio | Voice assistant UI (dev/testing) |
+| Ray Serve | Python | Ray Serve | GPU inference endpoints |

 ### Frontend

@@ -242,27 +247,41 @@

 ---

-## Python Dependencies (handler-base)
+## Go Dependencies (handler-base)

-Core library for all NATS handlers: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)
+Shared Go module for all NATS handler services: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)
+
+```go
+// go.mod (handler-base v1.0.0; abridged, version pins elided)
+require (
+	github.com/nats-io/nats.go          // NATS client
+	google.golang.org/protobuf          // Protocol Buffers encoding
+	github.com/zitadel/oidc/v3          // OIDC client
+	go.opentelemetry.io/otel            // OpenTelemetry traces + metrics
+	github.com/milvus-io/milvus-sdk-go  // Milvus vector search
+)
+```
+
+See [ADR-0061](decisions/0061-go-handler-refactor.md) for the full refactoring rationale.
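+
+To make the wire format concrete, a hedged request-reply sketch (the `messagespb` message names, fields, and subject are illustrative; the `nats.go` and `protobuf` calls are the actual library APIs):
+
+```go
+// Request-reply over NATS with protobuf encoding (illustrative sketch)
+nc, err := nats.Connect("nats://nats.ai-ml.svc.cluster.local:4222")
+if err != nil {
+	log.Fatal(err)
+}
+defer nc.Close()
+
+req := &messagespb.ChatRequest{UserId: "u-123", Message: "hello"} // hypothetical fields
+data, err := proto.Marshal(req)
+if err != nil {
+	log.Fatal(err)
+}
+
+// 5s timeout request; the reply payload is a protobuf-encoded response
+msg, err := nc.Request("ai.chat.request", data, 5*time.Second) // subject illustrative
+if err != nil {
+	log.Fatal(err)
+}
+var resp messagespb.ChatResponse // hypothetical message type
+if err := proto.Unmarshal(msg.Data, &resp); err != nil {
+	log.Fatal(err)
+}
+```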
+
+## Python Dependencies (ML/AI only)
+
+Python is retained for ML inference, pipeline orchestration, and dev tools:

-```toml
+```text
-# Core
-nats-py>=2.7.0 # NATS client
-msgpack>=1.0.0 # Binary serialization
-httpx>=0.27.0 # HTTP client
+# ray-serve (GPU inference)
+ray[serve]>=2.53.0
+vllm>=0.8.0
+faster-whisper>=1.0.0
+TTS>=0.22.0
+sentence-transformers>=3.0.0

-# ML/AI
-pymilvus>=2.4.0 # Milvus client
-openai>=1.0.0 # vLLM OpenAI API
+# kubeflow (pipeline definitions)
+kfp>=2.12.1

-# Observability
-opentelemetry-api>=1.20.0
-opentelemetry-sdk>=1.20.0
-mlflow>=2.10.0 # Experiment tracking
-
-# Kubeflow (kubeflow repo)
-kfp>=2.12.1 # Pipeline SDK
+# mlflow (experiment tracking)
+mlflow>=3.7.0
+pymilvus>=2.4.0
 ```

 ---

diff --git a/decisions/0019-handler-deployment-strategy.md b/decisions/0019-handler-deployment-strategy.md
index fbadd4a..3fdece4 100644
--- a/decisions/0019-handler-deployment-strategy.md
+++ b/decisions/0019-handler-deployment-strategy.md
@@ -1,10 +1,12 @@
 # Python Module Deployment Strategy

-* Status: accepted
+* Status: superseded by [ADR-0061](0061-go-handler-refactor.md)
 * Date: 2026-02-02
 * Deciders: Billy
 * Technical Story: Define how Python handler modules are packaged and deployed to Kubernetes

+> **Note (2026-02-23):** This ADR described deploying Python handlers as Ray Serve applications inside the Ray cluster. [ADR-0061](0061-go-handler-refactor.md) supersedes this approach — all five handler services (chat-handler, voice-assistant, pipeline-bridge, tts-module, stt-module) have been rewritten in Go and now deploy as standalone Kubernetes Deployments with distroless container images (~20 MB each). The Ray cluster is exclusively used for GPU inference workloads. The handler-base shared library is now a Go module published at `git.daviestechlabs.io/daviestechlabs/handler-base` using Protocol Buffers for NATS message encoding.
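+
+> For reference, the distroless image pattern described above looks roughly like this (a sketch; the actual Dockerfiles live in each handler repo):
+
+```dockerfile
+# Multi-stage build producing a small static binary (toolchain tag illustrative)
+FROM golang:1.23 AS build
+WORKDIR /src
+COPY go.mod go.sum ./
+RUN go mod download
+COPY . .
+RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /handler .
+
+# Static distroless base: no shell or package manager, minimal attack surface
+FROM gcr.io/distroless/static-debian12:nonroot
+COPY --from=build /handler /handler
+ENTRYPOINT ["/handler"]
+```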
+ ## Context We have Python modules for AI/ML workflows that need to run on our unified GPU cluster: diff --git a/decisions/0046-companions-frontend-architecture.md b/decisions/0046-companions-frontend-architecture.md index a4530e9..5981860 100644 --- a/decisions/0046-companions-frontend-architecture.md +++ b/decisions/0046-companions-frontend-architecture.md @@ -14,7 +14,7 @@ How do we build a performant, maintainable frontend that integrates with the NAT ## Decision Drivers * Real-time streaming for chat and voice (WebSocket required) -* Direct integration with NATS JetStream (binary MessagePack protocol) +* Direct integration with NATS JetStream (Protocol Buffers encoding, see [ADR-0061](0061-go-handler-refactor.md)) * Minimal client-side JavaScript (~20KB gzipped target) * No frontend build step (no webpack/vite/node required) * 3D avatar rendering for immersive experience @@ -39,8 +39,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi * No npm, no webpack, no build step — assets served directly * Server-side rendering via Go templates * WebSocket handled natively in Go (gorilla/websocket) -* NATS integration with MessagePack in the same binary +* NATS integration with Protocol Buffers in the same binary * Distroless container image for minimal attack surface +* Type-safe NATS messages via handler-base shared Go module (protobuf stubs) ### Negative Consequences @@ -58,8 +59,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi | Client state | Alpine.js 3 | Lightweight reactive UI for local state | | 3D Avatars | Three.js + VRM | 3D character rendering with lip-sync | | Styling | Tailwind CSS 4 + DaisyUI | Utility-first CSS with component library | -| Messaging | NATS JetStream | Real-time pub/sub with MessagePack encoding | +| Messaging | NATS JetStream | Real-time pub/sub with Protocol Buffers encoding | | Auth | golang-jwt/jwt/v5 | JWT token handling for OAuth flows | +| Shared lib | handler-base (Go module) | NATS client, protobuf messages, health, OTel, HTTP clients | | Database | PostgreSQL (lib/pq) + SQLite | Persistent + local session storage | | Observability | OpenTelemetry SDK | Traces, metrics via OTLP gRPC | @@ -88,7 +90,7 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi │ ┌─────────┴─────────┐ │ │ │ NATS Client │ │ │ │ (JetStream + │ │ -│ │ MessagePack) │ │ +│ │ Protobuf) │ │ │ └─────────┬─────────┘ │ └────────────────────────┼────────────────────────────────────────┘ │ @@ -130,8 +132,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi ## Links * Related to [ADR-0003](0003-use-nats-for-messaging.md) (NATS messaging) -* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding) +* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding — superseded by Protocol Buffers, see [ADR-0061](0061-go-handler-refactor.md)) * Related to [ADR-0011](0011-kuberay-unified-gpu-backend.md) (Ray Serve backend) * Related to [ADR-0028](0028-authentik-sso-strategy.md) (OAuth/OIDC) +* Related to [ADR-0061](0061-go-handler-refactor.md) (Go handler refactor — handler-base shared module, protobuf wire format) * [HTMX Documentation](https://htmx.org/docs/) * [VRM Specification](https://vrm.dev/en/) diff --git a/decisions/0059-mac-mini-ray-worker.md b/decisions/0059-mac-mini-ray-worker.md index a7d8b83..ead643b 100644 --- a/decisions/0059-mac-mini-ray-worker.md +++ b/decisions/0059-mac-mini-ray-worker.md @@ -1,8 +1,8 @@ # 
Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation -* Status: proposed +* Status: accepted * Date: 2026-02-16 -* Updated: 2026-02-21 +* Updated: 2026-02-23 * Deciders: Billy * Technical Story: Use waterdeep as a dedicated local AI workstation for BlenderMCP-driven 3D avatar creation, replacing the previously proposed Ray worker role @@ -25,14 +25,15 @@ How should we use waterdeep to maximise the 3D avatar creation pipeline for comp * Blender on Kasm is CPU-rendered inside DinD — no Metal/Vulkan/CUDA GPU access, poor viewport performance * waterdeep has a 16-core Apple GPU with Metal support — Blender's Metal backend enables real-time viewport rendering, Cycles GPU rendering, and smooth sculpting * 48 GB unified memory means Blender, VS Code, and the MCP server can all run simultaneously without swapping -* VS Code with Copilot agent mode can drive BlenderMCP locally with zero-latency socket communication (localhost:9876) +* VS Code with Copilot agent mode and BlenderMCP server are installed on waterdeep — VS Code drives Blender via localhost:9876 with zero-latency socket communication * Exported VRM models must reach gravenhollow for production serving ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md)) +* **rclone** chosen for asset promotion to gravenhollow's RustFS S3 endpoint — simpler than NFS mounts on macOS, consistent with existing Kasm rclone patterns, and avoids autofs/NFS fstab complexity * The Kasm Blender workflow from ADR-0062 remains available as a fallback (browser-based, no local install required) * ray cluster GPU fleet is fully allocated and stable — adding MPS complexity is not justified ## Considered Options -1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via NFS/rclone +1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via rclone (S3) 2. **External Ray worker on macOS** (original proposal) — join the Ray cluster for inference and training 3. 
**Keep Kasm-only workflow** — rely entirely on the browser-based Kasm Blender workstation from ADR-0062 @@ -45,17 +46,18 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min * Metal GPU acceleration — real-time Eevee viewport, GPU-accelerated Cycles rendering, smooth 60fps sculpting * Zero-latency MCP — BlenderMCP socket (localhost:9876) has no network hop, instant command execution * 48 GB unified memory — large Blender scenes, multiple VRM models open simultaneously, no swap pressure -* VS Code + Copilot agent mode runs natively with full local context for both code and Blender commands +* VS Code + Copilot agent mode + BlenderMCP server installed natively — single editor drives both code and Blender commands +* rclone for asset promotion — consistent with Kasm rclone patterns, avoids macOS NFS/autofs complexity * Remaining a dev workstation — avatar creation is a creative dev workflow, not a server workload * Kasm Blender remains available as a browser-based fallback for remote/mobile access * Simpler than the Ray worker approach — no cluster integration, no GCS port exposure, no experimental MPS backend ### Negative Consequences -* Blender + add-ons must be installed and maintained locally on waterdeep -* Assets created locally need explicit promotion to gravenhollow (vs Kasm's automatic rclone to Quobyte S3) +* Blender, VS Code, and add-ons must be installed and maintained locally on waterdeep via Homebrew +* Assets created locally need explicit `rclone copy` to promote to gravenhollow (vs Kasm's automatic rclone to Quobyte S3) * waterdeep is a single machine — no redundancy for the 3D creation workflow -* Not managed by Kubernetes or GitOps — relies on manual or Homebrew-managed tooling +* Not managed by Kubernetes or GitOps — relies on Homebrew-managed tooling ## Pros and Cons of the Options @@ -67,8 +69,8 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min * Good, because no experimental backends (MPS/vLLM) — using Blender's mature Metal renderer * Good, because waterdeep stays a dev workstation, aligning with its named role * Bad, because local-only — no browser-based remote access (use Kasm for that) -* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP) -* Bad, because asset promotion to gravenhollow requires explicit action +* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP, VS Code) +* Bad, because asset promotion to gravenhollow requires explicit rclone command ### Option 2: External Ray worker on macOS (original proposal) @@ -119,8 +121,8 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min │ │ └── textures/ (shared texture library) │ │ │ └──────────────────────────────────────────────────────┘ │ │ │ │ -│ NFS mount or rclone │ -│ (asset promotion) │ +│ rclone (S3 asset promotion) │ +│ gravenhollow RustFS :30292 │ └──────────────────────────┼──────────────────────────────────────────────┘ │ ▼ @@ -200,24 +202,9 @@ curl -LsSf https://astral.sh/uv/install.sh | sh uvx blender-mcp --help ``` -### 4. NFS Mount for Asset Promotion +### 4. 
rclone for Asset Promotion -Mount gravenhollow's avatar-models directory for direct promotion of finished VRM exports: - -```bash -# Create mount point -sudo mkdir -p /Volumes/avatar-models - -# Mount gravenhollow NFS (all-SSD, dual 10GbE) -sudo mount -t nfs \ - gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models \ - /Volumes/avatar-models - -# Add to /etc/auto_master for persistent mount (macOS autofs) -# /Volumes/avatar-models -fstype=nfs gravenhollow.lab.daviestechlabs.io:/mnt/gravenhollow/kubernetes/avatar-models -``` - -Alternatively, use rclone for S3-based promotion: +Use rclone to promote finished VRM exports to gravenhollow's RustFS S3 endpoint. This is consistent with the Kasm rclone volume plugin pattern from [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) and avoids macOS NFS/autofs complexity. ```bash # Install rclone @@ -232,8 +219,13 @@ rclone config create gravenhollow s3 \ # Promote a finished VRM rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models/ + +# Sync all exports (idempotent) +rclone sync ~/blender-avatars/exports/ gravenhollow:avatar-models/ --exclude "*.blend" ``` +> **Why rclone over NFS?** macOS autofs/NFS mounts are fragile across reboots and network changes. rclone is a single binary, works over HTTPS, and matches the promotion pattern already used in Kasm workflows. The explicit `rclone copy` command also serves as a deliberate promotion gate — only intentionally promoted models reach production. + ### 5. Avatar Creation Workflow (waterdeep) 1. **Open Blender** on waterdeep (native Metal-accelerated) @@ -245,9 +237,9 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models - _"Rig this character for VRM export with standard humanoid bones"_ - _"Export as VRM to ~/blender-avatars/exports/Silver-Mage.vrm"_ 5. **Preview** in real-time — Metal GPU renders Eevee viewport at 60fps -6. **Promote** the finished VRM to gravenhollow: +6. **Promote** the finished VRM to gravenhollow via rclone: ```bash - cp ~/blender-avatars/exports/Silver-Mage-v1.vrm /Volumes/avatar-models/ + rclone copy ~/blender-avatars/exports/Silver-Mage-v1.vrm gravenhollow:avatar-models/ ``` 7. 
**Register** in companions-frontend — update `AllowedAvatarModels` in Go and JS allowlists, commit @@ -260,7 +252,7 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models | **MCP latency** | localhost socket — sub-millisecond | Network hop to Kasm container | | **Memory** | 48 GB unified, shared with GPU | Limited by Kasm container allocation | | **Sculpting** | Smooth, hardware-accelerated | Laggy, CPU-bound | -| **Asset promotion** | NFS mount or rclone to gravenhollow | Auto rclone to Quobyte S3 → manual promote to gravenhollow | +| **Asset promotion** | rclone to gravenhollow RustFS S3 | Auto rclone to Quobyte S3 → manual promote to gravenhollow | | **Access** | Local only (waterdeep physical/VNC) | Any browser, anywhere | | **Setup** | Homebrew + manual add-on install | Pre-baked in Kasm image | | **Use when** | Primary creation workflow | Remote access, quick edits, mobile | @@ -278,7 +270,7 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models * **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, DGX Spark handles training; waterdeep remains the 3D creation workstation * **Blender + MLX**: Apple's MLX framework could power local AI-generated textures or mesh deformation directly in Blender — worth evaluating as Blender add-ons mature -* **Automated promotion**: A file watcher (fswatch/launchd) could auto-promote VRM exports from `~/blender-avatars/exports/` to gravenhollow when a new file appears +* **Automated promotion**: A file watcher (fswatch/launchd) could auto-run `rclone sync` when a new VRM appears in `~/blender-avatars/exports/` * **VRM validation**: Add a pre-promotion check script that validates VRM humanoid rig completeness, expression morphs, and viseme shapes before copying to gravenhollow ## Links