Update ADRs and docs to reflect the Go handler refactor
All checks were successful
Update README with ADR Index / update-readme (push) Successful in 1m2s
@@ -22,13 +22,13 @@ You are working on a **homelab Kubernetes cluster** running:
 | Repo | Purpose |
 |------|---------|
-| `handler-base` | Shared Python library for NATS handlers |
-| `chat-handler` | Text chat with RAG pipeline |
-| `voice-assistant` | Voice pipeline (STT → RAG → LLM → TTS) |
+| `handler-base` | Shared Go module for NATS handlers (protobuf, health, OTel, clients) |
+| `chat-handler` | Text chat with RAG pipeline (Go) |
+| `voice-assistant` | Voice pipeline: STT → RAG → LLM → TTS (Go) |
 | `kuberay-images` | GPU-specific Ray worker Docker images |
-| `pipeline-bridge` | Bridge between pipelines and services |
-| `stt-module` | Speech-to-text service |
-| `tts-module` | Text-to-speech service |
+| `pipeline-bridge` | Bridge between pipelines and services (Go) |
+| `stt-module` | Speech-to-text service (Go) |
+| `tts-module` | Text-to-speech service (Go) |
 | `ray-serve` | Ray Serve inference services |
 | `argo` | Argo Workflows (training, batch inference) |
 | `kubeflow` | Kubeflow Pipeline definitions |
@@ -48,7 +48,7 @@ You are working on a **homelab Kubernetes cluster** running:
 ┌─────────────────────────────────────────────────────────────────┐
 │ NATS MESSAGE BUS │
 │ Subjects: ai.chat.*, ai.voice.*, ai.pipeline.* │
-│ Format: MessagePack (binary) │
+│ Format: Protocol Buffers (binary, see ADR-0061) │
 └───────────────────────────┬─────────────────────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
@@ -93,19 +93,23 @@ talos/
 ### AI/ML Services (Gitea daviestechlabs org)
 
 ```
-handler-base/            # Shared handler library
-├── handler_base/        # Core classes
-│   ├── handler.py       # Base Handler class
-│   ├── nats_client.py   # NATS wrapper
-│   └── clients/         # Service clients (STT, TTS, LLM, etc.)
+handler-base/            # Shared Go module (NATS, health, OTel, protobuf)
+├── clients/             # HTTP clients (LLM, STT, TTS, embeddings, reranker)
+├── config/              # Env-based configuration (struct tags)
+├── gen/messagespb/      # Generated protobuf stubs
+├── handler/             # Typed NATS message handler
+├── health/              # HTTP health + readiness server
+└── natsutil/            # NATS publish/request with protobuf
 
-chat-handler/            # RAG chat service
-├── chat_handler_v2.py   # Handler-base version
-└── Dockerfile.v2
+chat-handler/            # RAG chat service (Go)
+├── main.go
+├── main_test.go
+└── Dockerfile
 
-voice-assistant/         # Voice pipeline service
-├── voice_assistant_v2.py     # Handler-base version
-└── pipelines/voice_pipeline.py
+voice-assistant/         # Voice pipeline service (Go)
+├── main.go
+├── main_test.go
+└── Dockerfile
 
 argo/                    # Argo WorkflowTemplates
 ├── batch-inference.yaml
@@ -127,8 +131,23 @@ kuberay-images/ # GPU worker images
 
 ## 🔌 Service Endpoints (Internal)
 
+```go
+// Copy-paste ready for Go handler services
+const (
+	NATSUrl       = "nats://nats.ai-ml.svc.cluster.local:4222"
+	VLLMUrl       = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
+	WhisperUrl    = "http://whisper-predictor.ai-ml.svc.cluster.local"
+	TTSUrl        = "http://tts-predictor.ai-ml.svc.cluster.local"
+	EmbeddingsUrl = "http://embeddings-predictor.ai-ml.svc.cluster.local"
+	RerankerUrl   = "http://reranker-predictor.ai-ml.svc.cluster.local"
+	MilvusHost    = "milvus.ai-ml.svc.cluster.local"
+	MilvusPort    = 19530
+	ValkeyUrl     = "redis://valkey.ai-ml.svc.cluster.local:6379"
+)
+```
+
 ```python
-# Copy-paste ready for Python code
+# For Python services (Ray Serve, Kubeflow pipelines, Gradio UIs)
 NATS_URL = "nats://nats.ai-ml.svc.cluster.local:4222"
 VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000/v1"
 WHISPER_URL = "http://whisper-predictor.ai-ml.svc.cluster.local"
@@ -175,7 +194,7 @@ f"ai.pipeline.status.{request_id}" # Status updates
 
 ### Add a New NATS Handler
 
-1. Create handler repo or add to existing (use `handler-base` library)
+1. Create Go handler repo using `handler-base` module (see [ADR-0061](decisions/0061-go-handler-refactor.md))
 2. Add K8s Deployment in `homelab-k8s2/kubernetes/apps/ai-ml/`
 3. Push to main → Flux deploys automatically
 
@@ -44,7 +44,7 @@ The homelab is a production-grade Kubernetes cluster running on bare-metal hardw
 │ │ • AI_PIPELINE (24h, file) - Workflow triggers │ │
 │ └─────────────────────────────────────────────────────────────────────┘ │
 │ │
-│ Message Format: MessagePack (binary, not JSON) │
+│ Message Format: Protocol Buffers (binary, see ADR-0061) │
 └─────────────────────────────────────────────────────────────────────────────┘
                             │
         ┌─────────────────────────┼─────────────────────────┐
@@ -312,12 +312,12 @@ Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafan
 |----------|-----------|-----|
 | Talos Linux | Immutable, API-driven, secure | [ADR-0002](decisions/0002-use-talos-linux.md) |
 | NATS over Kafka | Simpler ops, sufficient throughput | [ADR-0003](decisions/0003-use-nats-for-messaging.md) |
-| MessagePack over JSON | Binary efficiency for audio | [ADR-0004](decisions/0004-use-messagepack-for-nats.md) |
+| Protocol Buffers over MessagePack | Type-safe, schema-driven, Go-native | [ADR-0061](decisions/0061-go-handler-refactor.md) |
 | Multi-GPU heterogeneous | Cost optimization, workload matching | [ADR-0005](decisions/0005-multi-gpu-strategy.md) |
 | GitOps with Flux | Declarative, auditable, secure | [ADR-0006](decisions/0006-gitops-with-flux.md) |
 | KServe for inference | Standardized API, autoscaling | [ADR-0007](decisions/0007-use-kserve-for-inference.md) |
 | KubeRay unified backend | Fractional GPU, single endpoint | [ADR-0011](decisions/0011-kuberay-unified-gpu-backend.md) |
-| Go handler refactor | Slim images for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) |
+| Go handler refactor | Slim images, type-safe protobuf for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) |
 
 ## Related Documents
 
@@ -28,27 +28,29 @@ kubernetes/
 ### AI/ML Repos (git.daviestechlabs.io/daviestechlabs)
 
 ```
-handler-base/            # Shared library for all handlers
-├── handler_base/
-│   ├── handler.py       # Base Handler class
-│   ├── nats_client.py   # NATS wrapper
-│   ├── config.py        # Pydantic Settings
-│   ├── health.py        # K8s probes
-│   ├── telemetry.py     # OpenTelemetry
-│   └── clients/         # Service clients
-├── tests/
-└── pyproject.toml
+handler-base/            # Shared Go module for all NATS handlers
+├── clients/             # HTTP clients (LLM, STT, TTS, embeddings, reranker)
+├── config/              # Env-based configuration (struct tags)
+├── gen/messagespb/      # Generated protobuf stubs
+├── handler/             # Typed NATS message handler with OTel + health wiring
+├── health/              # HTTP health + readiness server
+├── messages/            # Type aliases from generated protobuf stubs
+├── natsutil/            # NATS publish/request with protobuf encoding
+├── proto/messages/v1/   # .proto schema source
+├── go.mod
+└── buf.yaml             # buf protobuf toolchain config
 
-chat-handler/            # Text chat service
-voice-assistant/         # Voice pipeline service
-pipeline-bridge/         # Workflow engine bridge
-├── {name}.py            # Handler implementation (uses handler-base)
-├── pyproject.toml       # PEP 621 project metadata (see ADR-0012)
-├── uv.lock              # Deterministic lock file
-├── tests/
-│   ├── conftest.py
-│   └── test_{name}.py
-└── Dockerfile
+chat-handler/            # Text chat service (Go)
+voice-assistant/         # Voice pipeline service (Go)
+pipeline-bridge/         # Workflow engine bridge (Go)
+stt-module/              # Speech-to-text bridge (Go)
+tts-module/              # Text-to-speech bridge (Go)
+├── main.go              # Service entry point
+├── main_test.go         # Unit tests
+├── e2e_test.go          # End-to-end tests
+├── go.mod               # Go module (depends on handler-base)
+├── Dockerfile           # Distroless container (~20 MB)
+└── renovate.json        # Dependency update config
 
 argo/                    # Argo WorkflowTemplates
 ├── {workflow-name}.yaml
@@ -138,7 +140,20 @@ tts_task = synthesize_speech(text=llm_task.output) # noqa: F841
 
 ### Project Structure
 
+```go
+// Go handler services use handler-base shared module
+import (
+	"git.daviestechlabs.io/daviestechlabs/handler-base/clients"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/config"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/handler"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/health"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/messages"
+	"git.daviestechlabs.io/daviestechlabs/handler-base/natsutil"
+)
+```
+
 ```python
+# Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs
 # Use async/await for I/O
 async def handle_message(msg: Msg) -> None:
     ...
@@ -149,10 +164,6 @@ class ChatRequest:
     user_id: str
     message: str
     enable_rag: bool = True
 
-# Use msgpack for NATS messages
-import msgpack
-data = msgpack.packb({"key": "value"})
 ```
 
 ### Naming
@@ -200,31 +211,36 @@ except Exception as e:
 
 ### NATS Message Handling
 
-```python
-import nats
-import msgpack
-
-async def message_handler(msg: Msg) -> None:
-    try:
-        # Decode MessagePack
-        data = msgpack.unpackb(msg.data, raw=False)
-
-        # Process
-        result = await process(data)
-
-        # Reply if request-reply pattern
-        if msg.reply:
-            await msg.respond(msgpack.packb(result))
-        # Acknowledge for JetStream
-        await msg.ack()
-    except Exception as e:
-        logger.error(f"Handler error: {e}")
-        # NAK for retry (JetStream)
-        await msg.nak()
+All NATS handler services use Go with Protocol Buffers encoding (see [ADR-0061](decisions/0061-go-handler-refactor.md)):
+
+```go
+// Go NATS handler (production pattern)
+func (h *Handler) handleMessage(msg *nats.Msg) {
+	var req messages.ChatRequest
+	if err := proto.Unmarshal(msg.Data, &req); err != nil {
+		h.logger.Error("failed to unmarshal", "error", err)
+		return
+	}
+
+	// Process
+	result, err := h.process(ctx, &req)
+	if err != nil {
+		h.logger.Error("handler error", "error", err)
+		msg.Nak()
+		return
+	}
+
+	// Reply if request-reply pattern
+	if msg.Reply != "" {
+		data, _ := proto.Marshal(result)
+		msg.Respond(data)
+	}
+	msg.Ack()
+}
 ```
 
+> **Python NATS** is still used in Ray Serve `runtime_env` and Kubeflow pipeline components where needed, but all dedicated NATS handler services are Go.
+
 ---
 
 ## Kubernetes Manifest Conventions
@@ -499,8 +515,9 @@ Each application should have a README with:
 | Use `latest` image tags | Pin to specific versions |
 | Skip health checks | Always define liveness/readiness |
 | Ignore resource limits | Set appropriate requests/limits |
-| Use JSON for NATS messages | Use MessagePack (binary) |
-| Synchronous I/O in handlers | Use async/await |
+| Use JSON for NATS messages | Use Protocol Buffers (see ADR-0061) |
+| Write handler services in Python | Use Go with handler-base module (ADR-0061) |
+| Synchronous I/O in handlers | Use goroutines / async patterns |
 
 ---
 
@@ -117,9 +117,14 @@ All AI inference runs on a unified Ray Serve endpoint with fractional GPU alloca
 
 | Application | Language | Framework | Purpose |
 |-------------|----------|-----------|---------|
-| Companions | Go | net/http + HTMX | AI chat interface |
-| Voice WebApp | Python | Gradio | Voice assistant UI |
-| Various handlers | Python | asyncio + nats.py | NATS event handlers |
+| Companions | Go | net/http + HTMX | AI chat interface (SSR) |
+| Chat Handler | Go | handler-base | RAG + LLM text pipeline |
+| Voice Assistant | Go | handler-base | STT → RAG → LLM → TTS pipeline |
+| Pipeline Bridge | Go | handler-base | Kubeflow/Argo workflow triggers |
+| STT Module | Go | handler-base | Speech-to-text bridge |
+| TTS Module | Go | handler-base | Text-to-speech bridge |
+| Voice WebApp | Python | Gradio | Voice assistant UI (dev/testing) |
+| Ray Serve | Python | Ray Serve | GPU inference endpoints |
 
 ### Frontend
 
@@ -242,27 +247,41 @@ All AI inference runs on a unified Ray Serve endpoint with fractional GPU alloca
 
 ---
 
-## Python Dependencies (handler-base)
+## Go Dependencies (handler-base)
 
-Core library for all NATS handlers: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)
+Shared Go module for all NATS handler services: [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base)
+
+```go
+// go.mod (handler-base v1.0.0)
+require (
+	github.com/nats-io/nats.go          // NATS client
+	google.golang.org/protobuf          // Protocol Buffers encoding
+	github.com/zitadel/oidc/v3          // OIDC client
+	go.opentelemetry.io/otel            // OpenTelemetry traces + metrics
+	github.com/milvus-io/milvus-sdk-go  // Milvus vector search
+)
+```
+
+See [ADR-0061](decisions/0061-go-handler-refactor.md) for the full refactoring rationale.
+
+## Python Dependencies (ML/AI only)
+
+Python is retained for ML inference, pipeline orchestration, and dev tools:
 
 ```toml
-# Core
-nats-py>=2.7.0            # NATS client
-msgpack>=1.0.0            # Binary serialization
-httpx>=0.27.0             # HTTP client
-
-# ML/AI
-pymilvus>=2.4.0           # Milvus client
-openai>=1.0.0             # vLLM OpenAI API
-
-# Observability
-opentelemetry-api>=1.20.0
-opentelemetry-sdk>=1.20.0
-mlflow>=2.10.0            # Experiment tracking
-
-# Kubeflow (kubeflow repo)
-kfp>=2.12.1               # Pipeline SDK
+# ray-serve (GPU inference)
+ray[serve]>=2.53.0
+vllm>=0.8.0
+faster-whisper>=1.0.0
+TTS>=0.22.0
+sentence-transformers>=3.0.0
+
+# kubeflow (pipeline definitions)
+kfp>=2.12.1
+
+# mlflow (experiment tracking)
+mlflow>=3.7.0
+pymilvus>=2.4.0
 ```
 
 ---
@@ -1,10 +1,12 @@
 # Python Module Deployment Strategy
 
-* Status: accepted
+* Status: superseded by [ADR-0061](0061-go-handler-refactor.md)
 * Date: 2026-02-02
 * Deciders: Billy
 * Technical Story: Define how Python handler modules are packaged and deployed to Kubernetes
 
+> **Note (2026-02-23):** This ADR described deploying Python handlers as Ray Serve applications inside the Ray cluster. [ADR-0061](0061-go-handler-refactor.md) supersedes this approach — all five handler services (chat-handler, voice-assistant, pipeline-bridge, tts-module, stt-module) have been rewritten in Go and now deploy as standalone Kubernetes Deployments with distroless container images (~20 MB each). The Ray cluster is exclusively used for GPU inference workloads. The handler-base shared library is now a Go module published at `git.daviestechlabs.io/daviestechlabs/handler-base` using Protocol Buffers for NATS message encoding.
+
 ## Context
 
 We have Python modules for AI/ML workflows that need to run on our unified GPU cluster:
@@ -14,7 +14,7 @@ How do we build a performant, maintainable frontend that integrates with the NAT
 ## Decision Drivers
 
 * Real-time streaming for chat and voice (WebSocket required)
-* Direct integration with NATS JetStream (binary MessagePack protocol)
+* Direct integration with NATS JetStream (Protocol Buffers encoding, see [ADR-0061](0061-go-handler-refactor.md))
 * Minimal client-side JavaScript (~20KB gzipped target)
 * No frontend build step (no webpack/vite/node required)
 * 3D avatar rendering for immersive experience
@@ -39,8 +39,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
 * No npm, no webpack, no build step — assets served directly
 * Server-side rendering via Go templates
 * WebSocket handled natively in Go (gorilla/websocket)
-* NATS integration with MessagePack in the same binary
+* NATS integration with Protocol Buffers in the same binary
 * Distroless container image for minimal attack surface
+* Type-safe NATS messages via handler-base shared Go module (protobuf stubs)
 
 ### Negative Consequences
 
@@ -58,8 +59,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
 | Client state | Alpine.js 3 | Lightweight reactive UI for local state |
 | 3D Avatars | Three.js + VRM | 3D character rendering with lip-sync |
 | Styling | Tailwind CSS 4 + DaisyUI | Utility-first CSS with component library |
-| Messaging | NATS JetStream | Real-time pub/sub with MessagePack encoding |
+| Messaging | NATS JetStream | Real-time pub/sub with Protocol Buffers encoding |
 | Auth | golang-jwt/jwt/v5 | JWT token handling for OAuth flows |
+| Shared lib | handler-base (Go module) | NATS client, protobuf messages, health, OTel, HTTP clients |
 | Database | PostgreSQL (lib/pq) + SQLite | Persistent + local session storage |
 | Observability | OpenTelemetry SDK | Traces, metrics via OTLP gRPC |
 
@@ -88,7 +90,7 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
 │ ┌─────────┴─────────┐ │
 │ │ NATS Client │ │
 │ │ (JetStream + │ │
-│ │ MessagePack) │ │
+│ │ Protobuf) │ │
 │ └─────────┬─────────┘ │
 └────────────────────────┼────────────────────────────────────────┘
                          │
@@ -130,8 +132,9 @@ Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provi
 ## Links
 
 * Related to [ADR-0003](0003-use-nats-for-messaging.md) (NATS messaging)
-* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding)
+* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding — superseded by Protocol Buffers, see [ADR-0061](0061-go-handler-refactor.md))
 * Related to [ADR-0011](0011-kuberay-unified-gpu-backend.md) (Ray Serve backend)
 * Related to [ADR-0028](0028-authentik-sso-strategy.md) (OAuth/OIDC)
+* Related to [ADR-0061](0061-go-handler-refactor.md) (Go handler refactor — handler-base shared module, protobuf wire format)
 * [HTMX Documentation](https://htmx.org/docs/)
 * [VRM Specification](https://vrm.dev/en/)
@@ -1,8 +1,8 @@
 # Mac Mini M4 Pro (waterdeep) as Local AI Agent for 3D Avatar Creation
 
-* Status: proposed
+* Status: accepted
 * Date: 2026-02-16
-* Updated: 2026-02-21
+* Updated: 2026-02-23
 * Deciders: Billy
 * Technical Story: Use waterdeep as a dedicated local AI workstation for BlenderMCP-driven 3D avatar creation, replacing the previously proposed Ray worker role
 
@@ -25,14 +25,15 @@ How should we use waterdeep to maximise the 3D avatar creation pipeline for comp
 * Blender on Kasm is CPU-rendered inside DinD — no Metal/Vulkan/CUDA GPU access, poor viewport performance
 * waterdeep has a 16-core Apple GPU with Metal support — Blender's Metal backend enables real-time viewport rendering, Cycles GPU rendering, and smooth sculpting
 * 48 GB unified memory means Blender, VS Code, and the MCP server can all run simultaneously without swapping
-* VS Code with Copilot agent mode can drive BlenderMCP locally with zero-latency socket communication (localhost:9876)
+* VS Code with Copilot agent mode and BlenderMCP server are installed on waterdeep — VS Code drives Blender via localhost:9876 with zero-latency socket communication
 * Exported VRM models must reach gravenhollow for production serving ([ADR-0062](0062-blender-mcp-3d-avatar-workflow.md))
+* **rclone** chosen for asset promotion to gravenhollow's RustFS S3 endpoint — simpler than NFS mounts on macOS, consistent with existing Kasm rclone patterns, and avoids autofs/NFS fstab complexity
 * The Kasm Blender workflow from ADR-0062 remains available as a fallback (browser-based, no local install required)
 * ray cluster GPU fleet is fully allocated and stable — adding MPS complexity is not justified
 
 ## Considered Options
 
-1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via NFS/rclone
+1. **Local AI agent on waterdeep** — Blender + BlenderMCP + VS Code natively on macOS, promoting assets to gravenhollow via rclone (S3)
 2. **External Ray worker on macOS** (original proposal) — join the Ray cluster for inference and training
 3. **Keep Kasm-only workflow** — rely entirely on the browser-based Kasm Blender workstation from ADR-0062
 
@@ -45,17 +46,18 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min
 * Metal GPU acceleration — real-time Eevee viewport, GPU-accelerated Cycles rendering, smooth 60fps sculpting
 * Zero-latency MCP — BlenderMCP socket (localhost:9876) has no network hop, instant command execution
 * 48 GB unified memory — large Blender scenes, multiple VRM models open simultaneously, no swap pressure
-* VS Code + Copilot agent mode runs natively with full local context for both code and Blender commands
+* VS Code + Copilot agent mode + BlenderMCP server installed natively — single editor drives both code and Blender commands
+* rclone for asset promotion — consistent with Kasm rclone patterns, avoids macOS NFS/autofs complexity
 * Remaining a dev workstation — avatar creation is a creative dev workflow, not a server workload
 * Kasm Blender remains available as a browser-based fallback for remote/mobile access
 * Simpler than the Ray worker approach — no cluster integration, no GCS port exposure, no experimental MPS backend
 
 ### Negative Consequences
 
-* Blender + add-ons must be installed and maintained locally on waterdeep
-* Assets created locally need explicit promotion to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
+* Blender, VS Code, and add-ons must be installed and maintained locally on waterdeep via Homebrew
+* Assets created locally need explicit `rclone copy` to promote to gravenhollow (vs Kasm's automatic rclone to Quobyte S3)
 * waterdeep is a single machine — no redundancy for the 3D creation workflow
-* Not managed by Kubernetes or GitOps — relies on manual or Homebrew-managed tooling
+* Not managed by Kubernetes or GitOps — relies on Homebrew-managed tooling
 
 ## Pros and Cons of the Options
 
@@ -67,8 +69,8 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min
 * Good, because no experimental backends (MPS/vLLM) — using Blender's mature Metal renderer
 * Good, because waterdeep stays a dev workstation, aligning with its named role
 * Bad, because local-only — no browser-based remote access (use Kasm for that)
-* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP)
-* Bad, because asset promotion to gravenhollow requires explicit action
+* Bad, because manual tool installation (Blender, VRM add-on, BlenderMCP, VS Code)
+* Bad, because asset promotion to gravenhollow requires explicit rclone command
 
 ### Option 2: External Ray worker on macOS (original proposal)
 
@@ -119,8 +121,8 @@ Chosen option: **Option 1 — Local AI agent on waterdeep**, because the Mac Min
│ │ └── textures/ (shared texture library) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ rclone (S3 asset promotion) │
│ gravenhollow RustFS :30292 │
└──────────────────────────┼──────────────────────────────────────────────┘
│
▼
@@ -200,24 +202,9 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
uvx blender-mcp --help
```

### 4. rclone for Asset Promotion

Use rclone to promote finished VRM exports to gravenhollow's RustFS S3 endpoint. This is consistent with the Kasm rclone volume plugin pattern from [ADR-0062](0062-blender-mcp-3d-avatar-workflow.md) and avoids macOS NFS/autofs complexity.

```bash
# Install rclone
@@ -232,8 +219,13 @@ rclone config create gravenhollow s3 \

# Promote a finished VRM
rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models/

# Sync all exports (idempotent)
rclone sync ~/blender-avatars/exports/ gravenhollow:avatar-models/ --exclude "*.blend"
```

> **Why rclone over NFS?** macOS autofs/NFS mounts are fragile across reboots and network changes. rclone is a single binary, works over HTTPS, and matches the promotion pattern already used in Kasm workflows. The explicit `rclone copy` command also serves as a deliberate promotion gate — only intentionally promoted models reach production.

### 5. Avatar Creation Workflow (waterdeep)

1. **Open Blender** on waterdeep (native Metal-accelerated)
@@ -245,9 +237,9 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models
   - _"Rig this character for VRM export with standard humanoid bones"_
   - _"Export as VRM to ~/blender-avatars/exports/Silver-Mage.vrm"_
5. **Preview** in real-time — Metal GPU renders Eevee viewport at 60fps
6. **Promote** the finished VRM to gravenhollow via rclone:
   ```bash
   rclone copy ~/blender-avatars/exports/Silver-Mage-v1.vrm gravenhollow:avatar-models/
   ```
7. **Register** in companions-frontend — update `AllowedAvatarModels` in Go and JS allowlists, commit

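The future-work section of this ADR proposes a pre-promotion validation script. As a minimal starting point, a shell gate could verify that an export is at least a plausible VRM container before step 6's `rclone copy` runs. This is a hypothetical sketch (the function name and paths are illustrative); it relies only on the fact that VRM is a glTF 2.0 binary container, so files start with the ASCII magic `glTF`.

```shell
# Hypothetical pre-promotion gate (not yet part of the workflow).
# VRM is a glTF 2.0 binary container, so a cheap sanity check is the
# 4-byte "glTF" magic plus a VRM marker in the leading JSON chunk.
vrm_precheck() {
    vrm="$1"
    # reject missing or empty exports
    [ -s "$vrm" ] || { echo "FAIL: $vrm missing or empty"; return 1; }
    # glTF binaries begin with the ASCII bytes "glTF"
    [ "$(head -c 4 "$vrm")" = "glTF" ] || { echo "FAIL: $vrm lacks glTF magic"; return 1; }
    # the embedded JSON chunk of a VRM declares the VRM extension
    head -c 65536 "$vrm" | grep -q "VRM" || { echo "FAIL: no VRM marker in JSON chunk"; return 1; }
    echo "OK: $vrm"
}

# Gate promotion on the check:
# vrm_precheck ~/blender-avatars/exports/Silver-Mage-v1.vrm &&
#   rclone copy ~/blender-avatars/exports/Silver-Mage-v1.vrm gravenhollow:avatar-models/
```

A fuller check (humanoid bone set, expression morphs, visemes) would need a glTF-aware tool, which is what the VRM-validation future-work item calls for.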
@@ -260,7 +252,7 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models
| **MCP latency** | localhost socket — sub-millisecond | Network hop to Kasm container |
| **Memory** | 48 GB unified, shared with GPU | Limited by Kasm container allocation |
| **Sculpting** | Smooth, hardware-accelerated | Laggy, CPU-bound |
| **Asset promotion** | rclone to gravenhollow RustFS S3 | Auto rclone to Quobyte S3 → manual promote to gravenhollow |
| **Access** | Local only (waterdeep physical/VNC) | Any browser, anywhere |
| **Setup** | Homebrew + manual add-on install | Pre-baked in Kasm image |
| **Use when** | Primary creation workflow | Remote access, quick edits, mobile |
@@ -278,7 +270,7 @@ rclone copy ~/blender-avatars/exports/Companion-A.vrm gravenhollow:avatar-models

* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): When acquired, DGX Spark handles training; waterdeep remains the 3D creation workstation
* **Blender + MLX**: Apple's MLX framework could power local AI-generated textures or mesh deformation directly in Blender — worth evaluating as Blender add-ons mature
* **Automated promotion**: A file watcher (fswatch/launchd) could auto-run `rclone sync` when a new VRM appears in `~/blender-avatars/exports/`
* **VRM validation**: Add a pre-promotion check script that validates VRM humanoid rig completeness, expression morphs, and viseme shapes before copying to gravenhollow
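The automated-promotion idea above could be sketched with fswatch (assumed installed via Homebrew, `brew install fswatch`): every change in the export directory re-runs the same idempotent `rclone sync` used for manual promotion. The function name and the `--event` filter are illustrative, not an existing script in this repo.

```shell
# Sketch only: auto-run the idempotent rclone sync whenever the Blender
# export directory changes. Assumes fswatch from Homebrew; the remote and
# bucket names match the manual promotion commands earlier in this ADR.
watch_and_promote() {
    exports_dir="${1:-$HOME/blender-avatars/exports}"
    # fswatch prints one changed path per line; filter to create/update events
    fswatch --event Created --event Updated "$exports_dir" |
        while read -r _changed; do
            # sync is idempotent: only new or modified files are transferred
            rclone sync "$exports_dir" gravenhollow:avatar-models/ --exclude "*.blend"
        done
}
```

For persistence across reboots, the same `rclone sync` command could instead be wrapped in a launchd agent using `WatchPaths`, avoiding a long-running fswatch process.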

## Links