docs: accept ADR-0061 (Go handler refactor), supersede ADR-0004 (msgpack→protobuf)

All 5 handler services + companions-frontend migrated to handler-base v1.0.0 with protobuf wire format. golangci-lint clean across all repos.
docs(adr): add ADR-0061 Go handler refactor
2026-02-21 15:46:24 -05:00 · 2026-02-19 07:14:36 -05:00
3 changed files with 149 additions and 1 deletion
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -317,6 +317,7 @@ Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafan
 | GitOps with Flux | Declarative, auditable, secure | [ADR-0006](decisions/0006-gitops-with-flux.md) |
 | KServe for inference | Standardized API, autoscaling | [ADR-0007](decisions/0007-use-kserve-for-inference.md) |
 | KubeRay unified backend | Fractional GPU, single endpoint | [ADR-0011](decisions/0011-kuberay-unified-gpu-backend.md) |
+| Go handler refactor | Slim images for non-ML services | [ADR-0061](decisions/0061-go-handler-refactor.md) |

 ## Related Documents

--- a/decisions/0004-use-messagepack-for-nats.md
+++ b/decisions/0004-use-messagepack-for-nats.md
@@ -1,6 +1,6 @@
 # Use MessagePack for NATS Messages

-* Status: accepted
+* Status: superseded by [ADR-0061](0061-go-handler-refactor.md) (Protocol Buffers)
 * Date: 2025-12-01
 * Deciders: Billy Davies
 * Technical Story: Selecting serialization format for NATS messages
--- a/decisions/0061-go-handler-refactor.md
+++ b/decisions/0061-go-handler-refactor.md
@@ -0,0 +1,147 @@
+# Refactor NATS Handler Services from Python to Go
+
+* Status: accepted
+* Date: 2026-02-19
+* Decided: 2026-02-21
+* Deciders: Billy
+* Technical Story: Reduce container image sizes and resource consumption for non-ML handler services by rewriting them in Go
+
+## Context and Problem Statement
+
+The AI pipeline's non-inference services — `chat-handler`, `voice-assistant`, `pipeline-bridge`, `tts-module`, and the HTTP-forwarding variant of `stt-module` — are Python applications built on the `handler-base` shared library. None of these services perform local ML inference; they orchestrate calls to external Ray Serve endpoints over HTTP and route messages via NATS with MessagePack encoding.
+
+> **Implementation note (2026-02-21):** During the Go rewrite, the wire format was upgraded from MessagePack to **Protocol Buffers**, superseding [ADR-0004](0004-use-messagepack-for-nats.md). The shared Go module is published as `handler-base` v1.0.0, not `handler-go` as originally proposed.
+
+Despite doing only lightweight I/O orchestration, each service inherits the full Python runtime and its dependency tree through `handler-base` (which pulls in `numpy`, `pymilvus`, `redis`, `httpx`, `pydantic`, `opentelemetry-*`, `mlflow`, and `psycopg2-binary`). This results in container images of **500–700 MB each** — five services totalling **~3 GB** of registry storage — for workloads that are fundamentally HTTP/NATS glue code.
+
+The homelab already has two production Go services (`companions-frontend` and `ntfy-discord`) that prove the NATS + MessagePack + OpenTelemetry pattern works well in Go with images under 30 MB.
+
+How do we reduce the image footprint and resource consumption of the non-ML handler services without disrupting the ML inference layer?
+
+## Decision Drivers
+
+* Container images for glue services are 500–700 MB despite doing no ML work
+* Go produces static binaries yielding images of ~15–30 MB (scratch/distroless base)
+* Go services start in milliseconds vs. seconds for Python, improving pod scheduling
+* Go's memory footprint is ~10× lower for equivalent I/O-bound workloads
+* The NATS + msgpack + OTel pattern is already proven in `companions-frontend`
+* Go has first-class Kubernetes client support (`client-go`) — relevant for `pipeline-bridge`
+* ML inference services (Ray Serve, kuberay-images) must remain Python — only orchestration moves
+* Five services share a common base (`handler-base`) — a single Go module replaces it for all
+
+## Considered Options
+
+1. **Rewrite handler services in Go with a shared Go module**
+2. **Optimise Python images (multi-stage builds, slim deps, compiled wheels)**
+3. **Keep current Python stack unchanged**
+
+## Decision Outcome
+
+Chosen option: **Option 1 — Rewrite handler services in Go**, because the services are pure I/O orchestration with no ML dependencies, the Go pattern is already proven in-cluster, and the image + resource savings are an order of magnitude improvement that Python optimisation cannot match.
+
+### Positive Consequences
+
+* Five container images shrink from ~3 GB total to ~100–150 MB total
+* Sub-second cold start enables faster rollouts and autoscaling via KEDA
+* Lower memory footprint frees cluster resources for ML workloads
+* Eliminates Python runtime CVE surface area from non-ML services
+* A single Go module (`handler-base` v1.0.0, proposed as `handler-go`) provides common NATS, health, OTel, and client code
+* `pipeline-bridge` gains `client-go` — the canonical Kubernetes client library
+* Go's type system catches message schema drift at compile time
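+
+One way to picture the compile-time benefit: handlers are written against concrete message structs, and a generic adapter owns the only byte-level decode step. A minimal sketch, assuming hypothetical `ChatRequest`/`ChatResponse` types and a hypothetical `Wrap` adapter (not the actual `handler/` package API); JSON stands in for protobuf so the example runs with the standard library alone:
+
+```go
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+)
+
+// ChatRequest and ChatResponse are hypothetical stand-ins for
+// protoc-generated message types; the fields are illustrative only.
+type ChatRequest struct {
+	Query string `json:"query"`
+	TopK  int32  `json:"top_k"`
+}
+
+type ChatResponse struct {
+	Answer string `json:"answer"`
+}
+
+// Wrap adapts a handler written against concrete request/response types to
+// the raw-bytes signature a message subscription would invoke. JSON keeps
+// the sketch dependency-free; the services themselves use protobuf.
+func Wrap[Req, Resp any](h func(context.Context, *Req) (*Resp, error)) func(context.Context, []byte) ([]byte, error) {
+	return func(ctx context.Context, data []byte) ([]byte, error) {
+		req := new(Req)
+		if err := json.Unmarshal(data, req); err != nil {
+			return nil, fmt.Errorf("decode request: %w", err)
+		}
+		resp, err := h(ctx, req)
+		if err != nil {
+			return nil, err
+		}
+		return json.Marshal(resp)
+	}
+}
+
+func main() {
+	handle := Wrap(func(_ context.Context, req *ChatRequest) (*ChatResponse, error) {
+		// Misspelling req.Query here fails to compile instead of failing at runtime.
+		return &ChatResponse{Answer: fmt.Sprintf("%s (top_k=%d)", req.Query, req.TopK)}, nil
+	})
+	out, err := handle(context.Background(), []byte(`{"query":"hello","top_k":3}`))
+	if err != nil {
+		fmt.Println("error:", err)
+		return
+	}
+	fmt.Println(string(out))
+}
+```
+
+The compiler checks every field the handler touches; keeping the bytes on the wire in step with those structs is the job of the shared `.proto` schema and its generated stubs.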
+
+### Negative Consequences
+
+* One-time rewrite effort across five services
+* Team must maintain Go **and** Python codebases (Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs)
+* The Go module (proposed as `handler-go`) needs feature parity with the Python `handler-base` for the orchestration subset (NATS client, health server, OTel, HTTP clients, Milvus client)
+* Audio handling in `stt-module` (VAD) requires a Go webrtcvad binding or equivalent
+
+## Pros and Cons of the Options
+
+### Option 1 — Rewrite in Go
+
+* Good, because images shrink from ~600 MB → ~20 MB per service
+* Good, because memory usage drops from ~150 MB → ~15 MB per service
+* Good, because startup time drops from ~3 s → <100 ms
+* Good, because Go has mature libraries for every dependency (nats.go, client-go, otel-go, milvus-sdk-go)
+* Good, because two existing Go services in the cluster prove the pattern
+* Bad, because one-time engineering effort to rewrite five services
+* Bad, because two language ecosystems to maintain
+
+### Option 2 — Optimise Python images
+
+* Good, because no rewrite needed
+* Good, because multi-stage builds and dependency trimming can reduce images by 30–50%
+* Bad, because Python runtime + interpreter overhead remains (~200 MB floor)
+* Bad, because memory and startup improvements are marginal
+* Bad, because `handler-base` dependency tree is difficult to slim without breaking shared code
+
+### Option 3 — Keep current stack
+
+* Good, because zero effort
+* Bad, because images remain 500–700 MB for glue code
+* Bad, because resource waste reduces headroom for ML workloads
+* Bad, because slow cold starts limit KEDA autoscaling effectiveness
+
+## Implementation Plan
+
+### Phase 1: `handler-base` Go Module (COMPLETE)
+
+Published as `git.daviestechlabs.io/daviestechlabs/handler-base` v1.0.0 with:
+
+| Package | Purpose | Python Equivalent |
+|---------|---------|-------------------|
+| `natsutil/` | NATS publish/request/decode with protobuf encoding | `handler_base.nats_client` |
+| `health/` | HTTP health + readiness server | `handler_base.health` |
+| `telemetry/` | OTel traces + metrics setup | `handler_base.telemetry` |
+| `config/` | Env-based configuration (struct tags) | `handler_base.config` (pydantic-settings) |
+| `clients/` | HTTP clients for LLM, embeddings, reranker, STT, TTS | `handler_base.clients` |
+| `handler/` | Typed NATS message handler with OTel + health wiring | `handler_base.handler` |
+| `messages/` | Type aliases from generated protobuf stubs | `handler_base.messages` |
+| `gen/messagespb/` | protoc-generated Go stubs (21 message types) | — |
+| `proto/messages/v1/` | `.proto` schema source | — |
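+
+The `config/` row replaces pydantic-settings with struct tags. A minimal sketch of that idea using only `reflect`, `os` and `strconv`; the `env`/`default` tag names, the `Config` fields and `loadFromEnv` are hypothetical and not taken from handler-base:
+
+```go
+package main
+
+import (
+	"fmt"
+	"os"
+	"reflect"
+	"strconv"
+)
+
+// Config is a hypothetical handler configuration. The field names, env var
+// names and defaults are illustrative, not handler-base's actual schema.
+type Config struct {
+	NATSURL     string `env:"NATS_URL" default:"nats://localhost:4222"`
+	HealthPort  int    `env:"HEALTH_PORT" default:"8080"`
+	OTelEnabled bool   `env:"OTEL_ENABLED" default:"true"`
+}
+
+// loadFromEnv fills the struct pointed to by dst. For each field with an
+// `env` tag it reads that variable, falling back to the `default` tag.
+// Tagged fields must be exported, since reflect cannot set unexported ones.
+func loadFromEnv(dst any) error {
+	v := reflect.ValueOf(dst)
+	if v.Kind() != reflect.Pointer || v.Elem().Kind() != reflect.Struct {
+		return fmt.Errorf("loadFromEnv: want pointer to struct, got %T", dst)
+	}
+	v = v.Elem()
+	t := v.Type()
+	for i := 0; i < t.NumField(); i++ {
+		field := t.Field(i)
+		key := field.Tag.Get("env")
+		if key == "" {
+			continue // untagged fields are left untouched
+		}
+		raw, ok := os.LookupEnv(key)
+		if !ok {
+			raw = field.Tag.Get("default")
+		}
+		if raw == "" {
+			continue // neither set nor defaulted: keep the zero value
+		}
+		fv := v.Field(i)
+		switch fv.Kind() {
+		case reflect.String:
+			fv.SetString(raw)
+		case reflect.Int, reflect.Int32, reflect.Int64:
+			n, err := strconv.ParseInt(raw, 10, fv.Type().Bits())
+			if err != nil {
+				return fmt.Errorf("%s: %w", key, err)
+			}
+			fv.SetInt(n)
+		case reflect.Bool:
+			b, err := strconv.ParseBool(raw)
+			if err != nil {
+				return fmt.Errorf("%s: %w", key, err)
+			}
+			fv.SetBool(b)
+		default:
+			return fmt.Errorf("%s: unsupported field kind %s", key, fv.Kind())
+		}
+	}
+	return nil
+}
+
+func main() {
+	var cfg Config
+	if err := loadFromEnv(&cfg); err != nil {
+		fmt.Fprintln(os.Stderr, "config error:", err)
+		os.Exit(1)
+	}
+	fmt.Printf("%+v\n", cfg)
+}
+```
+
+As with pydantic-settings, a malformed value fails at startup with the offending variable name instead of surfacing later in a handler.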
+
+### Phase 2: Service Ports (COMPLETE)
+
+All five services rewritten in Go and migrated to handler-base v1.0.0 with protobuf wire format:
+
+| Order | Service | Status | Notes |
+|-------|---------|--------|-------|
+| 1 | `pipeline-bridge` | ✅ Done | NATS + HTTP + k8s API calls. Parameters changed to `map[string]string`. |
+| 2 | `tts-module` | ✅ Done | NATS ↔ HTTP bridge. `[]*TTSVoiceInfo` pointer slices, `int32` casts. |
+| 3 | `chat-handler` | ✅ Done | Core text pipeline. `EffectiveQuery()` standalone func, `int32(TopK)`. |
+| 4 | `voice-assistant` | ✅ Done | Same pattern with `[]*DocumentSource` pointer slices. |
+| 5 | `stt-module` | ✅ Done | HTTP-forwarding variant. `SessionId`/`SpeakerId` field renames, `int32(Sequence)`. |
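+
+Several of the porting notes follow from how protoc-gen-go maps schema types: proto `int32` fields need explicit narrowing from Go's `int`, and repeated message fields become pointer slices such as `[]*DocumentSource`. A sketch of both conversions; `VoiceReply`, `DocumentSource`, `toInt32` and `toPointers` are hypothetical stand-ins, not code from the services:
+
+```go
+package main
+
+import (
+	"fmt"
+	"math"
+)
+
+// DocumentSource and VoiceReply are hypothetical stand-ins for generated
+// structs: repeated message fields become []*T, proto int32 becomes int32.
+type DocumentSource struct {
+	Title string
+	Score float32
+}
+
+type VoiceReply struct {
+	TopK    int32
+	Sources []*DocumentSource
+}
+
+// toInt32 narrows an int (e.g. from config) to a proto int32 field,
+// refusing values that a bare int32(n) cast would silently wrap.
+func toInt32(n int) (int32, error) {
+	if n < math.MinInt32 || n > math.MaxInt32 {
+		return 0, fmt.Errorf("value %d overflows int32", n)
+	}
+	return int32(n), nil
+}
+
+// toPointers returns pointers into the caller's slice. Taking &in[i] avoids
+// copying each element and, before Go 1.22, aliasing one range variable.
+func toPointers[T any](in []T) []*T {
+	out := make([]*T, len(in))
+	for i := range in {
+		out[i] = &in[i]
+	}
+	return out
+}
+
+func main() {
+	topK, err := toInt32(5)
+	if err != nil {
+		fmt.Println(err)
+		return
+	}
+	docs := []DocumentSource{{Title: "a", Score: 0.9}, {Title: "b", Score: 0.7}}
+	reply := VoiceReply{TopK: topK, Sources: toPointers(docs)}
+	fmt.Println(reply.TopK, len(reply.Sources), reply.Sources[1].Title)
+}
+```
+
+Real protoc-generated structs carry internal state and are not meant to be copied by value, so in practice the elements would usually be allocated directly as `&T{...}` rather than converted from a value slice.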
+
+`companions-frontend` was also migrated: its 129 lines of duplicated type definitions were replaced with type aliases from `handler-base/messages`.
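+
+A type alias (`type T = U`) makes the local name identical to the generated type, which is why it removes a duplicate definition rather than shadowing it. A minimal illustration; `generatedSource` is a hypothetical stand-in for a type from `gen/messagespb`, and `legacySource` mimics a hand-copied definition:
+
+```go
+package main
+
+import (
+	"fmt"
+	"reflect"
+)
+
+// generatedSource stands in for a protoc-generated struct; the name and
+// fields are hypothetical.
+type generatedSource struct {
+	Title string
+	URL   string
+}
+
+// DocumentSource is an alias: it IS generatedSource, so values move between
+// packages with no conversion and there is no second definition to drift.
+type DocumentSource = generatedSource
+
+// legacySource is a defined type (no "="), like a hand-maintained copy:
+// same layout, but a distinct type that needs an explicit conversion.
+type legacySource generatedSource
+
+func main() {
+	var gen generatedSource = DocumentSource{Title: "ADR-0061"} // no conversion needed
+	legacy := legacySource(gen)                                 // explicit conversion required
+	fmt.Println(reflect.TypeOf(DocumentSource{}) == reflect.TypeOf(gen)) // true
+	fmt.Println(reflect.TypeOf(legacy) == reflect.TypeOf(gen))           // false
+}
+```
+
+Because the alias and the generated type are one type, a `.proto` change reaches every consumer on the next build, whereas a hand-copied definition keeps compiling against the old shape.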
+
+### Phase 3: Cleanup (COMPLETE)
+
+* ~~Archive Python versions of ported services~~ — Python handler-base remains for Ray Serve/Kubeflow
+* CI pipelines use `golangci-lint` v2 with errcheck, govet, staticcheck, misspell, bodyclose, nilerr
+* All repos pass `golangci-lint run ./...` and `go test ./...`
+* Wire format upgraded from MessagePack to Protocol Buffers (ADR-0004 superseded)
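+
+The `bodyclose` linter listed above flags HTTP responses whose bodies are never closed, a common leak in client code such as the `clients/` package. A minimal sketch of a JSON-over-HTTP call that closes the body on every path; `postJSON` is a hypothetical helper, not handler-base's client API, and the `httptest` server in `main` merely stands in for an inference endpoint:
+
+```go
+package main
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"fmt"
+	"io"
+	"net/http"
+	"net/http/httptest"
+	"time"
+)
+
+// postJSON POSTs in as JSON to url and decodes a 200 reply into out.
+func postJSON(ctx context.Context, c *http.Client, url string, in, out any) error {
+	payload, err := json.Marshal(in)
+	if err != nil {
+		return fmt.Errorf("encode request: %w", err)
+	}
+	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(payload))
+	if err != nil {
+		return fmt.Errorf("build request: %w", err)
+	}
+	req.Header.Set("Content-Type", "application/json")
+
+	resp, err := c.Do(req)
+	if err != nil {
+		return fmt.Errorf("call %s: %w", url, err)
+	}
+	// Closing on every return path is what bodyclose checks; an unclosed
+	// body keeps the underlying connection from being released.
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		snippet, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
+		return fmt.Errorf("call %s: status %d: %s", url, resp.StatusCode, snippet)
+	}
+	if err := json.NewDecoder(resp.Body).Decode(out); err != nil {
+		return fmt.Errorf("decode response: %w", err)
+	}
+	return nil
+}
+
+func main() {
+	// Throwaway local server standing in for a Ray Serve endpoint.
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		var in map[string]string
+		_ = json.NewDecoder(r.Body).Decode(&in)
+		_ = json.NewEncoder(w).Encode(map[string]string{"echo": in["prompt"]})
+	}))
+	defer srv.Close()
+
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+
+	var out map[string]string
+	if err := postJSON(ctx, srv.Client(), srv.URL, map[string]string{"prompt": "hi"}, &out); err != nil {
+		fmt.Println("error:", err)
+		return
+	}
+	fmt.Println(out["echo"])
+}
+```
+
+The context deadline bounds the whole call, so a stalled upstream fails the request instead of pinning a goroutine.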
+
+### What Stays in Python
+
+| Repository | Reason |
+|------------|--------|
+| `ray-serve` | PyTorch, vLLM, sentence-transformers — core ML inference |
+| `kuberay-images` | GPU runtime Docker images (ROCm, CUDA, IPEX) |
+| `gradio-ui` | Gradio is Python-only; dev/testing tool, not production |
+| `kubeflow/` | Kubeflow Pipelines SDK is Python-only |
+| `mlflow/` | MLflow SDK integration (tracking + model registry) |
+| `stt-module` (local Whisper variant) | PyTorch + openai-whisper on GPU |
+| `spark-analytics-jobs` | PySpark (already being replaced by Flink) |
+
+## Links
+
+* Related: [ADR-0003](0003-use-nats-for-messaging.md) — NATS as messaging backbone
+* Related: [ADR-0004](0004-use-messagepack-for-nats.md) — MessagePack binary encoding (superseded by this ADR)
+* Related: [ADR-0011](0011-kuberay-unified-gpu-backend.md) — KubeRay unified GPU backend
+* Related: [ADR-0013](0013-gitea-actions-for-ci.md) — Gitea Actions CI
+* Related: [ADR-0014](0014-docker-build-best-practices.md) — Docker build best practices
+* Related: [ADR-0019](0019-handler-deployment-strategy.md) — Handler deployment strategy
+* Related: [ADR-0024](0024-ray-repository-structure.md) — Ray repository structure
+* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend (Go reference)
+* Related: [ADR-0051](0051-keda-event-driven-autoscaling.md) — KEDA autoscaling
Author	SHA1	Message	Date
Billy D.	299a416f51	docs: accept ADR-0061 (Go handler refactor), supersede ADR-0004 (msgpack→protobuf) All 5 handler services + companions-frontend migrated to handler-base v1.0.0 with protobuf wire format. golangci-lint clean across all repos.	2026-02-21 15:46:24 -05:00
Billy D.	e57d998d9a	docs(adr): add ADR-0061 Go handler refactor	2026-02-19 07:14:36 -05:00