Refactor NATS Handler Services from Python to Go

  • Status: accepted
  • Date: 2026-02-19
  • Decided: 2026-02-21
  • Deciders: Billy
  • Technical Story: Reduce container image sizes and resource consumption for non-ML handler services by rewriting them in Go

Context and Problem Statement

The AI pipeline's non-inference services — chat-handler, voice-assistant, pipeline-bridge, tts-module, and the HTTP-forwarding variant of stt-module — are Python applications built on the handler-base shared library. None of these services perform local ML inference; they orchestrate calls to external Ray Serve endpoints over HTTP and route messages via NATS with MessagePack encoding.

Implementation note (2026-02-21): During the Go rewrite, the wire format was upgraded from MessagePack to Protocol Buffers (ADR-0004 is now superseded). The shared Go module is published as handler-base v1.0.0, not handler-go as originally proposed.

Despite doing only lightweight I/O orchestration, each service inherits the full Python runtime and its dependency tree through handler-base (which pulls in numpy, pymilvus, redis, httpx, pydantic, opentelemetry-*, mlflow, and psycopg2-binary). This results in container images of 500–700 MB each — five services totalling ~3 GB of registry storage — for workloads that are fundamentally HTTP/NATS glue code.

The homelab already has two production Go services (companions-frontend and ntfy-discord) that prove the NATS + MessagePack + OpenTelemetry pattern works well in Go with images under 30 MB.

How do we reduce the image footprint and resource consumption of the non-ML handler services without disrupting the ML inference layer?

Decision Drivers

  • Container images for glue services are 500–700 MB despite doing no ML work
  • Go produces static binaries yielding images of ~15–30 MB (scratch/distroless base)
  • Go services start in milliseconds vs. seconds for Python, improving pod scheduling
  • Go's memory footprint is ~10× lower for equivalent I/O-bound workloads
  • The NATS + msgpack + OTel pattern is already proven in companions-frontend
  • Go has first-class Kubernetes client support (client-go) — relevant for pipeline-bridge
  • ML inference services (Ray Serve, kuberay-images) must remain Python — only orchestration moves
  • Five services share a common base (handler-base) — a single Go module replaces it for all

Considered Options

  1. Rewrite handler services in Go with a shared Go module
  2. Optimise Python images (multi-stage builds, slim deps, compiled wheels)
  3. Keep current Python stack unchanged

Decision Outcome

Chosen option: Option 1 — Rewrite handler services in Go, because the services are pure I/O orchestration with no ML dependencies, the Go pattern is already proven in-cluster, and the image + resource savings are an order of magnitude improvement that Python optimisation cannot match.

Positive Consequences

  • Five container images shrink from ~3 GB total to ~100–150 MB total
  • Sub-second cold start enables faster rollouts and autoscaling via KEDA
  • Lower memory footprint frees cluster resources for ML workloads
  • Eliminates Python runtime CVE surface area from non-ML services
  • Single handler-go module provides shared NATS, health, OTel, and client code
  • pipeline-bridge gains client-go — the canonical Kubernetes client library
  • Go's type system catches message schema drift at compile time

Negative Consequences

  • One-time rewrite effort across five services
  • Team must maintain Go and Python codebases (Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs)
  • handler-go needs feature parity with handler-base for the orchestration subset (NATS client, health server, OTel, HTTP clients, Milvus client)
  • Audio handling in stt-module (VAD) requires a Go webrtcvad binding or equivalent

Pros and Cons of the Options

Option 1 — Rewrite in Go

  • Good, because images shrink from ~600 MB → ~20 MB per service
  • Good, because memory usage drops from ~150 MB → ~15 MB per service
  • Good, because startup time drops from ~3 s → <100 ms
  • Good, because Go has mature libraries for every dependency (nats.go, client-go, otel-go, milvus-sdk-go)
  • Good, because two existing Go services in the cluster prove the pattern
  • Bad, because one-time engineering effort to rewrite five services
  • Bad, because two language ecosystems to maintain

Option 2 — Optimise Python images

  • Good, because no rewrite needed
  • Good, because multi-stage builds and dependency trimming can reduce images by 30–50%
  • Bad, because Python runtime + interpreter overhead remains (~200 MB floor)
  • Bad, because memory and startup improvements are marginal
  • Bad, because handler-base dependency tree is difficult to slim without breaking shared code

Option 3 — Keep current stack

  • Good, because zero effort
  • Bad, because images remain 500–700 MB for glue code
  • Bad, because resource waste reduces headroom for ML workloads
  • Bad, because slow cold starts limit KEDA autoscaling effectiveness

Implementation Plan

Phase 1: handler-base Go Module (COMPLETE)

Published as git.daviestechlabs.io/daviestechlabs/handler-base v1.0.0 with:

| Package | Purpose | Python Equivalent |
|---|---|---|
| natsutil/ | NATS publish/request/decode with protobuf encoding | handler_base.nats_client |
| health/ | HTTP health + readiness server | handler_base.health |
| telemetry/ | OTel traces + metrics setup | handler_base.telemetry |
| config/ | Env-based configuration (struct tags) | handler_base.config (pydantic-settings) |
| clients/ | HTTP clients for LLM, embeddings, reranker, STT, TTS | handler_base.clients |
| handler/ | Typed NATS message handler with OTel + health wiring | handler_base.handler |
| messages/ | Type aliases from generated protobuf stubs | handler_base.messages |
| gen/messagespb/ | protoc-generated Go stubs (21 message types) | (none) |
| proto/messages/v1/ | .proto schema source | (none) |

Phase 2: Service Ports (COMPLETE)

All five services rewritten in Go and migrated to handler-base v1.0.0 with protobuf wire format:

| Order | Service | Status | Notes |
|---|---|---|---|
| 1 | pipeline-bridge | Done | NATS + HTTP + k8s API calls. Parameters changed to map[string]string. |
| 2 | tts-module | Done | NATS ↔ HTTP bridge. []*TTSVoiceInfo pointer slices, int32 casts. |
| 3 | chat-handler | Done | Core text pipeline. EffectiveQuery() standalone func, int32(TopK). |
| 4 | voice-assistant | Done | Same pattern with []*DocumentSource pointer slices. |
| 5 | stt-module | Done | HTTP-forwarding variant. SessionId/SpeakerId field renames, int32(Sequence). |
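The recurring adjustments in the notes above follow directly from protobuf code generation: proto map fields become Go maps, repeated message fields become pointer slices, and proto int32 fields force explicit casts from Go's default int. A sketch with stand-in types (the real generated stubs live in gen/messagespb/ and carry protobuf internals):

```go
package main

import "fmt"

// Stand-in for a protoc-generated message type.
type TTSVoiceInfo struct {
	Name string
}

// ChatRequest mimics the field shapes protoc emits; the actual
// message definitions are in proto/messages/v1/.
type ChatRequest struct {
	Parameters map[string]string // proto: map<string, string>
	Voices     []*TTSVoiceInfo   // proto: repeated TTSVoiceInfo
	TopK       int32             // proto: int32
}

func main() {
	topK := 5 // a Go int by default
	req := ChatRequest{
		Parameters: map[string]string{"model": "llama"},
		Voices:     []*TTSVoiceInfo{{Name: "en-US"}},
		TopK:       int32(topK), // explicit cast, as noted in the port table
	}
	fmt.Println(req.Parameters["model"], req.Voices[0].Name, req.TopK) // prints llama en-US 5
}
```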

companions-frontend also migrated: 129-line duplicate type definitions replaced with type aliases from handler-base/messages.
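Go type aliases (as opposed to type definitions) make the aliased and original types identical, which is what lets companions-frontend drop its duplicated structs without any conversion code. A sketch with hypothetical names:

```go
package main

import "fmt"

// Stand-in for a generated protobuf type from handler-base/messages.
type chatMessagePB struct {
	Text string
}

// A type alias (note the '='): ChatMessage and chatMessagePB are the
// same type, so values flow between old and new code with no casts.
// A plain definition (`type ChatMessage chatMessagePB`) would create
// a distinct type and require conversions at every boundary.
type ChatMessage = chatMessagePB

func handle(m ChatMessage) string { return "handled: " + m.Text }

func main() {
	// A value of the "generated" type is accepted directly.
	m := chatMessagePB{Text: "hello"}
	fmt.Println(handle(m)) // prints handled: hello
}
```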

Phase 3: Cleanup (COMPLETE)

  • Archive Python versions of ported services — Python handler-base remains for Ray Serve/Kubeflow
  • CI pipelines use golangci-lint v2 with errcheck, govet, staticcheck, misspell, bodyclose, nilerr
  • All repos pass golangci-lint run ./... and go test ./...
  • Wire format upgraded from MessagePack to Protocol Buffers (ADR-0004 superseded)

What Stays in Python

| Repository | Reason |
|---|---|
| ray-serve | PyTorch, vLLM, sentence-transformers — core ML inference |
| kuberay-images | GPU runtime Docker images (ROCm, CUDA, IPEX) |
| gradio-ui | Gradio is Python-only; dev/testing tool, not production |
| kubeflow/ | Kubeflow Pipelines SDK is Python-only |
| mlflow/ | MLflow SDK integration (tracking + model registry) |
| stt-module (local Whisper variant) | PyTorch + openai-whisper on GPU |
| spark-analytics-jobs | PySpark (being replaced by Flink anyway) |

Links
  • Related: ADR-0003 — NATS as messaging backbone
  • Related: ADR-0004 — MessagePack binary encoding
  • Related: ADR-0011 — KubeRay unified GPU backend
  • Related: ADR-0013 — Gitea Actions CI
  • Related: ADR-0014 — Docker build best practices
  • Related: ADR-0019 — Handler deployment strategy
  • Related: ADR-0024 — Ray repository structure
  • Related: ADR-0046 — Companions frontend (Go reference)
  • Related: ADR-0051 — KEDA autoscaling