# Refactor NATS Handler Services from Python to Go
* Status: accepted
* Date: 2026-02-19
* Decided: 2026-02-21
* Deciders: Billy
* Technical Story: Reduce container image sizes and resource consumption for non-ML handler services by rewriting them in Go
## Context and Problem Statement
The AI pipeline's non-inference services — `chat-handler`, `voice-assistant`, `pipeline-bridge`, `tts-module`, and the HTTP-forwarding variant of `stt-module` — are Python applications built on the `handler-base` shared library. None of these services perform local ML inference; they orchestrate calls to external Ray Serve endpoints over HTTP and route messages via NATS with MessagePack encoding.
> **Implementation note (2026-02-21):** During the Go rewrite, the wire format was upgraded from MessagePack to **Protocol Buffers** (see [ADR-0004 superseded](0004-use-messagepack-for-nats.md)). The shared Go module is published as `handler-base` v1.0.0 (not `handler-go` as originally proposed).
Despite doing only lightweight I/O orchestration, each service inherits the full Python runtime and its dependency tree through `handler-base` (which pulls in `numpy`, `pymilvus`, `redis`, `httpx`, `pydantic`, `opentelemetry-*`, `mlflow`, and `psycopg2-binary`). This results in container images of **500–700 MB each** — five services totalling **~3 GB** of registry storage — for workloads that are fundamentally HTTP/NATS glue code.
The homelab already has two production Go services (`companions-frontend` and `ntfy-discord`) that prove the NATS + MessagePack + OpenTelemetry pattern works well in Go with images under 30 MB.
How do we reduce the image footprint and resource consumption of the non-ML handler services without disrupting the ML inference layer?
## Decision Drivers
* Container images for glue services are 500–700 MB despite doing no ML work
* Go produces static binaries yielding images of ~15–30 MB (scratch/distroless base)
* Go services start in milliseconds vs. seconds for Python, improving pod scheduling
* Go's memory footprint is ~10× lower for equivalent I/O-bound workloads
* The NATS + msgpack + OTel pattern is already proven in `companions-frontend`
* Go has first-class Kubernetes client support (`client-go`) — relevant for `pipeline-bridge`
* ML inference services (Ray Serve, kuberay-images) must remain Python — only orchestration moves
* Five services share a common base (`handler-base`) — a single Go module replaces it for all
## Considered Options
1. **Rewrite handler services in Go with a shared Go module**
2. **Optimise Python images** (multi-stage builds, dependency trimming; no rewrite)
## Decision Outcome
Chosen option: **Option 1 — Rewrite handler services in Go**, because the services are pure I/O orchestration with no ML dependencies, the Go pattern is already proven in-cluster, and the image + resource savings are an order of magnitude improvement that Python optimisation cannot match.
### Positive Consequences
* Five container images shrink from ~3 GB total to ~100–150 MB total
* Sub-second cold start enables faster rollouts and autoscaling via KEDA
* Lower memory footprint frees cluster resources for ML workloads
* Eliminates Python runtime CVE surface area from non-ML services
* Single `handler-go` module provides shared NATS, health, OTel, and client code
* `pipeline-bridge` gains `client-go` — the canonical Kubernetes client library
* Go's type system catches message schema drift at compile time
### Negative Consequences
* One-time rewrite effort across five services
* Team must maintain Go **and** Python codebases (Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs)
* `handler-go` needs feature parity with `handler-base` for the orchestration subset (NATS client, health server, OTel, HTTP clients, Milvus client)
* Audio handling in `stt-module` (VAD) requires a Go webrtcvad binding or equivalent
## Pros and Cons of the Options
### Option 1 — Rewrite in Go
* Good, because images shrink from ~600 MB → ~20 MB per service
* Good, because memory usage drops from ~150 MB → ~15 MB per service
* Good, because startup time drops from ~3 s → <100 ms
* Good, because Go has mature libraries for every dependency (nats.go, client-go, otel-go, milvus-sdk-go)
* Good, because two existing Go services in the cluster prove the pattern
* Bad, because one-time engineering effort to rewrite five services
* Bad, because two language ecosystems to maintain
### Option 2 — Optimise Python images
* Good, because no rewrite needed
* Good, because multi-stage builds and dependency trimming can reduce images by 30–50%
* Bad, because even a 50% reduction leaves each image at ~250–350 MB, still an order of magnitude larger than the Go equivalent
* Bad, because Python's multi-second startup and memory footprint remain essentially unchanged
## Links
* Related: [ADR-0051](0051-keda-event-driven-autoscaling.md) — KEDA event-driven autoscaling