
# Refactor NATS Handler Services from Python to Go
* Status: accepted
* Date: 2026-02-19
* Decided: 2026-02-21
* Deciders: Billy
* Technical Story: Reduce container image sizes and resource consumption for non-ML handler services by rewriting them in Go
## Context and Problem Statement
The AI pipeline's non-inference services — `chat-handler`, `voice-assistant`, `pipeline-bridge`, `tts-module`, and the HTTP-forwarding variant of `stt-module` — are Python applications built on the `handler-base` shared library. None of these services perform local ML inference; they orchestrate calls to external Ray Serve endpoints over HTTP and route messages via NATS with MessagePack encoding.
> **Implementation note (2026-02-21):** During the Go rewrite, the wire format was upgraded from MessagePack to **Protocol Buffers** (see [ADR-0004 superseded](0004-use-messagepack-for-nats.md)). The shared Go module is published as `handler-base` v1.0.0 (not `handler-go` as originally proposed).
Despite doing only lightweight I/O orchestration, each service inherits the full Python runtime and its dependency tree through `handler-base` (which pulls in `numpy`, `pymilvus`, `redis`, `httpx`, `pydantic`, `opentelemetry-*`, `mlflow`, and `psycopg2-binary`). This results in container images of **500–700 MB each** — five services totalling **~3 GB** of registry storage — for workloads that are fundamentally HTTP/NATS glue code.
The homelab already has two production Go services (`companions-frontend` and `ntfy-discord`) that prove the NATS + MessagePack + OpenTelemetry pattern works well in Go with images under 30 MB.
How do we reduce the image footprint and resource consumption of the non-ML handler services without disrupting the ML inference layer?
## Decision Drivers
* Container images for glue services are 500–700 MB despite doing no ML work
* Go produces static binaries yielding images of ~15–30 MB (scratch/distroless base)
* Go services start in milliseconds vs. seconds for Python, improving pod scheduling
* Go's memory footprint is ~10× lower for equivalent I/O-bound workloads
* The NATS + msgpack + OTel pattern is already proven in `companions-frontend`
* Go has first-class Kubernetes client support (`client-go`) — relevant for `pipeline-bridge`
* ML inference services (Ray Serve, kuberay-images) must remain Python — only orchestration moves
* Five services share a common base (`handler-base`) — a single Go module replaces it for all
## Considered Options
1. **Rewrite handler services in Go with a shared Go module**
2. **Optimise Python images (multi-stage builds, slim deps, compiled wheels)**
3. **Keep current Python stack unchanged**
## Decision Outcome
Chosen option: **Option 1 — Rewrite handler services in Go**, because the services are pure I/O orchestration with no ML dependencies, the Go pattern is already proven in-cluster, and the image + resource savings are an order of magnitude improvement that Python optimisation cannot match.
### Positive Consequences
* Five container images shrink from ~3 GB total to ~100–150 MB total
* Sub-second cold start enables faster rollouts and autoscaling via KEDA
* Lower memory footprint frees cluster resources for ML workloads
* Eliminates Python runtime CVE surface area from non-ML services
* Single `handler-go` module provides shared NATS, health, OTel, and client code
* `pipeline-bridge` gains `client-go` — the canonical Kubernetes client library
* Go's type system catches message schema drift at compile time
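The compile-time guarantee in the last bullet can be illustrated with a generics-based handler signature. This is a sketch of the idea, not the actual `handler/` package API: `ChatRequest` is a hand-written stand-in for a protoc-generated struct, and JSON stands in for the protobuf wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChatRequest is a hypothetical message type; the real services use
// protoc-generated structs from handler-base/gen/messagespb.
type ChatRequest struct {
	Query string `json:"query"`
	TopK  int32  `json:"top_k"`
}

// Handle wires a typed function to raw message bytes. Passing a handler
// written for the wrong message type is a compile error, not a runtime
// KeyError discovered in production.
func Handle[T any](raw []byte, fn func(T) error) error {
	var msg T
	if err := json.Unmarshal(raw, &msg); err != nil {
		return err
	}
	return fn(msg)
}

func main() {
	raw := []byte(`{"query":"hello","top_k":5}`)
	err := Handle(raw, func(req ChatRequest) error {
		fmt.Println(req.Query, req.TopK) // prints "hello 5"
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```

Renaming a field in the message struct breaks every call site at `go build` time, which is exactly the schema-drift protection the bullet claims.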
### Negative Consequences
* One-time rewrite effort across five services
* Team must maintain Go **and** Python codebases (Python remains for Ray Serve, Kubeflow pipelines, Gradio UIs)
* `handler-go` needs feature parity with `handler-base` for the orchestration subset (NATS client, health server, OTel, HTTP clients, Milvus client)
* Audio handling in `stt-module` (VAD) requires a Go webrtcvad binding or equivalent
## Pros and Cons of the Options
### Option 1 — Rewrite in Go
* Good, because images shrink from ~600 MB → ~20 MB per service
* Good, because memory usage drops from ~150 MB → ~15 MB per service
* Good, because startup time drops from ~3 s → <100 ms
* Good, because Go has mature libraries for every dependency (nats.go, client-go, otel-go, milvus-sdk-go)
* Good, because two existing Go services in the cluster prove the pattern
* Bad, because one-time engineering effort to rewrite five services
* Bad, because two language ecosystems to maintain
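The image-size figures above follow from static binaries plus a minimal base image. A representative multi-stage Dockerfile (paths, Go version, and binary name are illustrative; ADR-0014 documents the build conventions actually in use):

```dockerfile
# Build stage: compile a fully static binary.
FROM golang:1.23 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /out/handler ./cmd/handler

# Runtime stage: the binary only — no OS, no interpreter, no package manager.
FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /out/handler /handler
ENTRYPOINT ["/handler"]
```

With `CGO_ENABLED=0` the binary has no libc dependency, so `scratch` works; the CA bundle is copied in because the services make outbound TLS calls.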
### Option 2 — Optimise Python images
* Good, because no rewrite needed
* Good, because multi-stage builds and dependency trimming can reduce images by 30–50%
* Bad, because Python runtime + interpreter overhead remains (~200 MB floor)
* Bad, because memory and startup improvements are marginal
* Bad, because `handler-base` dependency tree is difficult to slim without breaking shared code
### Option 3 — Keep current stack
* Good, because zero effort
* Bad, because images remain 500–700 MB for glue code
* Bad, because resource waste reduces headroom for ML workloads
* Bad, because slow cold starts limit KEDA autoscaling effectiveness
## Implementation Plan
### Phase 1: `handler-base` Go Module (COMPLETE)
Published as `git.daviestechlabs.io/daviestechlabs/handler-base` v1.0.0 with:
| Package | Purpose | Python Equivalent |
|---------|---------|-------------------|
| `natsutil/` | NATS publish/request/decode with protobuf encoding | `handler_base.nats_client` |
| `health/` | HTTP health + readiness server | `handler_base.health` |
| `telemetry/` | OTel traces + metrics setup | `handler_base.telemetry` |
| `config/` | Env-based configuration (struct tags) | `handler_base.config` (pydantic-settings) |
| `clients/` | HTTP clients for LLM, embeddings, reranker, STT, TTS | `handler_base.clients` |
| `handler/` | Typed NATS message handler with OTel + health wiring | `handler_base.handler` |
| `messages/` | Type aliases from generated protobuf stubs | `handler_base.messages` |
| `gen/messagespb/` | protoc-generated Go stubs (21 message types) | — |
| `proto/messages/v1/` | `.proto` schema source | — |
### Phase 2: Service Ports (COMPLETE)
All five services rewritten in Go and migrated to handler-base v1.0.0 with protobuf wire format:
| Order | Service | Status | Notes |
|-------|---------|--------|-------|
| 1 | `pipeline-bridge` | ✅ Done | NATS + HTTP + k8s API calls. Parameters changed to `map[string]string`. |
| 2 | `tts-module` | ✅ Done | NATS ↔ HTTP bridge. `[]*TTSVoiceInfo` pointer slices, `int32` casts. |
| 3 | `chat-handler` | ✅ Done | Core text pipeline. `EffectiveQuery()` standalone func, `int32(TopK)`. |
| 4 | `voice-assistant` | ✅ Done | Same pattern with `[]*DocumentSource` pointer slices. |
| 5 | `stt-module` | ✅ Done | HTTP-forwarding variant. `SessionId`/`SpeakerId` field renames, `int32(Sequence)`. |
`companions-frontend` was also migrated: 129 lines of duplicated type definitions were replaced with type aliases from `handler-base/messages`.
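The alias mechanism behind that cleanup is plain Go: `type X = pkg.X` makes both names denote the same type, so values cross package boundaries without conversion or duplicate definitions. A self-contained sketch (single-file here, with a local stand-in for the shared package; in practice the aliased types live in `handler-base/messages`):

```go
package main

import "fmt"

// messagesChatRequest stands in for a type defined in the shared
// handler-base/messages package.
type messagesChatRequest struct {
	Query string
}

// ChatRequest is a type alias, not a new named type: both identifiers
// refer to the identical type, so no conversion is ever required.
type ChatRequest = messagesChatRequest

func main() {
	// Assignable in both directions with no cast.
	var req ChatRequest = messagesChatRequest{Query: "hi"}
	fmt.Println(req.Query) // prints "hi"
}
```

Had `ChatRequest` been declared as `type ChatRequest messagesChatRequest` (a defined type rather than an alias), every boundary crossing would need an explicit conversion, which is what the duplicated definitions previously forced.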
### Phase 3: Cleanup (COMPLETE)
* ~~Archive Python versions of ported services~~ — Python handler-base remains for Ray Serve/Kubeflow
* CI pipelines use `golangci-lint` v2 with errcheck, govet, staticcheck, misspell, bodyclose, nilerr
* All repos pass `golangci-lint run ./...` and `go test ./...`
* Wire format upgraded from MessagePack to Protocol Buffers (ADR-0004 superseded)
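The lint setup in the bullets above corresponds to a `.golangci.yml` along these lines (a sketch against the v2 config schema; the actual repo configs may enable additional linters or settings):

```yaml
version: "2"
linters:
  enable:
    - errcheck
    - govet
    - staticcheck
    - misspell
    - bodyclose
    - nilerr
```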
### What Stays in Python
| Repository | Reason |
|------------|--------|
| `ray-serve` | PyTorch, vLLM, sentence-transformers — core ML inference |
| `kuberay-images` | GPU runtime Docker images (ROCm, CUDA, IPEX) |
| `gradio-ui` | Gradio is Python-only; dev/testing tool, not production |
| `kubeflow/` | Kubeflow Pipelines SDK is Python-only |
| `mlflow/` | MLflow SDK integration (tracking + model registry) |
| `stt-module` (local Whisper variant) | PyTorch + openai-whisper on GPU |
| `spark-analytics-jobs` | PySpark (being replaced by Flink anyway) |
## Links
* Related: [ADR-0003](0003-use-nats-for-messaging.md) — NATS as messaging backbone
* Related: [ADR-0004](0004-use-messagepack-for-nats.md) — MessagePack binary encoding
* Related: [ADR-0011](0011-kuberay-unified-gpu-backend.md) — KubeRay unified GPU backend
* Related: [ADR-0013](0013-gitea-actions-for-ci.md) — Gitea Actions CI
* Related: [ADR-0014](0014-docker-build-best-practices.md) — Docker build best practices
* Related: [ADR-0019](0019-handler-deployment-strategy.md) — Handler deployment strategy
* Related: [ADR-0024](0024-ray-repository-structure.md) — Ray repository structure
* Related: [ADR-0046](0046-companions-frontend-architecture.md) — Companions frontend (Go reference)
* Related: [ADR-0051](0051-keda-event-driven-autoscaling.md) — KEDA autoscaling