homelab-design/decisions/0046-companions-frontend-architecture.md
Billy D. 5846d0dc16
docs: add ADRs 0043-0053 covering remaining architecture gaps
New ADRs:
- 0043: Cilium CNI and Network Fabric
- 0044: DNS and External Access Architecture
- 0045: TLS Certificate Strategy (cert-manager)
- 0046: Companions Frontend Architecture
- 0047: MLflow Experiment Tracking and Model Registry
- 0048: Entertainment and Media Stack
- 0049: Self-Hosted Productivity Suite
- 0050: Argo Rollouts Progressive Delivery
- 0051: KEDA Event-Driven Autoscaling
- 0052: Cluster Utilities (Spegel, Descheduler, Reloader, CSI-NFS)
- 0053: Vaultwarden Password Management

README updated with table entries and badge count (53 total).
2026-02-09 18:37:14 -05:00

Companions Frontend Architecture

  • Status: accepted
  • Date: 2026-02-09
  • Deciders: Billy
  • Technical Story: Design the primary user interface for the AI/ML platform, supporting real-time chat, voice, and 3D avatar interactions

Context and Problem Statement

The homelab AI platform needs a web interface for users to interact with chat (RAG + LLM), voice (STT → LLM → TTS), and embedding services. The interface must support real-time streaming responses, WebSocket connections for NATS message bus integration, and an engaging visual experience.

How do we build a performant, maintainable frontend that integrates with the NATS-based backend without a heavy JavaScript framework build step?

Decision Drivers

  • Real-time streaming for chat and voice (WebSocket required)
  • Direct integration with NATS JetStream (MessagePack-encoded binary payloads)
  • Minimal client-side JavaScript (~20KB gzipped target)
  • No frontend build step (no webpack/vite/node required)
  • 3D avatar rendering for immersive experience
  • OAuth integration with multiple providers
  • Single binary deployment (Go)

Considered Options

  1. Go + HTMX + Alpine.js + Three.js — Server-rendered with minimal JS
  2. Next.js / React SPA — Full JavaScript framework
  3. SvelteKit — Compiled JS framework
  4. Go + Templ + raw WebSocket — Pure Go templates, no JS framework

Decision Outcome

Chosen option: Option 1 - Go + HTMX + Alpine.js + Three.js, because it provides a zero-build-step frontend with server-rendered HTML, minimal JavaScript, and rich 3D avatar support, all served from a single Go binary.

Positive Consequences

  • Single binary deployment — Go server serves everything
  • ~20KB gzipped JS payload target for the interactivity layer (CDN-served HTMX + Alpine.js); Three.js is also CDN-served but is far larger and cannot fit in that budget
  • No npm, no webpack, no build step — assets served directly
  • Server-side rendering via Go templates
  • WebSocket connections handled directly in the Go server (gorilla/websocket)
  • NATS integration with MessagePack in the same binary
  • Distroless container image for minimal attack surface
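
As a rough illustration of the server-rendered approach, the sketch below shows one way a Go handler might return a bare HTML fragment to HTMX and a full page to ordinary navigation. It relies on the `HX-Request` header that HTMX sends with its requests. The template names, the `message` type, and the `/chat` route are assumptions made for this example, not taken from the companions-frontend code.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
	"net/http"
)

// Hypothetical templates: a chat message fragment and a full page that embeds it.
var tmpl = template.Must(template.New("ui").Parse(
	`{{define "message"}}<div class="chat-message">{{.Text}}</div>{{end}}` +
		`{{define "page"}}<!doctype html><html><body><main id="chat">{{template "message" .}}</main></body></html>{{end}}`))

// message is a hypothetical view model for one chat message.
type message struct{ Text string }

// render returns the bare fragment for HTMX requests and the full page otherwise.
func render(isHTMX bool, m message) (string, error) {
	name := "page"
	if isHTMX {
		name = "message"
	}
	var buf bytes.Buffer
	err := tmpl.ExecuteTemplate(&buf, name, m)
	return buf.String(), err
}

// chatHandler checks the "HX-Request" header that HTMX adds to its requests.
func chatHandler(w http.ResponseWriter, r *http.Request) {
	html, err := render(r.Header.Get("HX-Request") == "true", message{Text: "Hello from the server"})
	if err != nil {
		http.Error(w, "render failed", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, html)
}

func main() {
	// User-supplied text is HTML-escaped by html/template.
	fragment, _ := render(true, message{Text: "<b>hi</b>"})
	fmt.Println(fragment)
	_ = chatHandler // would be registered with http.HandleFunc("/chat", chatHandler)
}
```

Keeping the fragment and the page in the same template set means the server stays the single source of markup, which fits the limited client-side state noted under the negative consequences.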

Negative Consequences

  • Three.js adds complexity for 3D avatar rendering
  • HTMX pattern less familiar to developers expecting React/Vue
  • Limited client-side state management (by design)

Technology Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Server | Go 1.25 | HTTP server, WebSocket, NATS client, OAuth |
| Templates | Go html/template | Server-side HTML rendering |
| Interactivity | HTMX 2.0 | AJAX, plus WebSocket and server-sent events via HTMX extensions |
| Client state | Alpine.js 3 | Lightweight reactive UI for local state |
| 3D Avatars | Three.js + VRM | 3D character rendering with lip-sync |
| Styling | Tailwind CSS 4 + DaisyUI | Utility-first CSS with component library |
| Messaging | NATS JetStream | Real-time pub/sub with MessagePack encoding |
| Auth | golang-jwt/jwt/v5 | JWT handling for OAuth flows |
| Database | PostgreSQL (lib/pq) + SQLite | Persistent storage (PostgreSQL) and local session storage (SQLite) |
| Observability | OpenTelemetry SDK | Traces and metrics via OTLP gRPC |
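
The stack lists server-sent events as one of the streaming transports. The following is a minimal sketch, using only the Go standard library, of how a handler could stream response tokens as SSE frames. The `token` event name, the `/chat/stream` path, and the fixed token slice are placeholders for this example; the real service may stream over WebSocket instead.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// formatSSE builds one server-sent event frame. Multi-line data is split into
// one "data:" line per line, as the event stream format requires.
func formatSSE(event, data string) string {
	var b strings.Builder
	b.WriteString("event: " + event + "\n")
	for _, line := range strings.Split(data, "\n") {
		b.WriteString("data: " + line + "\n")
	}
	b.WriteString("\n") // a blank line terminates the event
	return b.String()
}

// streamHandler sends each token as its own event and flushes immediately.
// The fixed token slice stands in for tokens arriving from the backend.
func streamHandler(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	for _, tok := range []string{"Hello", " world"} {
		fmt.Fprint(w, formatSSE("token", tok))
		flusher.Flush()
	}
}

func main() {
	// Exercise the handler in-process instead of opening a listening socket.
	rec := httptest.NewRecorder()
	streamHandler(rec, httptest.NewRequest(http.MethodGet, "/chat/stream", nil))
	fmt.Print(rec.Body.String())
}
```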

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Browser                                  │
│                                                                 │
│  HTMX (server-rendered HTML) ←→ Go Server (WebSocket)          │
│  Alpine.js (local UI state)                                     │
│  Three.js (VRM 3D avatars with lip-sync)                        │
└───────────────────────┬─────────────────────────────────────────┘
                        │ HTTP/WebSocket
                        ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Go Server (single binary)                     │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │  Routes   │  │  OAuth   │  │WebSocket │  │  OTEL    │       │
│  │ (HTTP)    │  │ Handlers │  │ Hub      │  │ Tracing  │       │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘       │
│        │              │            │                             │
│        └──────────────┴────────────┘                            │
│                        │                                        │
│              ┌─────────┴─────────┐                              │
│              │  NATS Client      │                              │
│              │  (JetStream +     │                              │
│              │   MessagePack)    │                              │
│              └─────────┬─────────┘                              │
└────────────────────────┼────────────────────────────────────────┘
                         │
            ┌────────────┴────────────────┐
            ▼                             ▼
┌──────────────────┐           ┌──────────────────┐
│  NATS JetStream  │           │  Ray Serve       │
│  ai.chat.*       │           │  (STT, TTS, LLM, │
│  ai.voice.*      │           │   Embeddings)    │
└──────────────────┘           └──────────────────┘
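
The WebSocket Hub box in the diagram can be pictured roughly as follows. This is a simplified sketch, not the actual hub: each client is modelled as a buffered channel so the example runs with the standard library alone, whereas the real server would drain each channel in a goroutine that writes to a gorilla/websocket connection. The `Hub` type and its method names are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub routes messages to registered clients, keyed by a user or session ID.
type Hub struct {
	mu      sync.Mutex
	clients map[string]chan []byte
}

func NewHub() *Hub { return &Hub{clients: make(map[string]chan []byte)} }

// Register creates an outbound queue for id, replacing any previous one.
func (h *Hub) Register(id string, buf int) <-chan []byte {
	ch := make(chan []byte, buf)
	h.mu.Lock()
	defer h.mu.Unlock()
	if old, ok := h.clients[id]; ok {
		close(old)
	}
	h.clients[id] = ch
	return ch
}

// Unregister closes and removes the queue for id, if present.
func (h *Hub) Unregister(id string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if ch, ok := h.clients[id]; ok {
		close(ch)
		delete(h.clients, id)
	}
}

// Send queues msg for one client without blocking. It reports false when the
// client is unknown or its buffer is full (a slow consumer).
func (h *Hub) Send(id string, msg []byte) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	ch, ok := h.clients[id]
	if !ok {
		return false
	}
	select {
	case ch <- msg:
		return true
	default:
		return false
	}
}

func main() {
	hub := NewHub()
	inbox := hub.Register("user-1", 8)
	hub.Send("user-1", []byte("hello"))
	fmt.Println(string(<-inbox))
	hub.Unregister("user-1")
}
```

Dropping messages for a full buffer is one possible policy; a production hub might instead disconnect the slow client or apply backpressure toward NATS.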

Key Features

| Feature | Implementation |
| --- | --- |
| Real-time chat | WebSocket → NATS pub/sub per-user channels |
| Voice assistant | Streaming STT → LLM → TTS via Ray Serve endpoints |
| 3D avatars | VRM models rendered in Three.js with audio-driven lip-sync |
| OAuth login | Google, Discord, GitHub, Twitch + Authentik OIDC |
| RAG search | Milvus vector search for premium users |
| Session state | PostgreSQL (CNPG) for persistent data, SQLite for local cache |
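
To make the per-user channel idea concrete, here is a small sketch of how subjects under the `ai.chat.*` and `ai.voice.*` prefixes might be derived. The `ai.<kind>.<userID>` layout is an assumption inferred from the wildcard subjects in the architecture diagram, and `userSubject` is a hypothetical helper. The validation reflects NATS subject rules: `.` separates tokens, `*` and `>` are wildcards, and whitespace is not allowed.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// userSubject builds a per-user subject such as "ai.chat.u42".
// The layout is assumed for illustration; only the "ai.chat" and "ai.voice"
// prefixes appear in the ADR.
func userSubject(kind, userID string) (string, error) {
	if kind != "chat" && kind != "voice" {
		return "", fmt.Errorf("unknown subject kind %q", kind)
	}
	// Reject characters that would change the meaning of the subject.
	if userID == "" || strings.ContainsAny(userID, ".*> \t\r\n") {
		return "", errors.New("user ID is not a valid NATS subject token")
	}
	return "ai." + kind + "." + userID, nil
}

func main() {
	for _, id := range []string{"u42", "bad.id", "*"} {
		subject, err := userSubject("chat", id)
		fmt.Printf("%q -> %q, err=%v\n", id, subject, err)
	}
}
```

Validating the ID before publishing matters here because an unchecked `*` or `>` in a subscription subject would let one session receive other users' messages.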

Kubernetes Deployment

| Setting | Value |
| --- | --- |
| Namespace | ai-ml |
| Replicas | 1 |
| Image | ghcr.io/billy-davies-2/companions-frontend (distroless) |
| Resources | Requests: 50m CPU / 128Mi memory; limits: 500m CPU / 512Mi memory |

OTEL sidecar: otel/opentelemetry-collector-contrib:0.145.0 exports traces to ClickStack.

Backend routing: All AI inference requests (STT, TTS, LLM, embeddings, reranking) route to Ray Serve at ai-inference-serve-svc.ai-ml.svc.cluster.local:8000. HTTPRoutes in the auxiliary kustomization additionally expose the individual model endpoints at embeddings.lab, whisper.lab, tts.lab, llm.lab, and reranker.lab.
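
A minimal sketch of how the frontend might resolve an inference task to a Ray Serve URL is shown below. The base address comes from this ADR; the route prefixes in the `routes` map are hypothetical, loosely named after the `.lab` hostnames, and the real Ray Serve application may use different paths.

```go
package main

import (
	"fmt"
	"strings"
)

// rayServeBase is the in-cluster Ray Serve address given in this ADR.
const rayServeBase = "http://ai-inference-serve-svc.ai-ml.svc.cluster.local:8000"

// routes maps a task to a route prefix. These prefixes are assumptions
// for illustration only.
var routes = map[string]string{
	"stt":        "/whisper",
	"tts":        "/tts",
	"llm":        "/llm",
	"embeddings": "/embeddings",
	"reranking":  "/reranker",
}

// endpoint joins the base URL and the route prefix for a task.
func endpoint(base, task string) (string, error) {
	prefix, ok := routes[task]
	if !ok {
		return "", fmt.Errorf("no route for task %q", task)
	}
	return strings.TrimRight(base, "/") + prefix, nil
}

func main() {
	for _, task := range []string{"stt", "llm", "tts"} {
		u, err := endpoint(rayServeBase, task)
		fmt.Println(task, u, err)
	}
}
```

Centralizing the mapping in one place keeps the single-service routing decision visible in code: every task shares one base address and differs only by prefix.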

Access: companions-chat.lab.daviestechlabs.io via envoy-internal with Authentik OIDC proxy auth.
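
With proxy authentication in front of the service, the Go server could read the identity that the proxy forwards rather than re-running the login flow. The sketch below assumes the proxy passes `X-authentik-username` and a `|`-separated `X-authentik-groups` header; those header names, the `identity` type, and the sample user are assumptions for this example and should be checked against the actual Authentik outpost configuration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// identity is a hypothetical view of the user forwarded by the auth proxy.
type identity struct {
	Username string
	Groups   []string
}

// identityFrom reads the forwarded identity through a header getter.
// The header names are assumed, not taken from this ADR.
func identityFrom(get func(string) string) (identity, bool) {
	user := get("X-authentik-username")
	if user == "" {
		return identity{}, false
	}
	var groups []string
	if g := get("X-authentik-groups"); g != "" {
		groups = strings.Split(g, "|")
	}
	return identity{Username: user, Groups: groups}, true
}

// requireIdentity rejects requests that carry no forwarded identity.
func requireIdentity(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, ok := identityFrom(r.Header.Get); !ok {
			http.Error(w, "unauthenticated", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	h := requireIdentity(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, _ := identityFrom(r.Header.Get)
		fmt.Fprintf(w, "hello %s %v", id.Username, id.Groups)
	}))
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("X-authentik-username", "billy")       // sample value
	req.Header.Set("X-authentik-groups", "admins|premium") // sample value
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

Forwarded headers are only trustworthy if the pod is reachable solely through the proxy; a client that can reach the service directly could set them itself.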