docs: add ADRs 0043-0053 covering remaining architecture gaps

New ADRs:
- 0043: Cilium CNI and Network Fabric
- 0044: DNS and External Access Architecture
- 0045: TLS Certificate Strategy (cert-manager)
- 0046: Companions Frontend Architecture
- 0047: MLflow Experiment Tracking and Model Registry
- 0048: Entertainment and Media Stack
- 0049: Self-Hosted Productivity Suite
- 0050: Argo Rollouts Progressive Delivery
- 0051: KEDA Event-Driven Autoscaling
- 0052: Cluster Utilities (Spegel, Descheduler, Reloader, CSI-NFS)
- 0053: Vaultwarden Password Management

README updated with table entries and badge count (53 total).

2026-02-09 18:37:14 -05:00

Companions Frontend Architecture

Status: accepted
Date: 2026-02-09
Deciders: Billy
Technical Story: Design the primary user interface for the AI/ML platform, supporting real-time chat, voice, and 3D avatar interactions

Context and Problem Statement

The homelab AI platform needs a web interface for users to interact with chat (RAG + LLM), voice (STT → LLM → TTS), and embedding services. The interface must support real-time streaming responses, WebSocket connections for NATS message bus integration, and an engaging visual experience.

How do we build a performant, maintainable frontend that integrates with the NATS-based backend without a heavy JavaScript framework build step?

Decision Drivers

Real-time streaming for chat and voice (WebSocket required)
Direct integration with NATS JetStream (binary, MessagePack-encoded payloads)
Minimal client-side JavaScript (~20KB gzipped target)
No frontend build step (no webpack/vite/node required)
3D avatar rendering for immersive experience
OAuth integration with multiple providers
Single binary deployment (Go)

Considered Options

1. Go + HTMX + Alpine.js + Three.js: server-rendered with minimal JS
2. Next.js / React SPA: full JavaScript framework
3. SvelteKit: compiled JS framework
4. Go + Templ + raw WebSocket: pure Go templates, no JS framework

Decision Outcome

Chosen option: Option 1 - Go + HTMX + Alpine.js + Three.js, because it provides a zero-build-step frontend with server-rendered HTML, minimal JavaScript, and rich 3D avatar support, all served from a single Go binary.

Positive Consequences

Single binary deployment — Go server serves everything
~20KB gzipped JS payload target for the CDN-served interactivity layer (HTMX + Alpine.js); Three.js is a separate, substantially larger download
No npm, no webpack, no build step — assets served directly
Server-side rendering via Go templates
WebSocket handled inside the Go server (gorilla/websocket), with no separate gateway
NATS integration with MessagePack in the same binary
Distroless container image for minimal attack surface
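
To make the server-side rendering point concrete, here is a minimal sketch of how a Go handler might produce an HTML fragment for HTMX to swap into the page. The template text, the `ChatMessage` type, and `RenderMessage` are hypothetical names chosen for illustration; the ADR does not show the real templates.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// messageTmpl is a hypothetical fragment template; the real templates
// used by the frontend are not shown in this ADR.
var messageTmpl = template.Must(template.New("message").Parse(
	`<div class="msg"><b>{{.Author}}</b>: {{.Body}}</div>`))

// ChatMessage is an illustrative view model for one chat line.
type ChatMessage struct {
	Author string
	Body   string
}

// RenderMessage renders one chat message as an HTML fragment.
// html/template escapes Author and Body for the HTML text context,
// so user-supplied markup is shown as text rather than interpreted.
func RenderMessage(m ChatMessage) (string, error) {
	var buf bytes.Buffer
	if err := messageTmpl.Execute(&buf, m); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := RenderMessage(ChatMessage{Author: "billy", Body: "<b>hi</b>"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because the fragment is plain HTML, the browser needs no client-side templating; the same function could back both a full page render and a partial update.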

Negative Consequences

Three.js adds complexity for 3D avatar rendering
HTMX pattern less familiar to developers expecting React/Vue
Limited client-side state management (by design)

Technology Stack

| Layer | Technology | Purpose |
|---|---|---|
| Server | Go 1.25 | HTTP server, WebSocket, NATS client, OAuth |
| Templates | Go `html/template` | Server-side HTML rendering |
| Interactivity | HTMX 2.0 | AJAX, WebSocket, server-sent events |
| Client state | Alpine.js 3 | Lightweight reactive UI for local state |
| 3D Avatars | Three.js + VRM | 3D character rendering with lip-sync |
| Styling | Tailwind CSS 4 + DaisyUI | Utility-first CSS with component library |
| Messaging | NATS JetStream | Real-time pub/sub with MessagePack encoding |
| Auth | golang-jwt/jwt/v5 | JWT token handling for OAuth flows |
| Database | PostgreSQL (lib/pq) + SQLite | Persistent + local session storage |
| Observability | OpenTelemetry SDK | Traces, metrics via OTLP gRPC |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Browser                                  │
│                                                                 │
│  HTMX (server-rendered HTML) ←→ Go Server (WebSocket)          │
│  Alpine.js (local UI state)                                     │
│  Three.js (VRM 3D avatars with lip-sync)                        │
└───────────────────────┬─────────────────────────────────────────┘
                        │ HTTP/WebSocket
                        ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Go Server (single binary)                     │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │  Routes   │  │  OAuth   │  │WebSocket │  │  OTEL    │       │
│  │ (HTTP)    │  │ Handlers │  │ Hub      │  │ Tracing  │       │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘       │
│        │              │            │                             │
│        └──────────────┴────────────┘                            │
│                        │                                        │
│              ┌─────────┴─────────┐                              │
│              │  NATS Client      │                              │
│              │  (JetStream +     │                              │
│              │   MessagePack)    │                              │
│              └─────────┬─────────┘                              │
└────────────────────────┼────────────────────────────────────────┘
                         │
            ┌────────────┴────────────────┐
            ▼                             ▼
┌──────────────────┐           ┌──────────────────┐
│  NATS JetStream  │           │  Ray Serve       │
│  ai.chat.*       │           │  (STT, TTS, LLM, │
│  ai.voice.*      │           │   Embeddings)    │
└──────────────────┘           └──────────────────┘
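
The "WebSocket Hub" box in the diagram can be pictured as a per-user fan-out. The sketch below is one possible shape for it, using only the standard library: Go channels stand in for WebSocket connections and for the NATS subscription feeding the hub. `Hub`, `Subscribe`, and `Publish` are hypothetical names, and the drop-on-full policy is an assumption, not something the ADR specifies.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub fans messages out to every open connection of a given user.
// Each subscriber channel stands in for one WebSocket connection.
type Hub struct {
	mu   sync.RWMutex
	subs map[string]map[chan []byte]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[string]map[chan []byte]struct{})}
}

// Subscribe registers a buffered channel for userID and returns it
// together with a cancel function that is safe to call more than once.
func (h *Hub) Subscribe(userID string) (<-chan []byte, func()) {
	ch := make(chan []byte, 16)
	h.mu.Lock()
	if h.subs[userID] == nil {
		h.subs[userID] = make(map[chan []byte]struct{})
	}
	h.subs[userID][ch] = struct{}{}
	h.mu.Unlock()

	cancel := func() {
		h.mu.Lock()
		defer h.mu.Unlock()
		if _, ok := h.subs[userID][ch]; ok {
			delete(h.subs[userID], ch)
			close(ch)
		}
		if len(h.subs[userID]) == 0 {
			delete(h.subs, userID)
		}
	}
	return ch, cancel
}

// Publish delivers msg to all subscribers of userID and returns how many
// received it. A subscriber whose buffer is full is skipped (assumed policy).
func (h *Hub) Publish(userID string, msg []byte) int {
	h.mu.RLock()
	defer h.mu.RUnlock()
	delivered := 0
	for ch := range h.subs[userID] {
		select {
		case ch <- msg:
			delivered++
		default: // slow consumer: drop instead of blocking the publisher
		}
	}
	return delivered
}

func main() {
	hub := NewHub()
	tabA, cancelA := hub.Subscribe("user-1")
	tabB, cancelB := hub.Subscribe("user-1")
	defer cancelA()
	defer cancelB()

	n := hub.Publish("user-1", []byte("hello"))
	fmt.Println(n, string(<-tabA), string(<-tabB))
}
```

In a real server the NATS message handler would call `Publish`, and each WebSocket write loop would range over the channel returned by `Subscribe`.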

Key Features

| Feature | Implementation |
|---|---|
| Real-time chat | WebSocket → NATS pub/sub per-user channels |
| Voice assistant | Streaming STT → LLM → TTS via Ray Serve endpoints |
| 3D avatars | VRM models rendered in Three.js with audio-driven lip-sync |
| OAuth login | Google, Discord, GitHub, Twitch + Authentik OIDC |
| RAG search | Milvus vector search for premium users |
| Session state | PostgreSQL (CNPG) for persistent data, SQLite for local cache |
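
For the per-user channels in the first row, one plausible approach is to derive a NATS subject from the user ID. The ADR only shows the `ai.chat.*` pattern, so the exact layout `ai.chat.<userID>` and the `ChatSubject` helper below are assumptions. The validation reflects standard NATS rules: `.` separates tokens, `*` and `>` are wildcards, and whitespace is not allowed in a subject.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrBadToken is returned when a user ID cannot be used as a single
// NATS subject token.
var ErrBadToken = errors.New("user ID is not a valid NATS subject token")

// ChatSubject builds a per-user subject under ai.chat.
// The "ai.chat.<userID>" layout is hypothetical; the ADR only
// documents the ai.chat.* pattern.
func ChatSubject(userID string) (string, error) {
	// Reject empty IDs, token separators, wildcards, and whitespace so a
	// crafted ID cannot address other users' subjects.
	if userID == "" || strings.ContainsAny(userID, ".*> \t\r\n") {
		return "", ErrBadToken
	}
	return "ai.chat." + userID, nil
}

func main() {
	for _, id := range []string{"u123", "a.b", "*"} {
		subject, err := ChatSubject(id)
		fmt.Printf("%q -> %q, err=%v\n", id, subject, err)
	}
}
```

Validating the token at this boundary keeps a user-controlled value from turning into a wildcard subscription.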

Kubernetes Deployment


| Setting | Value |
|---|---|
| Namespace | `ai-ml` |
| Replicas | 1 |
| Image | `ghcr.io/billy-davies-2/companions-frontend` (distroless) |
| Resources | 50m/128Mi request → 500m/512Mi limit |

OTEL sidecar: otel/opentelemetry-collector-contrib:0.145.0 exports traces to ClickStack.

Backend routing: All AI inference requests (STT, TTS, LLM, embeddings, reranking) route to Ray Serve at ai-inference-serve-svc.ai-ml.svc.cluster.local:8000. Auxiliary HTTPRoutes in the auxiliary kustomization provide direct model endpoint access at embeddings.lab, whisper.lab, tts.lab, llm.lab, reranker.lab.
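
The routing rule above can be sketched as a small lookup. The Ray Serve address and the `.lab` hostnames come from this ADR; the `http://` scheme, the short kind names, the pairing of `stt` with `whisper.lab`, and the `BaseURL` function are assumptions made for illustration. Request paths are deliberately left out because the ADR does not specify them.

```go
package main

import (
	"fmt"
	"sort"
)

// rayServeBase is the in-cluster Ray Serve service named in the ADR.
// The http scheme is an assumption.
const rayServeBase = "http://ai-inference-serve-svc.ai-ml.svc.cluster.local:8000"

// directHosts maps an inference kind to its auxiliary HTTPRoute host.
// Kind names are hypothetical; "stt" -> whisper.lab is an inferred pairing.
var directHosts = map[string]string{
	"embeddings": "embeddings.lab",
	"stt":        "whisper.lab",
	"tts":        "tts.lab",
	"llm":        "llm.lab",
	"reranker":   "reranker.lab",
}

// BaseURL returns the base URL for an inference kind. By default every
// kind goes through Ray Serve; direct selects the auxiliary model endpoint.
func BaseURL(kind string, direct bool) (string, error) {
	host, ok := directHosts[kind]
	if !ok {
		return "", fmt.Errorf("unknown inference kind %q", kind)
	}
	if direct {
		return "http://" + host, nil
	}
	return rayServeBase, nil
}

func main() {
	kinds := make([]string, 0, len(directHosts))
	for k := range directHosts {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds) // stable output order

	for _, k := range kinds {
		viaServe, _ := BaseURL(k, false)
		direct, _ := BaseURL(k, true)
		fmt.Printf("%-10s serve=%s direct=%s\n", k, viaServe, direct)
	}
}
```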

Access: companions-chat.lab.daviestechlabs.io via envoy-internal with Authentik OIDC proxy auth.
