# Companions Frontend Architecture
* Status: accepted
* Date: 2026-02-09
* Deciders: Billy
* Technical Story: Design the primary user interface for the AI/ML platform, supporting real-time chat, voice, and 3D avatar interactions
## Context and Problem Statement
The homelab AI platform needs a web interface for users to interact with chat (RAG + LLM), voice (STT → LLM → TTS), and embedding services. The interface must support real-time streaming responses, WebSocket connections for NATS message bus integration, and an engaging visual experience.

How do we build a performant, maintainable frontend that integrates with the NATS-based backend without a heavy JavaScript framework build step?
## Decision Drivers
* Real-time streaming for chat and voice (WebSocket required)
* Direct integration with NATS JetStream (binary MessagePack protocol)
* Minimal client-side JavaScript (~20KB gzipped target)
* No frontend build step (no webpack/vite/node required)
* 3D avatar rendering for immersive experience
* OAuth integration with multiple providers
* Single binary deployment (Go)
## Considered Options
1. **Go + HTMX + Alpine.js + Three.js** — Server-rendered with minimal JS
2. **Next.js / React SPA** — Full JavaScript framework
3. **SvelteKit** — Compiled JS framework
4. **Go + Templ + raw WebSocket** — Pure Go templates, no JS framework
## Decision Outcome
Chosen option: **Option 1 - Go + HTMX + Alpine.js + Three.js**, because it provides a zero-build-step frontend with server-rendered HTML, minimal JavaScript, and rich 3D avatar support, all served from a single Go binary.
### Positive Consequences
* Single binary deployment — Go server serves everything
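* Small JS payload: the ~20KB gzipped target covers the interactivity layer (CDN-served HTMX + Alpine.js); Three.js is considerably larger and is not counted in that figure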
* No npm, no webpack, no build step — assets served directly
* Server-side rendering via Go templates
* WebSocket handled inside the Go server (gorilla/websocket)
* NATS integration with MessagePack in the same binary
* Distroless container image for minimal attack surface
### Negative Consequences
* Three.js adds complexity for 3D avatar rendering
* HTMX pattern less familiar to developers expecting React/Vue
* Limited client-side state management (by design)
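
The server-rendered approach chosen above can be illustrated with a minimal sketch. It assumes a hypothetical `/chat/messages` route and hypothetical template names (`message`, `page`); none of these come from the actual codebase. HTMX marks its requests with the `HX-Request` header, so the handler returns only the fragment for HTMX requests and the full page otherwise.

```go
package main

import (
	"fmt"
	"html/template"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// Hypothetical templates: a full page that embeds a reusable "message" fragment.
var tmpl = template.Must(template.New("root").Parse(
	`{{define "message"}}<li class="msg">{{.}}</li>{{end}}` +
		`{{define "page"}}<html><body><ul id="log">{{template "message" .}}</ul></body></html>{{end}}`))

// messageHandler renders only the fragment when HTMX makes the request
// (HTMX sets the HX-Request header) and the full page otherwise.
func messageHandler(w http.ResponseWriter, r *http.Request) {
	name := "page"
	if r.Header.Get("HX-Request") == "true" {
		name = "message"
	}
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	// html/template escapes the user-supplied text for the HTML context.
	if err := tmpl.ExecuteTemplate(w, name, r.URL.Query().Get("text")); err != nil {
		http.Error(w, "render failed", http.StatusInternalServerError)
	}
}

// render exercises the handler in memory, without opening a network listener.
func render(text string, htmx bool) string {
	req := httptest.NewRequest(http.MethodGet, "/chat/messages?text="+url.QueryEscape(text), nil)
	if htmx {
		req.Header.Set("HX-Request", "true")
	}
	rec := httptest.NewRecorder()
	messageHandler(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(render("hello", true))  // fragment only
	fmt.Println(render("hello", false)) // full page
}
```

On the client side, an element with `hx-get="/chat/messages"` and a matching `hx-target` would swap the returned fragment into the page, which is why no client-side templating is needed.
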
## Technology Stack
| Layer | Technology | Purpose |
|-------|-----------|---------|
| Server | Go 1.25 | HTTP server, WebSocket, NATS client, OAuth |
| Templates | Go `html/template` | Server-side HTML rendering |
| Interactivity | HTMX 2.0 | AJAX in core; WebSocket and server-sent events via HTMX extensions |
| Client state | Alpine.js 3 | Lightweight reactive UI for local state |
| 3D Avatars | Three.js + VRM | 3D character rendering with lip-sync |
| Styling | Tailwind CSS 4 + DaisyUI | Utility-first CSS with component library |
| Messaging | NATS JetStream | Real-time pub/sub with MessagePack encoding |
| Auth | golang-jwt/jwt/v5 | JWT token handling for OAuth flows |
| Database | PostgreSQL (lib/pq) + SQLite | Persistent + local session storage |
| Observability | OpenTelemetry SDK | Traces, metrics via OTLP gRPC |
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                             Browser                             │
│                                                                 │
│  HTMX (server-rendered HTML) ←→ Go Server (WebSocket)           │
│  Alpine.js (local UI state)                                     │
│  Three.js (VRM 3D avatars with lip-sync)                        │
└───────────────────────┬─────────────────────────────────────────┘
                        │ HTTP/WebSocket
┌─────────────────────────────────────────────────────────────────┐
│                    Go Server (single binary)                    │
│                                                                 │
│    ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│    │  Routes  │  │  OAuth   │  │WebSocket │  │  OTEL    │       │
│    │  (HTTP)  │  │ Handlers │  │   Hub    │  │ Tracing  │       │
│    └──────────┘  └──────────┘  └──────────┘  └──────────┘       │
│         │              │            │                           │
│         └──────────────┴────────────┘                           │
│                        │                                        │
│              ┌─────────┴─────────┐                              │
│              │    NATS Client    │                              │
│              │   (JetStream +    │                              │
│              │    MessagePack)   │                              │
│              └─────────┬─────────┘                              │
└────────────────────────┼────────────────────────────────────────┘
            ┌────────────┴────────────────┐
            ▼                             ▼
   ┌──────────────────┐          ┌──────────────────┐
   │  NATS JetStream  │          │    Ray Serve     │
   │  ai.chat.*       │          │ (STT, TTS, LLM,  │
   │  ai.voice.*      │          │  Embeddings)     │
   └──────────────────┘          └──────────────────┘
```
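
The WebSocket Hub in the diagram can be sketched as a per-user fan-out. This is a simplified illustration, not the actual implementation: the `Hub` type and its methods are hypothetical, and each buffered channel stands in for the write loop of one gorilla/websocket connection.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub tracks the outbound queues of connected clients, keyed by user ID.
// Each queue stands in for the write loop of one WebSocket connection.
type Hub struct {
	mu      sync.RWMutex
	clients map[string]map[chan []byte]struct{}
}

func NewHub() *Hub {
	return &Hub{clients: make(map[string]map[chan []byte]struct{})}
}

// Register adds a connection for userID and returns its outbound queue.
func (h *Hub) Register(userID string) chan []byte {
	ch := make(chan []byte, 16)
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.clients[userID] == nil {
		h.clients[userID] = make(map[chan []byte]struct{})
	}
	h.clients[userID][ch] = struct{}{}
	return ch
}

// Unregister removes a connection and closes its queue.
func (h *Hub) Unregister(userID string, ch chan []byte) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if _, ok := h.clients[userID][ch]; !ok {
		return
	}
	delete(h.clients[userID], ch)
	if len(h.clients[userID]) == 0 {
		delete(h.clients, userID)
	}
	close(ch)
}

// Deliver fans a message out to every connection of one user and reports
// how many queues accepted it. Full queues are skipped so that one slow
// browser tab cannot block delivery to the others.
func (h *Hub) Deliver(userID string, msg []byte) int {
	h.mu.RLock()
	defer h.mu.RUnlock()
	sent := 0
	for ch := range h.clients[userID] {
		select {
		case ch <- msg:
			sent++
		default: // queue full: drop for this connection
		}
	}
	return sent
}

func main() {
	hub := NewHub()
	tab1 := hub.Register("alice")
	tab2 := hub.Register("alice")
	hub.Register("bob")

	n := hub.Deliver("alice", []byte("token chunk"))
	fmt.Println("delivered to", n, "connections")
	fmt.Println(string(<-tab1), "/", string(<-tab2))
}
```

In a real server, a NATS subscription callback would call something like `Deliver`, and each connection's goroutine would drain its queue into the WebSocket.
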
## Key Features
| Feature | Implementation |
|---------|---------------|
| Real-time chat | WebSocket → NATS pub/sub per-user channels |
| Voice assistant | Streaming STT → LLM → TTS via Ray Serve endpoints |
| 3D avatars | VRM models rendered in Three.js with audio-driven lip-sync |
| OAuth login | Google, Discord, GitHub, Twitch + Authentik OIDC |
| RAG search | Milvus vector search for premium users |
| Session state | PostgreSQL (CNPG) for persistent data, SQLite for local cache |
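
The per-user channels used for real-time chat can be sketched as subject construction plus NATS wildcard matching. The subject layout (`ai.chat.<userID>`) and both function names are assumptions for illustration; only the `ai.chat.*` and `ai.voice.*` patterns come from this ADR. In NATS, `.` separates tokens, `*` matches exactly one token, and `>` matches one or more trailing tokens.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// userSubject builds a per-user subject under a prefix such as "ai.chat".
// The layout <prefix>.<userID> is a hypothetical choice.
func userSubject(prefix, userID string) (string, error) {
	if userID == "" {
		return "", errors.New("empty user ID")
	}
	// '.' separates tokens, '*' and '>' are wildcards, and whitespace
	// is not allowed in subjects, so reject IDs containing any of them.
	if strings.ContainsAny(userID, ".*> \t\r\n") {
		return "", fmt.Errorf("user ID %q is not a valid subject token", userID)
	}
	return prefix + "." + userID, nil
}

// matches reports whether subject is covered by a NATS-style pattern.
func matches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// '>' must be last and needs at least one token to consume.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

func main() {
	subj, err := userSubject("ai.chat", "alice")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(subj, matches("ai.chat.*", subj), matches("ai.voice.*", subj))
}
```

Rejecting unsafe IDs, rather than silently rewriting them, avoids two users mapping to the same subject.
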
## Kubernetes Deployment
| | |
|---|---|
| **Namespace** | `ai-ml` |
| **Replicas** | 1 |
| **Image** | `ghcr.io/billy-davies-2/companions-frontend` (distroless) |
| **Resources** | Requests: 50m CPU / 128Mi memory; limits: 500m CPU / 512Mi memory |

**OTEL sidecar:** `otel/opentelemetry-collector-contrib:0.145.0` exports traces to ClickStack.

**Backend routing:** All AI inference requests (STT, TTS, LLM, embeddings, reranking) route to Ray Serve at `ai-inference-serve-svc.ai-ml.svc.cluster.local:8000`. Auxiliary HTTPRoutes in the `auxiliary` kustomization provide direct model endpoint access at `embeddings.lab`, `whisper.lab`, `tts.lab`, `llm.lab`, and `reranker.lab`.

**Access:** `companions-chat.lab.daviestechlabs.io` via envoy-internal with Authentik OIDC proxy auth.
## Links
* Related to [ADR-0003](0003-use-nats-for-messaging.md) (NATS messaging)
* Related to [ADR-0004](0004-use-messagepack-for-nats.md) (MessagePack encoding)
* Related to [ADR-0011](0011-kuberay-unified-gpu-backend.md) (Ray Serve backend)
* Related to [ADR-0028](0028-authentik-sso-strategy.md) (OAuth/OIDC)
* [HTMX Documentation](https://htmx.org/docs/)
* [VRM Specification](https://vrm.dev/en/)