homelab-design/decisions/0003-use-nats-for-messaging.md
Billy D. 832cda34bd feat: add comprehensive architecture documentation
- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories
and kubectl cluster-info dump data.
2026-02-01 14:30:05 -05:00

# Use NATS for AI/ML Messaging
* Status: accepted
* Date: 2025-12-01
* Deciders: Billy Davies
* Technical Story: Selecting message bus for AI service orchestration
## Context and Problem Statement
The AI/ML platform requires a messaging system for:
- Real-time chat message routing
- Voice request/response streaming
- Pipeline triggers and status updates
- Event-driven workflow orchestration
We need a messaging system that handles both ephemeral real-time messages and persistent streams.
## Decision Drivers
* Low latency for real-time chat/voice
* Persistence for audit and replay
* Simple operations for homelab
* Support for request-reply pattern
* Wildcard subscriptions for routing
* Binary message support (audio data)
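The wildcard-subscription driver refers to NATS subject tokens: `*` matches exactly one dot-separated token, while `>` matches one or more trailing tokens. A minimal sketch of that matching rule in plain Python (the function name and subjects are ours, not part of any client library):

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style pattern matches a subject.

    '*' matches exactly one dot-separated token; '>' matches
    one or more trailing tokens and must appear last.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the final token and consume at least one token
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

# Route every chat event for any room, but not voice traffic
assert subject_matches("chat.*.events", "chat.room42.events")
assert subject_matches("chat.>", "chat.room42.events.edited")
assert not subject_matches("chat.*.events", "voice.room42.events")
```

This is what lets one subscriber pick up, say, all pipeline status updates (`pipelines.*.status`) without enumerating pipelines in advance.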
## Considered Options
* Apache Kafka
* RabbitMQ
* Redis Pub/Sub + Streams
* NATS with JetStream
* Apache Pulsar
## Decision Outcome
Chosen option: "NATS with JetStream", because it provides both fire-and-forget messaging and persistent streams with significantly simpler operations than alternatives.
### Positive Consequences
* Sub-millisecond latency for real-time messages
* JetStream provides persistence when needed
* Simple deployment (single binary)
* Excellent Kubernetes integration
* Request-reply pattern built-in
* Wildcard subscriptions for flexible routing
* Low resource footprint
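On the "persistence when needed" point: JetStream persistence is opt-in per subject space, declared as a stream. A hypothetical stream definition for chat history (name and limits are illustrative, not from the actual specs directory), in the JSON shape the `nats` CLI accepts via `stream add --config`:

```json
{
  "name": "CHAT_EVENTS",
  "subjects": ["chat.>"],
  "retention": "limits",
  "storage": "file",
  "max_age": 604800000000000,
  "max_bytes": 1073741824,
  "discard": "old",
  "num_replicas": 1
}
```

Subjects outside any stream remain fire-and-forget, so real-time voice frames and durable audit events can share one broker. (`max_age` is in nanoseconds; the value above is seven days.)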
### Negative Consequences
* Smaller ecosystem than Kafka
* JetStream less mature than Kafka Streams
* No built-in schema registry
* Smaller community than RabbitMQ
## Pros and Cons of the Options
### Apache Kafka
* Good, because industry standard for streaming
* Good, because rich ecosystem (Kafka Streams, Connect)
* Good, because schema registry
* Good, because excellent for high throughput
* Bad, because operationally complex (ZooKeeper/KRaft)
* Bad, because high resource requirements
* Bad, because overkill for homelab scale
* Bad, because higher latency for real-time messages
### RabbitMQ
* Good, because mature and stable
* Good, because flexible routing
* Good, because it ships a solid management UI
* Bad, because AMQP protocol overhead
* Bad, because not designed for streaming
* Bad, because more complex clustering
### Redis Pub/Sub + Streams
* Good, because simple
* Good, because Redis may already be in the stack
* Good, because low latency
* Bad, because pub/sub not persistent
* Bad, because streams API less intuitive
* Bad, because not primary purpose of Redis
### NATS with JetStream
* Good, because extremely low latency
* Good, because simple operations
* Good, because both pub/sub and persistence
* Good, because request-reply built-in
* Good, because wildcard subscriptions
* Good, because low resource usage
* Good, because excellent Go/Python clients
* Bad, because smaller ecosystem
* Bad, because JetStream is newer and less battle-tested than Kafka
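The built-in request-reply works by having the requester subscribe to a unique `_INBOX.*` subject and send that subject alongside the payload; the responder publishes its answer there. A server-free asyncio sketch of that inbox pattern (queues stand in for NATS subjects; all class and subject names are ours, and real code would use the `nats-py` client instead):

```python
import asyncio
import uuid


class MiniBus:
    """Toy in-memory bus mimicking NATS request-reply via inbox subjects."""

    def __init__(self) -> None:
        self.queues: dict[str, asyncio.Queue] = {}

    def _queue(self, subject: str) -> asyncio.Queue:
        return self.queues.setdefault(subject, asyncio.Queue())

    async def publish(self, subject: str, data: bytes, reply: str = "") -> None:
        await self._queue(subject).put((data, reply))

    async def request(self, subject: str, data: bytes, timeout: float = 1.0) -> bytes:
        inbox = f"_INBOX.{uuid.uuid4().hex}"  # unique reply subject
        await self.publish(subject, data, reply=inbox)
        payload, _ = await asyncio.wait_for(self._queue(inbox).get(), timeout)
        return payload

    async def serve(self, subject: str, handler) -> None:
        # Handle one request: read payload, publish result to its inbox
        data, reply = await self._queue(subject).get()
        await self.publish(reply, handler(data))


async def main() -> bytes:
    bus = MiniBus()
    # Responder uppercases the payload, standing in for a tiny STT service
    asyncio.create_task(bus.serve("voice.stt", lambda b: b.upper()))
    return await bus.request("voice.stt", b"hello")


print(asyncio.run(main()))  # b'HELLO'
```

With NATS the inbox bookkeeping is handled by the client library in a single `nc.request(...)` call, which is why the pattern counts as "built-in" here rather than something each service reimplements.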
### Apache Pulsar
* Good, because unified messaging + streaming
* Good, because multi-tenancy
* Good, because geo-replication
* Bad, because complex architecture
* Bad, because high resource requirements
* Bad, because smaller community
## Links
* [NATS.io](https://nats.io)
* [JetStream Documentation](https://docs.nats.io/nats-concepts/jetstream)
* Related: [ADR-0004](0004-use-messagepack-for-nats.md) - Message format