# Use NATS for AI/ML Messaging

* Status: accepted
* Date: 2025-12-01
* Deciders: Billy Davies
* Technical Story: Selecting a message bus for AI service orchestration

## Context and Problem Statement

The AI/ML platform requires a messaging system for:

- Real-time chat message routing
- Voice request/response streaming
- Pipeline triggers and status updates
- Event-driven workflow orchestration

We need a messaging system that handles both ephemeral real-time messages and persistent streams.

## Decision Drivers

* Low latency for real-time chat/voice
* Persistence for audit and replay
* Simple operations for a homelab
* Support for the request-reply pattern
* Wildcard subscriptions for routing
* Binary message support (audio data)

## Considered Options

* Apache Kafka
* RabbitMQ
* Redis Pub/Sub + Streams
* NATS with JetStream
* Apache Pulsar

## Decision Outcome

Chosen option: "NATS with JetStream", because it provides both fire-and-forget messaging and persistent streams with significantly simpler operations than the alternatives.
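One decision driver above is wildcard subscriptions. NATS subjects are dot-separated tokens; in a subscription pattern, `*` matches exactly one token and `>` matches one or more trailing tokens. A minimal stdlib-only sketch of that matching rule (this is not a real NATS client, and the subject names are illustrative, not from this ADR):

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subscription pattern matches a subject.

    '*' matches exactly one token; '>' matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the final pattern token and cover at least one token
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

# Route all chat traffic for any room, and everything under the voice prefix:
print(subject_matches("chat.*.messages", "chat.room42.messages"))  # True
print(subject_matches("voice.>", "voice.request.user7.audio"))     # True
print(subject_matches("chat.*.messages", "chat.room42.presence"))  # False
```

This is why a single subscriber can fan in, say, every pipeline status update without enumerating pipelines up front.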
### Positive Consequences

* Sub-millisecond latency for real-time messages
* JetStream provides persistence when needed
* Simple deployment (single binary)
* Excellent Kubernetes integration
* Request-reply pattern built in
* Wildcard subscriptions for flexible routing
* Low resource footprint

### Negative Consequences

* Smaller ecosystem than Kafka
* JetStream is less mature than Kafka Streams
* No built-in schema registry
* Smaller community than RabbitMQ

## Pros and Cons of the Options

### Apache Kafka

* Good, because it is the industry standard for streaming
* Good, because of its rich ecosystem (Kafka Streams, Kafka Connect)
* Good, because of schema registry support
* Good, because it excels at high throughput
* Bad, because it is operationally complex (ZooKeeper/KRaft)
* Bad, because of high resource requirements
* Bad, because it is overkill at homelab scale
* Bad, because of higher latency for real-time messages

### RabbitMQ

* Good, because it is mature and stable
* Good, because of flexible routing
* Good, because of a solid management UI
* Bad, because of AMQP protocol overhead
* Bad, because it is not designed for streaming
* Bad, because clustering is more complex

### Redis Pub/Sub + Streams

* Good, because it is simple
* Good, because Redis may already be deployed
* Good, because of low latency
* Bad, because pub/sub is not persistent
* Bad, because the Streams API is less intuitive
* Bad, because messaging is not Redis's primary purpose

### NATS with JetStream

* Good, because of extremely low latency
* Good, because of simple operations
* Good, because it offers both pub/sub and persistence
* Good, because request-reply is built in
* Good, because of wildcard subscriptions
* Good, because of low resource usage
* Good, because of excellent Go/Python clients
* Bad, because of a smaller ecosystem
* Bad, because JetStream is newer than Kafka

### Apache Pulsar

* Good, because it unifies messaging and streaming
* Good, because of multi-tenancy
* Good, because of geo-replication
* Bad, because of its complex architecture
* Bad, because of high resource requirements
* Bad, because of its smaller community

## Links
* [NATS.io](https://nats.io)
* [JetStream Documentation](https://docs.nats.io/nats-concepts/jetstream)
* Related: [ADR-0004](0004-use-messagepack-for-nats.md) - Message format
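The built-in request-reply pattern this decision leans on works by publishing a message that carries a unique reply subject (an "inbox"); the responder publishes its answer to that inbox, and the requester awaits the first message delivered there. A hedged, stdlib-only sketch of that flow: the `TinyBus` class and the `voice.transcribe` subject are invented for illustration, and a real deployment would use the official `nats-py` client against a NATS server instead.

```python
import asyncio
import uuid
from collections import defaultdict


class TinyBus:
    """In-memory stand-in for a NATS connection (illustrative only)."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, subject, handler):
        self._subs[subject].append(handler)

    async def publish(self, subject, data, reply=None):
        # Deliver to every subscriber on an exact subject match.
        for handler in self._subs.get(subject, []):
            await handler(subject, data, reply)

    async def request(self, subject, data, timeout=1.0):
        # NATS-style request-reply: publish with a unique inbox subject,
        # then await the first message delivered to that inbox.
        inbox = f"_INBOX.{uuid.uuid4().hex}"
        future = asyncio.get_running_loop().create_future()

        async def on_reply(_subject, reply_data, _reply):
            if not future.done():
                future.set_result(reply_data)

        self.subscribe(inbox, on_reply)
        await self.publish(subject, data, reply=inbox)
        return await asyncio.wait_for(future, timeout)


async def main():
    bus = TinyBus()

    # Responder: answers transcription requests on its subject.
    async def transcribe(_subject, audio_bytes, reply):
        await bus.publish(reply, f"transcript of {len(audio_bytes)} bytes")

    bus.subscribe("voice.transcribe", transcribe)

    result = await bus.request("voice.transcribe", b"\x00" * 1024)
    print(result)  # transcript of 1024 bytes


asyncio.run(main())
```

The timeout on `request` matters in practice: if no responder is subscribed, the requester fails fast instead of hanging, which is the behavior the real NATS clients expose as well.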