Billy D. 832cda34bd feat: add comprehensive architecture documentation

- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories
and kubectl cluster-info dump data.

2026-02-01 14:30:05 -05:00

Use Milvus for Vector Storage

Status: accepted
Date: 2025-12-15
Deciders: Billy Davies
Technical Story: Selecting vector database for RAG system

Context and Problem Statement

The RAG (Retrieval-Augmented Generation) system requires a vector database to store document embeddings and perform similarity search. We need to store millions of embeddings and query them with low latency.
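To make the requirement concrete, here is a minimal sketch of top-k similarity search over stored embeddings. It is a brute-force illustration in plain Python, not Milvus's API or its indexing method; the names `cosine_similarity`, `top_k`, and the toy `corpus` are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k):
    """Return the k (doc_id, score) pairs most similar to the query.

    embeddings maps a document id to its embedding vector.
    Every vector is scanned, so cost grows linearly with corpus size.
    """
    scored = (
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in embeddings.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 3-dimensional embeddings (hypothetical data); real embeddings
# typically have hundreds of dimensions.
corpus = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.9, 0.1, 0.0],
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
```

A linear scan like this is workable for thousands of vectors but not for millions at low latency, which is why a dedicated vector database with approximate nearest-neighbor indexes is under consideration.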

Decision Drivers

Query performance (under 100 ms for top-k search)
Scalability to millions of vectors
Kubernetes-native deployment
Active development and community
Support for metadata filtering
Backup and restore capabilities
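Two of these drivers, metadata filtering and the 100 ms latency target, can be checked with a small harness when evaluating candidates. The sketch below is an assumption-laden illustration in plain Python: `filtered_top_k`, `p95_latency_ms`, the record layout, and the choice of the 95th percentile are all hypothetical, and the in-memory search stands in for a call to whichever database is under test.

```python
import math
import time

LATENCY_BUDGET_MS = 100.0  # taken from the query performance driver


def filtered_top_k(query, records, k, predicate):
    """Keep records whose metadata passes the predicate, then rank by dot product."""
    candidates = [r for r in records if predicate(r["metadata"])]
    candidates.sort(
        key=lambda r: sum(q * v for q, v in zip(query, r["vector"])),
        reverse=True,
    )
    return candidates[:k]


def p95_latency_ms(search, runs=50):
    """Call search() repeatedly and return the 95th-percentile latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        search()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return samples[index]


# Hypothetical records: a vector plus scalar metadata used for filtering.
records = [
    {"id": 1, "vector": [0.9, 0.1], "metadata": {"source": "wiki"}},
    {"id": 2, "vector": [0.8, 0.2], "metadata": {"source": "blog"}},
    {"id": 3, "vector": [0.1, 0.9], "metadata": {"source": "wiki"}},
]
query = [1.0, 0.0]


def is_wiki(metadata):
    return metadata["source"] == "wiki"


hits = filtered_top_k(query, records, k=1, predicate=is_wiki)
latency = p95_latency_ms(lambda: filtered_top_k(query, records, 1, is_wiki))
within_budget = latency < LATENCY_BUDGET_MS
```

Measuring a tail percentile rather than the mean is a design choice made here, not something the drivers specify; the mean can hide the slow queries that users notice.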

Considered Options

Milvus
Pinecone (managed)
Qdrant
Weaviate
pgvector (PostgreSQL extension)
Chroma

Decision Outcome

Chosen option: "Milvus", because it provides production-grade vector search with excellent Kubernetes support, scalability, and active development.

Positive Consequences

High-performance similarity search
Horizontal scalability
Rich filtering and hybrid search
Helm chart for Kubernetes
Active open-source project (graduated under the LF AI & Data Foundation, not the CNCF)
GPU acceleration available

Negative Consequences

Complex architecture (multiple components)
Higher resource usage than simpler alternatives
Requires object storage (MinIO)
Learning curve for optimization

Pros and Cons of the Options

Milvus

Good, because production-proven at scale
Good, because rich query API
Good, because Kubernetes-native
Good, because hybrid search (vector + scalar)
Good, because foundation-governed project (LF AI & Data Foundation)
Bad, because complex architecture
Bad, because higher resource usage

Pinecone

Good, because fully managed
Good, because simple API
Good, because reliable
Bad, because external dependency
Bad, because cost at scale
Bad, because data sovereignty concerns

Qdrant

Good, because simpler than Milvus
Good, because Rust performance
Good, because good filtering
Bad, because smaller community
Bad, because fewer enterprise features

Weaviate

Good, because built-in vectorization
Good, because GraphQL API
Good, because modules system
Bad, because more opinionated
Bad, because schema requirements

pgvector

Good, because familiar PostgreSQL
Good, because simple deployment
Good, because ACID transactions
Bad, because limited scale
Bad, because slower for large datasets
Bad, because no specialized optimizations

Chroma

Good, because simple
Good, because embedded option
Bad, because not production-ready at scale
Bad, because limited features
