- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories and kubectl cluster-info dump data.
Use Milvus for Vector Storage
- Status: accepted
- Date: 2025-12-15
- Deciders: Billy Davies
- Technical Story: Selecting vector database for RAG system
Context and Problem Statement
The RAG (Retrieval-Augmented Generation) system requires a vector database to store document embeddings and perform similarity search. We need to store millions of embeddings and query them with low latency.
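To make the requirement concrete, the following minimal sketch shows what a top-k similarity search computes, using brute-force cosine similarity in plain Python. It is an illustration of the problem only, not of how Milvus works internally (Milvus relies on approximate nearest-neighbour indexes). The function names, the toy 3-dimensional vectors, and the chunk IDs are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k):
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = (
        (chunk_id, cosine_similarity(query, vector))
        for chunk_id, vector in embeddings.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy data; real embeddings have hundreds of dimensions.
embeddings = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], embeddings, k=2)
```

A brute-force scan like this touches every stored vector on every query, so its cost grows linearly with the collection size. That is the main reason a dedicated vector database with indexing is being evaluated here.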
Decision Drivers
- Query performance (< 100 ms for top-k search)
- Scalability to millions of vectors
- Kubernetes-native deployment
- Active development and community
- Support for metadata filtering
- Backup and restore capabilities
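A rough sizing calculation helps put "millions of vectors" in perspective. The sketch below assumes float32 embeddings and a hypothetical dimension of 768; this ADR does not state the embedding model or its dimension, so treat both as placeholders. It counts raw vector data only and ignores index structures, metadata, and replication.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Bytes needed for the raw vectors alone (float32 by default)."""
    return num_vectors * dim * bytes_per_value


def to_gib(num_bytes):
    """Convert bytes to gibibytes."""
    return num_bytes / 1024**3


ASSUMED_DIM = 768  # assumption: not specified in this ADR

for count in (1_000_000, 10_000_000):
    size = raw_vector_bytes(count, ASSUMED_DIM)
    print(f"{count:>12,} vectors x {ASSUMED_DIM} dims: {to_gib(size):.2f} GiB")
```

Under these assumptions the raw data grows by a few GiB per million vectors, which is why scalability and resource usage appear among the drivers.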
Considered Options
- Milvus
- Pinecone (managed)
- Qdrant
- Weaviate
- pgvector (PostgreSQL extension)
- Chroma
Decision Outcome
Chosen option: "Milvus", because it provides production-grade vector search with excellent Kubernetes support, scalability, and active development.
Positive Consequences
- High-performance similarity search
- Horizontal scalability
- Rich filtering and hybrid search
- Helm chart for Kubernetes
- Actively developed open-source project (an LF AI & Data Foundation graduated project, not a CNCF sandbox project)
- GPU acceleration available
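"Rich filtering and hybrid search" refers to combining a scalar predicate on metadata with vector ranking. The sketch below illustrates the idea in plain Python: filter first, then rank the survivors by distance. The records, the `source` field, and the function names are hypothetical; in Milvus the predicate would be supplied as a boolean filter expression with the search request rather than as Python code.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(query, records, predicate, k):
    """Keep records matching the scalar predicate, then return the k nearest."""
    candidates = [record for record in records if predicate(record)]
    candidates.sort(key=lambda record: l2_distance(query, record["vector"]))
    return candidates[:k]


# Hypothetical chunk records with one metadata field.
records = [
    {"id": 1, "source": "handbook", "vector": [0.10, 0.20]},
    {"id": 2, "source": "wiki", "vector": [0.10, 0.21]},
    {"id": 3, "source": "handbook", "vector": [0.90, 0.80]},
]

hits = filtered_search(
    query=[0.10, 0.20],
    records=records,
    predicate=lambda record: record["source"] == "handbook",
    k=2,
)
hit_ids = [record["id"] for record in hits]
```

Record 2 is very close to the query but is excluded because its `source` does not match, which is the behaviour metadata filtering is meant to provide.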
Negative Consequences
- Complex architecture (multiple components)
- Higher resource usage than simpler alternatives
- Requires object storage (MinIO)
- Learning curve for optimization
Pros and Cons of the Options
Milvus
- Good, because production-proven at scale
- Good, because rich query API
- Good, because Kubernetes-native
- Good, because hybrid search (vector + scalar)
- Good, because foundation-backed open source (LF AI & Data graduated project)
- Bad, because complex architecture
- Bad, because higher resource usage
Pinecone
- Good, because fully managed
- Good, because simple API
- Good, because reliable
- Bad, because external dependency
- Bad, because cost at scale
- Bad, because data sovereignty concerns
Qdrant
- Good, because simpler than Milvus
- Good, because Rust performance
- Good, because good filtering
- Bad, because smaller community
- Bad, because fewer enterprise features
Weaviate
- Good, because built-in vectorization
- Good, because GraphQL API
- Good, because module system
- Bad, because more opinionated
- Bad, because schema requirements
pgvector
- Good, because it uses familiar PostgreSQL
- Good, because simple deployment
- Good, because ACID transactions
- Bad, because limited scale
- Bad, because slower for large datasets
- Bad, because no specialized optimizations
Chroma
- Good, because simple
- Good, because embedded option
- Bad, because not production-ready at scale
- Bad, because limited features
Links
- Milvus
- Milvus Helm Chart
- Related: DOMAIN-MODEL.md - Chunk/Embedding entities