- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories and kubectl cluster-info dump data.

# Use Milvus for Vector Storage

* Status: accepted
* Date: 2025-12-15
* Deciders: Billy Davies
* Technical Story: Selecting vector database for RAG system

## Context and Problem Statement

The RAG (Retrieval-Augmented Generation) system requires a vector database to store document embeddings and perform similarity search. We need to store millions of embeddings and query them with low latency.
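To make "similarity search" concrete, here is a minimal pure-Python sketch of exact top-k retrieval by cosine similarity. It is an illustration only, not the system's implementation and not the Milvus API; the chunk ids and the 3-dimensional vectors are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k):
    """Return the k (chunk_id, score) pairs most similar to the query.

    `embeddings` maps a hypothetical chunk id to its embedding vector.
    """
    scored = ((cid, cosine_similarity(query, vec)) for cid, vec in embeddings.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 3-dimensional embeddings (made up); real ones have hundreds of dimensions.
EMBEDDINGS = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], EMBEDDINGS, k=2)
```

This linear scan touches every stored vector on each query, so its cost grows with the size of the collection. At millions of embeddings that is the reason to use a database with approximate nearest-neighbor indexes rather than a scan like this one.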

## Decision Drivers

* Query performance (< 100ms for top-k search)
* Scalability to millions of vectors
* Kubernetes-native deployment
* Active development and community
* Support for metadata filtering
* Backup and restore capabilities
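The query-performance driver can be turned into a repeatable check. The sketch below times a callable and compares a latency percentile against the 100 ms budget. The choice of the 95th percentile, the helper names, and the sample values are assumptions for illustration; the ADR states only the "< 100ms" target.

```python
import math
import time

LATENCY_BUDGET_MS = 100.0  # "< 100ms for top-k search" from the drivers above


def measure_ms(fn, runs):
    """Call fn() `runs` times and return each wall-clock latency in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def meets_budget(latencies_ms, pct=95, budget_ms=LATENCY_BUDGET_MS):
    """True if the chosen percentile is strictly under the budget."""
    return percentile(latencies_ms, pct) < budget_ms


# Made-up latencies in milliseconds, not measurements from the cluster.
SAMPLES_MS = [12, 18, 25, 31, 40, 44, 52, 60, 75, 140]
```

With these made-up samples the median is comfortably under budget while the 95th percentile is not, which is why it matters to state which percentile the target applies to. In practice `measure_ms` would wrap a real top-k query against the database.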

## Considered Options

* Milvus
* Pinecone (managed)
* Qdrant
* Weaviate
* pgvector (PostgreSQL extension)
* Chroma

## Decision Outcome

Chosen option: "Milvus", because it provides production-grade vector search with excellent Kubernetes support, scalability, and active development.

### Positive Consequences

* High-performance similarity search
* Horizontal scalability
* Rich filtering and hybrid search
* Helm chart for Kubernetes
* Active open-source project (graduated from the LF AI & Data Foundation, not a CNCF sandbox project)
* GPU acceleration available
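The "filtering and hybrid search" consequence combines a scalar condition on metadata with vector ranking. A minimal pure-Python sketch of that idea follows: filter first, then rank the survivors by inner product. The `Chunk` type, its field names, and the sample data are hypothetical, and this does not use the Milvus API, where such filters are written as boolean expressions over scalar fields and evaluated by the engine.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stored record: an embedding plus scalar metadata."""
    chunk_id: str
    embedding: list
    metadata: dict = field(default_factory=dict)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def hybrid_search(query, chunks, k, predicate):
    """Keep chunks whose metadata passes `predicate`, then return the
    k survivors with the highest inner product against `query`."""
    candidates = (c for c in chunks if predicate(c.metadata))
    return heapq.nlargest(k, candidates, key=lambda c: dot(query, c.embedding))


# Made-up 2-dimensional unit vectors and metadata.
CHUNKS = [
    Chunk("c1", [1.0, 0.0], {"source": "runbook", "year": 2025}),
    Chunk("c2", [0.8, 0.6], {"source": "wiki", "year": 2025}),
    Chunk("c3", [0.6, 0.8], {"source": "runbook", "year": 2024}),
    Chunk("c4", [0.0, 1.0], {"source": "runbook", "year": 2025}),
]

hits = hybrid_search(
    [1.0, 0.0], CHUNKS, k=2, predicate=lambda m: m["source"] == "runbook"
)
```

Here `c2` is the second-closest vector to the query but is excluded by the metadata filter, so the result is `c1` followed by `c3`.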

### Negative Consequences

* Complex architecture (multiple components)
* Higher resource usage than simpler alternatives
* Requires object storage (MinIO)
* Learning curve for optimization

## Pros and Cons of the Options

### Milvus

* Good, because production-proven at scale
* Good, because rich query API
* Good, because Kubernetes-native
* Good, because hybrid search (vector + scalar)
* Good, because LF AI & Data Foundation graduated project
* Bad, because complex architecture
* Bad, because higher resource usage

### Pinecone

* Good, because fully managed
* Good, because simple API
* Good, because reliable
* Bad, because external dependency
* Bad, because cost at scale
* Bad, because data sovereignty concerns

### Qdrant

* Good, because simpler than Milvus
* Good, because Rust performance
* Good, because good filtering
* Bad, because smaller community
* Bad, because fewer enterprise features

### Weaviate

* Good, because built-in vectorization
* Good, because GraphQL API
* Good, because module system
* Bad, because more opinionated
* Bad, because schema requirements

### pgvector

* Good, because familiar PostgreSQL
* Good, because simple deployment
* Good, because ACID transactions
* Bad, because limited scale
* Bad, because slower for large datasets
* Bad, because no specialized optimizations

### Chroma

* Good, because simple
* Good, because embedded option
* Bad, because not production-ready at scale
* Bad, because limited features

## Links

* [Milvus](https://milvus.io)
* [Milvus Helm Chart](https://github.com/milvus-io/milvus-helm)
* Related: [DOMAIN-MODEL.md](../DOMAIN-MODEL.md) - Chunk/Embedding entities