homelab-design/decisions/0008-use-milvus-for-vectors.md

Use Milvus for Vector Storage

  • Status: accepted
  • Date: 2025-12-15
  • Deciders: Billy Davies
  • Technical Story: Selecting vector database for RAG system

Context and Problem Statement

The RAG (Retrieval-Augmented Generation) system requires a vector database to store document embeddings and perform similarity search. We need to store millions of embeddings and query them with low latency.
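To make the core operation concrete, the sketch below performs exact top-k similarity search by brute force in plain Python. It is illustrative only: the function names, the id-to-vector layout, and the toy vectors are hypothetical, and a vector database such as Milvus would typically answer the same query with an approximate nearest-neighbor index rather than a full scan.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k):
    """Return the ids of the k stored embeddings most similar to query.

    embeddings maps a document id to its vector (hypothetical layout).
    """
    scored = ((cosine_similarity(query, vec), doc_id)
              for doc_id, vec in embeddings.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Toy 3-dimensional embeddings; real ones have hundreds of dimensions.
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.0, 0.0], store, k=2)
```

A full scan like this costs time proportional to the number of stored vectors, which is why a dedicated index matters once the collection reaches millions of embeddings.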

Decision Drivers

  • Query performance (< 100 ms for top-k search)
  • Scalability to millions of vectors
  • Kubernetes-native deployment
  • Active development and community
  • Support for metadata filtering
  • Backup and restore capabilities
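The query-performance driver can be turned into a repeatable check. The sketch below times an arbitrary query callable and compares a latency percentile with the 100 ms budget. The choice of the 95th percentile is an assumption, since the driver does not say which statistic the budget applies to, and all function names are hypothetical.

```python
import time

LATENCY_BUDGET_MS = 100  # budget taken from the query-performance driver


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer form of ceil(p * n / 100)
    return ordered[rank - 1]


def measure_latencies_ms(query_fn, runs):
    """Call query_fn repeatedly and return each call's wall time in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        query_fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_budget(latencies_ms, p=95, budget_ms=LATENCY_BUDGET_MS):
    """True when the p-th percentile latency is under the budget (p=95 is an assumption)."""
    return percentile(latencies_ms, p) < budget_ms
```

In practice `query_fn` would wrap one top-k search against the candidate database, run with a realistic collection size and filter mix.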

Considered Options

  • Milvus
  • Pinecone (managed)
  • Qdrant
  • Weaviate
  • pgvector (PostgreSQL extension)
  • Chroma

Decision Outcome

Chosen option: "Milvus", because it provides production-grade vector search with excellent Kubernetes support, scalability, and active development.

Positive Consequences

  • High-performance similarity search
  • Horizontal scalability
  • Rich filtering and hybrid search
  • Helm chart for Kubernetes
  • Actively developed, graduated LF AI & Data Foundation project
  • GPU acceleration available
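As a rough illustration of filtering combined with vector ranking (the "vector + scalar" hybrid search this record refers to), the sketch below applies a scalar predicate first and ranks the survivors by inner product. It does not use the Milvus API: in Milvus the filter is supplied as an expression with the search request, and the Python predicate, record layout, and field names here are hypothetical stand-ins.

```python
def filtered_search(query, records, predicate, k):
    """Rank records that pass a scalar predicate by inner product.

    Each record is a dict with hypothetical keys "id", "vector", "meta".
    """
    candidates = [r for r in records if predicate(r["meta"])]
    candidates.sort(
        key=lambda r: sum(q * v for q, v in zip(query, r["vector"])),
        reverse=True,
    )
    return [r["id"] for r in candidates[:k]]


records = [
    {"id": 1, "vector": [0.9, 0.1], "meta": {"source": "wiki", "year": 2024}},
    {"id": 2, "vector": [1.0, 0.0], "meta": {"source": "blog", "year": 2025}},
    {"id": 3, "vector": [0.2, 0.8], "meta": {"source": "wiki", "year": 2025}},
]
# Record 2 is the closest vector overall but is excluded by the filter.
hits = filtered_search([1.0, 0.0], records, lambda m: m["source"] == "wiki", k=2)
```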

Negative Consequences

  • Complex architecture (multiple components)
  • Higher resource usage than simpler alternatives
  • Requires object storage (MinIO)
  • Learning curve for optimization
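A lower bound on the storage side of the resource cost follows from simple arithmetic: vector count times dimension times bytes per component. The sketch below assumes float32 components and a hypothetical workload of 5 million 768-dimensional embeddings; neither figure comes from this record, and index structures, metadata, and replicas add to the total.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors, dim):
    """Bytes needed for the float32 vectors alone (no index, no metadata)."""
    return num_vectors * dim * BYTES_PER_FLOAT32


def to_gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / 2**30


# Hypothetical workload: 5 million embeddings of dimension 768.
estimate_gib = to_gib(raw_vector_bytes(5_000_000, 768))
print(f"raw vectors: {estimate_gib:.1f} GiB")
```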

Pros and Cons of the Options

Milvus

  • Good, because production-proven at scale
  • Good, because rich query API
  • Good, because Kubernetes-native
  • Good, because hybrid search (vector + scalar)
  • Good, because it is a graduated LF AI & Data Foundation project
  • Bad, because complex architecture
  • Bad, because higher resource usage

Pinecone

  • Good, because fully managed
  • Good, because simple API
  • Good, because reliable
  • Bad, because external dependency
  • Bad, because cost at scale
  • Bad, because data sovereignty concerns

Qdrant

  • Good, because simpler than Milvus
  • Good, because Rust performance
  • Good, because good filtering
  • Bad, because smaller community
  • Bad, because fewer enterprise features

Weaviate

  • Good, because built-in vectorization
  • Good, because GraphQL API
  • Good, because modules system
  • Bad, because more opinionated
  • Bad, because schema requirements

pgvector

  • Good, because familiar PostgreSQL
  • Good, because simple deployment
  • Good, because ACID transactions
  • Bad, because limited scale
  • Bad, because slower for large datasets
  • Bad, because fewer specialized index and tuning options than a dedicated vector database

Chroma

  • Good, because simple
  • Good, because embedded option
  • Bad, because not production-ready at scale
  • Bad, because limited features