feat: add comprehensive architecture documentation

- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories
and kubectl cluster-info dump data.
2026-02-01 14:30:05 -05:00
parent 4d4f6f464c
commit 832cda34bd
26 changed files with 3805 additions and 2 deletions

@@ -0,0 +1,107 @@
# Use Milvus for Vector Storage
* Status: accepted
* Date: 2025-12-15
* Deciders: Billy Davies
* Technical Story: Selecting a vector database for the RAG system
## Context and Problem Statement
The RAG (Retrieval-Augmented Generation) system requires a vector database to store document embeddings and perform similarity search. We need to store millions of embeddings and query them with low latency.
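
As a rough illustration of the similarity search described above, the following sketch ranks stored embeddings against a query vector by cosine similarity using a plain linear scan. The chunk ids and the three-dimensional toy vectors are hypothetical, and this is not how Milvus works internally: a vector database replaces the linear scan with an approximate nearest-neighbor index so that queries stay fast at millions of vectors.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = (
        (chunk_id, cosine_similarity(query, vector))
        for chunk_id, vector in embeddings.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy data; real embedding models emit hundreds of dimensions.
EMBEDDINGS = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    for chunk_id, score in top_k([1.0, 0.0, 0.0], EMBEDDINGS, k=2):
        print(f"{chunk_id}: {score:.3f}")
```

The scan costs time proportional to the number of stored vectors per query, which is the scaling problem that motivates a dedicated vector database.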
## Decision Drivers
* Query performance (< 100 ms for top-k search)
* Scalability to millions of vectors
* Kubernetes-native deployment
* Active development and community
* Support for metadata filtering
* Backup and restore capabilities
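
The first two drivers can be turned into quick checks. The sketch below estimates the memory needed for raw float32 vectors and tests a set of measured query latencies against the 100 ms target. The 768-dimension figure in the test data, the choice of the 95th percentile, and all function names are assumptions made for illustration; the ADR does not state an embedding dimension or which percentile the target applies to.

```python
from typing import Sequence

BYTES_PER_FLOAT32 = 4


def raw_vector_gib(num_vectors: int, dim: int) -> float:
    """GiB needed for the raw float32 vectors alone (no index, no metadata)."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / 2**30


def p95(latencies_ms: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) in integer arithmetic, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_target(latencies_ms: Sequence[float], target_ms: float = 100.0) -> bool:
    """True when the p95 latency is strictly below the target."""
    return p95(latencies_ms) < target_ms


if __name__ == "__main__":
    # Assumed workload: one million 768-dimensional embeddings.
    print(f"raw vectors: {raw_vector_gib(1_000_000, 768):.2f} GiB")
    samples = [12.0, 18.5, 22.1, 35.0, 41.3, 48.9, 55.2, 61.0, 73.4, 88.7]
    print("meets target:", meets_latency_target(samples))
```

The memory figure is a lower bound, since index structures and metadata add overhead on top of the raw vectors.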
## Considered Options
* Milvus
* Pinecone (managed)
* Qdrant
* Weaviate
* pgvector (PostgreSQL extension)
* Chroma
## Decision Outcome
Chosen option: "Milvus", because it provides production-grade vector search, deploys on Kubernetes through a Helm chart, scales horizontally to millions of vectors, and is under active development.
### Positive Consequences
* High-performance similarity search
* Horizontal scalability
* Rich filtering and hybrid search
* Helm chart for Kubernetes
* Active open-source project (graduated LF AI & Data Foundation project)
* GPU acceleration available
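
To make "hybrid search" concrete, the sketch below combines a scalar metadata filter with vector ranking: it first keeps only the records that satisfy a predicate, then orders the survivors by dot-product similarity. The `Chunk` record, its `source` field, and the sample data are hypothetical, and the filter-then-rank order is only one possible strategy. In Milvus the filter would be expressed as a boolean expression over scalar fields and evaluated by the engine, not in application code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Chunk:
    """Hypothetical record: an embedding plus one scalar metadata field."""
    chunk_id: str
    source: str
    embedding: Tuple[float, ...]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(
    query: Sequence[float],
    chunks: Sequence[Chunk],
    predicate: Callable[[Chunk], bool],
    k: int,
) -> List[Chunk]:
    """Apply the scalar filter first, then return the k best matches by similarity."""
    candidates = [chunk for chunk in chunks if predicate(chunk)]
    candidates.sort(key=lambda chunk: dot(query, chunk.embedding), reverse=True)
    return candidates[:k]


# Hypothetical sample data with two-dimensional embeddings.
CHUNKS = [
    Chunk("c1", "wiki", (1.0, 0.0)),
    Chunk("c2", "blog", (0.99, 0.1)),
    Chunk("c3", "wiki", (0.6, 0.8)),
    Chunk("c4", "wiki", (0.0, 1.0)),
]

if __name__ == "__main__":
    hits = filtered_search((1.0, 0.0), CHUNKS, lambda c: c.source == "wiki", k=2)
    print([hit.chunk_id for hit in hits])
```

Filtering before ranking guarantees that every returned record satisfies the metadata condition, even when closer vectors exist outside the filter (here, `c2`).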
### Negative Consequences
* Complex architecture (multiple components)
* Higher resource usage than simpler alternatives
* Requires object storage (MinIO)
* Learning curve for optimization
## Pros and Cons of the Options
### Milvus
* Good, because production-proven at scale
* Good, because rich query API
* Good, because Kubernetes-native
* Good, because hybrid search (vector + scalar)
* Good, because LF AI & Data Foundation graduated project
* Bad, because complex architecture
* Bad, because higher resource usage
### Pinecone
* Good, because fully managed
* Good, because simple API
* Good, because reliable
* Bad, because external dependency
* Bad, because cost at scale
* Bad, because data sovereignty concerns
### Qdrant
* Good, because simpler than Milvus
* Good, because Rust performance
* Good, because good filtering
* Bad, because smaller community
* Bad, because fewer enterprise features
### Weaviate
* Good, because built-in vectorization
* Good, because GraphQL API
* Good, because module system
* Bad, because more opinionated
* Bad, because schema requirements
### pgvector
* Good, because familiar PostgreSQL
* Good, because simple deployment
* Good, because ACID transactions
* Bad, because limited scale
* Bad, because slower for large datasets
* Bad, because no specialized optimizations
### Chroma
* Good, because simple
* Good, because embedded option
* Bad, because not production-ready at scale
* Bad, because limited features
## Links
* [Milvus](https://milvus.io)
* [Milvus Helm Chart](https://github.com/milvus-io/milvus-helm)
* Related: [DOMAIN-MODEL.md](../DOMAIN-MODEL.md) - Chunk/Embedding entities