feat: add comprehensive architecture documentation

- Add AGENT-ONBOARDING.md for AI agents
- Add ARCHITECTURE.md with full system overview
- Add TECH-STACK.md with complete technology inventory
- Add DOMAIN-MODEL.md with entities and bounded contexts
- Add CODING-CONVENTIONS.md with patterns and practices
- Add GLOSSARY.md with terminology reference
- Add C4 diagrams (Context and Container levels)
- Add 10 ADRs documenting key decisions:
  - Talos Linux, NATS, MessagePack, Multi-GPU strategy
  - GitOps with Flux, KServe, Milvus, Dual workflow engines
  - Envoy Gateway
- Add specs directory with JetStream configuration
- Add diagrams for GPU allocation and data flows

Based on analysis of homelab-k8s2 and llm-workflows repositories
and kubectl cluster-info dump data.
2026-02-01 14:30:05 -05:00
parent 4d4f6f464c
commit 832cda34bd
26 changed files with 3805 additions and 2 deletions

DOMAIN-MODEL.md
# 📊 Domain Model
> **Core entities, bounded contexts, and relationships in the DaviesTechLabs homelab**
## Bounded Contexts
```
┌─────────────────────────────────────────────────────────────────────┐
│                          BOUNDED CONTEXTS                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│ ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐ │
│ │   CHAT CONTEXT    │  │   VOICE CONTEXT   │  │ WORKFLOW CONTEXT  │ │
│ ├───────────────────┤  ├───────────────────┤  ├───────────────────┤ │
│ │ • ChatSession     │  │ • VoiceSession    │  │ • Pipeline        │ │
│ │ • ChatMessage     │  │ • AudioChunk      │  │ • PipelineRun     │ │
│ │ • Conversation    │  │ • Transcription   │  │ • Artifact        │ │
│ │ • User            │  │ • SynthesizedAudio│  │ • Experiment      │ │
│ └─────────┬─────────┘  └─────────┬─────────┘  └─────────┬─────────┘ │
│           │                      │                      │           │
│           └──────────────────────┼──────────────────────┘           │
│                                  │                                  │
│                                  ▼                                  │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │                       INFERENCE CONTEXT                         │ │
│ ├─────────────────────────────────────────────────────────────────┤ │
│ │ • InferenceRequest  • Model  • Embedding  • Document  • Chunk   │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
---
## Core Entities
### User Context
```yaml
User:
  id: string (UUID)
  username: string
  premium: boolean
  preferences:
    voice_id: string
    model_preference: string
    enable_rag: boolean
  created_at: timestamp

Session:
  id: string (UUID)
  user_id: string
  type: "chat" | "voice"
  started_at: timestamp
  last_activity: timestamp
  metadata: object
```
### Chat Context
```yaml
ChatMessage:
  id: string (UUID)
  session_id: string
  user_id: string
  role: "user" | "assistant" | "system"
  content: string
  created_at: timestamp
  metadata:
    tokens_used: integer
    latency_ms: float
    rag_sources: string[]
    model_used: string

Conversation:
  id: string (UUID)
  user_id: string
  messages: ChatMessage[]
  title: string (auto-generated)
  created_at: timestamp
  updated_at: timestamp
```
### Voice Context
```yaml
VoiceRequest:
  id: string (UUID)
  user_id: string
  audio_b64: string (base64)
  format: "wav" | "webm" | "mp3"
  language: string
  premium: boolean
  enable_rag: boolean

VoiceResponse:
  id: string (UUID)
  request_id: string
  transcription: string
  response_text: string
  audio_b64: string (base64)
  audio_format: string
  latency_ms: float
  rag_docs_used: integer
```
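The `audio_b64` fields carry raw audio as base64 text so a request can travel over JSON or MessagePack transports. A minimal sketch of the round trip (field names from the schema above; the audio bytes are placeholders):

```python
import base64

# Client side: encode raw audio bytes into the audio_b64 field
raw_audio = b"RIFF....WAVEfmt "  # placeholder, not a real WAV file
request = {
    "audio_b64": base64.b64encode(raw_audio).decode("ascii"),
    "format": "wav",
    "language": "en",
}

# STT side: decode back to bytes before transcription
decoded = base64.b64decode(request["audio_b64"])
assert decoded == raw_audio
```

The base64 overhead (~33%) is why the retention table below keeps audio for only an hour while text persists.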
### Inference Context
```yaml
InferenceRequest:
  id: string (UUID)
  service: "llm" | "stt" | "tts" | "embeddings" | "reranker"
  input: string | bytes
  parameters: object
  priority: "standard" | "premium"

InferenceResponse:
  id: string (UUID)
  request_id: string
  output: string | bytes | float[]
  metadata:
    model: string
    latency_ms: float
    tokens: integer (if applicable)
```
### RAG Context
```yaml
Document:
  id: string (UUID)
  collection: string
  title: string
  content: string
  source_url: string
  ingested_at: timestamp

Chunk:
  id: string (UUID)
  document_id: string
  content: string
  embedding: float[1024]  # BGE-large dimensions
  metadata:
    position: integer
    page: integer

RAGQuery:
  query: string
  collection: string
  top_k: integer (default: 5)
  rerank: boolean (default: true)
  rerank_top_k: integer (default: 3)

RAGResult:
  chunks: Chunk[]
  scores: float[]
  reranked: boolean
```
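The RAGQuery defaults imply a two-stage flow: fetch `top_k` candidates by vector similarity, then rerank and keep `rerank_top_k`. A sketch with pluggable scoring functions (these stand in for the Milvus search and reranker calls, which are not shown here):

```python
def rag_query(chunks, similarity, rerank_score,
              top_k=5, rerank=True, rerank_top_k=3):
    """Two-stage retrieval: vector search, then optional reranking."""
    # Stage 1: take the top_k candidates by vector similarity
    candidates = sorted(chunks, key=similarity, reverse=True)[:top_k]
    if not rerank:
        return candidates
    # Stage 2: re-score the candidates and keep only rerank_top_k
    return sorted(candidates, key=rerank_score, reverse=True)[:rerank_top_k]
```

The defaults (5 → 3) mean the reranker only ever sees a handful of chunks, which keeps reranking latency bounded regardless of collection size.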
### Workflow Context
```yaml
Pipeline:
  id: string
  name: string
  version: string
  engine: "kubeflow" | "argo"
  definition: object (YAML)

PipelineRun:
  id: string (UUID)
  pipeline_id: string
  status: "pending" | "running" | "succeeded" | "failed"
  started_at: timestamp
  completed_at: timestamp
  parameters: object
  artifacts: Artifact[]

Artifact:
  id: string (UUID)
  run_id: string
  name: string
  type: "model" | "dataset" | "metrics" | "logs"
  uri: string (s3://)
  metadata: object

Experiment:
  id: string (UUID)
  name: string
  runs: PipelineRun[]
  metrics: object
  created_at: timestamp
```
---
## Entity Relationships
```mermaid
erDiagram
    USER ||--o{ SESSION : has
    USER ||--o{ CONVERSATION : owns
    SESSION ||--o{ CHAT_MESSAGE : contains
    CONVERSATION ||--o{ CHAT_MESSAGE : contains
    USER ||--o{ VOICE_REQUEST : makes
    VOICE_REQUEST ||--|| VOICE_RESPONSE : produces
    DOCUMENT ||--o{ CHUNK : contains
    CHUNK ||--|| EMBEDDING : has
    PIPELINE ||--o{ PIPELINE_RUN : executed_as
    PIPELINE_RUN ||--o{ ARTIFACT : produces
    EXPERIMENT ||--o{ PIPELINE_RUN : tracks
    INFERENCE_REQUEST ||--|| INFERENCE_RESPONSE : produces
```
---
## Aggregate Roots
| Aggregate | Root Entity | Child Entities |
|-----------|-------------|----------------|
| Chat | Conversation | ChatMessage |
| Voice | VoiceRequest | VoiceResponse |
| RAG | Document | Chunk, Embedding |
| Workflow | PipelineRun | Artifact |
| User | User | Session, Preferences |
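An aggregate root is the only entry point for mutating its children. A sketch of the Chat aggregate enforcing its invariants (creation requires a first message, messages are append-only and immutable; class and method names are illustrative, not taken from the repositories):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChatMessage:
    id: str
    role: str  # "user" | "assistant" | "system"
    content: str


@dataclass
class Conversation:
    """Aggregate root: all ChatMessage changes go through here."""
    id: str
    user_id: str
    _messages: list = field(default_factory=list)

    def __post_init__(self):
        # Invariant: a Conversation must have at least one ChatMessage
        if not self._messages:
            raise ValueError("Conversation requires at least one message")

    def append(self, message: ChatMessage) -> None:
        # Messages are immutable (frozen) and append-only once created
        self._messages.append(message)

    @property
    def messages(self) -> tuple:
        return tuple(self._messages)  # read-only view, no external mutation
```

Exposing messages as a tuple keeps the append-only rule enforceable: callers can read history but cannot reorder or delete entries behind the root's back.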
---
## Event Flow
### Chat Event Stream
```
UserLogin
└─► SessionCreated
    └─► MessageReceived
        ├─► RAGQueryExecuted (optional)
        ├─► InferenceRequested
        └─► ResponseGenerated
            └─► MessageStored
```
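The chain above can be sketched as one handler that emits events in order, with the RAG step gated by the user's `enable_rag` preference. A sketch using the event names from the diagram; `publish` is a stand-in for a NATS JetStream publisher, not the real client call:

```python
def handle_message(content: str, enable_rag: bool, publish) -> list:
    """Emit the chat event chain for one incoming message."""
    events = ["MessageReceived"]
    if enable_rag:
        events.append("RAGQueryExecuted")  # the optional branch
    events += ["InferenceRequested", "ResponseGenerated", "MessageStored"]
    for event in events:
        publish(event)  # stand-in for publishing to a JetStream subject
    return events
```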
### Voice Event Stream
```
VoiceRequestReceived
└─► TranscriptionStarted
    └─► TranscriptionCompleted
        └─► RAGQueryExecuted (optional)
            └─► LLMInferenceStarted
                └─► LLMResponseGenerated
                    └─► TTSSynthesisStarted
                        └─► AudioResponseReady
```
### Workflow Event Stream
```
PipelineTriggerReceived
└─► PipelineRunCreated
    └─► StepStarted (repeated)
        └─► StepCompleted (repeated)
            └─► ArtifactProduced (repeated)
                └─► PipelineRunCompleted
```
---
## Data Retention
| Entity | Retention | Storage |
|--------|-----------|---------|
| ChatMessage | 30 days | JetStream → PostgreSQL |
| VoiceRequest/Response | 1 hour (audio), 30 days (text) | JetStream → PostgreSQL |
| Chunk/Embedding | Permanent | Milvus |
| PipelineRun | Permanent | PostgreSQL |
| Artifact | Permanent | MinIO |
| Session | 7 days | Valkey |
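The table maps directly to a per-entity TTL check; note that the audio and text halves of a voice exchange expire on different clocks. A sketch (the timedelta values are copied from the table; `None` means retained permanently; entity keys are illustrative):

```python
from datetime import datetime, timedelta, timezone

# Retention periods from the table above; None = permanent
RETENTION = {
    "chat_message": timedelta(days=30),
    "voice_audio": timedelta(hours=1),
    "voice_text": timedelta(days=30),
    "chunk": None,
    "pipeline_run": None,
    "artifact": None,
    "session": timedelta(days=7),
}


def is_expired(entity, created_at, now=None):
    """True when the entity has outlived its retention window."""
    ttl = RETENTION[entity]
    if ttl is None:
        return False  # permanent entities never expire
    now = now or datetime.now(timezone.utc)
    return now - created_at > ttl
```

In practice JetStream's `max_age` and Valkey TTLs enforce these windows at the storage layer; a helper like this is only needed for the PostgreSQL side.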
---
## Invariants
### Chat Context
- A ChatMessage must belong to exactly one Conversation
- A Conversation must have at least one ChatMessage
- Messages are immutable once created
### Voice Context
- A VoiceResponse must have a corresponding VoiceRequest
- Audio format must be one of: wav, webm, mp3
- Transcription cannot be empty for valid audio
### RAG Context
- A Chunk must belong to exactly one Document
- Embedding dimensions must match the model (1024 for BGE-large)
- A Document must have at least one Chunk
### Workflow Context
- A PipelineRun must reference a valid Pipeline
- Artifacts must have valid S3 URIs
- Run status transitions: pending → running → (succeeded|failed)
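The status transition rule reduces to a small table of legal moves, with `succeeded` and `failed` terminal. A sketch (the workflow engines manage their own state; this only illustrates the invariant):

```python
# Legal PipelineRun status transitions: pending → running → (succeeded|failed)
TRANSITIONS = {
    "pending": {"running"},
    "running": {"succeeded", "failed"},
    "succeeded": set(),  # terminal
    "failed": set(),     # terminal
}


def transition(current: str, target: str) -> str:
    """Return the new status, rejecting any move the table forbids."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```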
---
## Value Objects
```python
# Immutable value objects
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageContent:
    text: str
    tokens: int


@dataclass(frozen=True)
class AudioData:
    data: bytes
    format: str
    duration_ms: int
    sample_rate: int


@dataclass(frozen=True)
class EmbeddingVector:
    values: tuple[float, ...]
    model: str
    dimensions: int


@dataclass(frozen=True)
class RAGContext:
    chunks: tuple[str, ...]
    scores: tuple[float, ...]
    query: str
```
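One payoff of `frozen=True`: a value object can validate its invariants once at construction and then be shared freely, since mutation raises `FrozenInstanceError`. A sketch extending EmbeddingVector with the dimension check from the RAG invariants (the `__post_init__` validation is an addition for illustration, not code from the repositories):

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingVector:
    values: tuple[float, ...]
    model: str
    dimensions: int

    def __post_init__(self):
        # Invariant: embedding dimensions must match the model
        if len(self.values) != self.dimensions:
            raise ValueError("embedding length does not match dimensions")


vec = EmbeddingVector(values=(0.1, 0.2), model="bge-large", dimensions=2)
try:
    vec.model = "other"  # frozen: any mutation attempt is rejected
except dataclasses.FrozenInstanceError:
    pass
```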
---
## Related Documents
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [GLOSSARY.md](GLOSSARY.md) - Term definitions
- [decisions/0004-use-messagepack-for-nats.md](decisions/0004-use-messagepack-for-nats.md) - Message format decision