📊 Domain Model

Core entities, bounded contexts, and relationships in the DaviesTechLabs homelab

Bounded Contexts

┌─────────────────────────────────────────────────────────────────────────────┐
│                           BOUNDED CONTEXTS                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐     │
│  │    CHAT CONTEXT   │   │   VOICE CONTEXT   │   │ WORKFLOW CONTEXT  │     │
│  ├───────────────────┤   ├───────────────────┤   ├───────────────────┤     │
│  │ • ChatSession     │   │ • VoiceSession    │   │ • Pipeline        │     │
│  │ • ChatMessage     │   │ • AudioChunk      │   │ • PipelineRun     │     │
│  │ • Conversation    │   │ • Transcription   │   │ • Artifact        │     │
│  │ • User            │   │ • SynthesizedAudio│   │ • Experiment      │     │
│  └─────────┬─────────┘   └─────────┬─────────┘   └─────────┬─────────┘     │
│            │                       │                       │                │
│            └───────────────────────┼───────────────────────┘                │
│                                    │                                        │
│                                    ▼                                        │
│  ┌───────────────────────────────────────────────────────────────────┐     │
│  │                    INFERENCE CONTEXT                               │     │
│  ├───────────────────────────────────────────────────────────────────┤     │
│  │ • InferenceRequest  • Model  • Embedding  • Document  • Chunk     │     │
│  └───────────────────────────────────────────────────────────────────┘     │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Core Entities

User Context

User:
  id: string (UUID)
  username: string
  premium: boolean
  preferences:
    voice_id: string
    model_preference: string
    enable_rag: boolean
  created_at: timestamp
  
Session:
  id: string (UUID)
  user_id: string
  type: "chat" | "voice"
  started_at: timestamp
  last_activity: timestamp
  metadata: object
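The Data Retention section below gives sessions a 7-day lifetime, keyed off last_activity. A minimal expiry check might look like this sketch (the helper name and TTL constant are illustrative, not from the repo):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative constant: the 7-day TTL mirrors the Session row
# in the Data Retention table below.
SESSION_TTL = timedelta(days=7)

def is_session_expired(last_activity: datetime, now: Optional[datetime] = None) -> bool:
    """True once last_activity is older than the retention TTL."""
    now = now or datetime.now(timezone.utc)
    return now - last_activity > SESSION_TTL
```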

Chat Context

ChatMessage:
  id: string (UUID)
  session_id: string
  user_id: string
  role: "user" | "assistant" | "system"
  content: string
  created_at: timestamp
  metadata:
    tokens_used: integer
    latency_ms: float
    rag_sources: string[]
    model_used: string

Conversation:
  id: string (UUID)
  user_id: string
  messages: ChatMessage[]
  title: string (auto-generated)
  created_at: timestamp
  updated_at: timestamp

Voice Context

VoiceRequest:
  id: string (UUID)
  user_id: string
  audio_b64: string (base64)
  format: "wav" | "webm" | "mp3"
  language: string
  premium: boolean
  enable_rag: boolean

VoiceResponse:
  id: string (UUID)
  request_id: string
  transcription: string
  response_text: string
  audio_b64: string (base64)
  audio_format: string
  latency_ms: float
  rag_docs_used: integer

Inference Context

InferenceRequest:
  id: string (UUID)
  service: "llm" | "stt" | "tts" | "embeddings" | "reranker"
  input: string | bytes
  parameters: object
  priority: "standard" | "premium"

InferenceResponse:
  id: string (UUID)
  request_id: string
  output: string | bytes | float[]
  metadata:
    model: string
    latency_ms: float
    tokens: integer (if applicable)
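As a sketch only, the InferenceRequest schema can be carried as a dataclass with a simple envelope codec. JSON is used here purely for illustration; the platform's actual wire encoding may differ, and the `encode`/`decode` helpers are hypothetical:

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any

# Illustrative dataclass mirroring the InferenceRequest schema above.
@dataclass
class InferenceRequest:
    service: str          # "llm" | "stt" | "tts" | "embeddings" | "reranker"
    input: str
    parameters: dict[str, Any] = field(default_factory=dict)
    priority: str = "standard"   # "standard" | "premium"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

def encode(req: InferenceRequest) -> bytes:
    return json.dumps(asdict(req)).encode()

def decode(raw: bytes) -> InferenceRequest:
    return InferenceRequest(**json.loads(raw))
```

A round trip (`decode(encode(req))`) reproduces the original request, which is the property any real codec for these envelopes would need.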

RAG Context

Document:
  id: string (UUID)
  collection: string
  title: string
  content: string
  source_url: string
  ingested_at: timestamp

Chunk:
  id: string (UUID)
  document_id: string
  content: string
  embedding: float[1024]  # BGE-large dimensions
  metadata:
    position: integer
    page: integer

RAGQuery:
  query: string
  collection: string
  top_k: integer (default: 5)
  rerank: boolean (default: true)
  rerank_top_k: integer (default: 3)

RAGResult:
  chunks: Chunk[]
  scores: float[]
  reranked: boolean
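The RAGQuery defaults above imply a two-stage retrieval: fetch `top_k` candidates by similarity, then rerank and keep `rerank_top_k`. A sketch of that flow, with `search` and `rerank_fn` standing in for the vector-store search and the reranker service (both are assumptions, not real APIs):

```python
from dataclasses import dataclass

@dataclass
class ScoredChunk:
    content: str
    score: float

def rag_query(query: str, search, rerank_fn, top_k: int = 5,
              rerank: bool = True, rerank_top_k: int = 3) -> list:
    candidates = search(query, limit=top_k)           # stage 1: similarity search
    if not rerank:
        return candidates
    rescored = [ScoredChunk(c.content, rerank_fn(query, c.content))
                for c in candidates]                  # stage 2: rescore with reranker
    rescored.sort(key=lambda c: c.score, reverse=True)
    return rescored[:rerank_top_k]
```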

Workflow Context

Pipeline:
  id: string
  name: string
  version: string
  engine: "kubeflow" | "argo"
  definition: object (YAML)
  
PipelineRun:
  id: string (UUID)
  pipeline_id: string
  status: "pending" | "running" | "succeeded" | "failed"
  started_at: timestamp
  completed_at: timestamp
  parameters: object
  artifacts: Artifact[]

Artifact:
  id: string (UUID)
  run_id: string
  name: string
  type: "model" | "dataset" | "metrics" | "logs"
  uri: string (s3://)
  metadata: object

Experiment:
  id: string (UUID)
  name: string
  runs: PipelineRun[]
  metrics: object
  created_at: timestamp

Entity Relationships

erDiagram
    USER ||--o{ SESSION : has
    USER ||--o{ CONVERSATION : owns
    SESSION ||--o{ CHAT_MESSAGE : contains
    CONVERSATION ||--o{ CHAT_MESSAGE : contains
    
    USER ||--o{ VOICE_REQUEST : makes
    VOICE_REQUEST ||--|| VOICE_RESPONSE : produces
    
    DOCUMENT ||--o{ CHUNK : contains
    CHUNK ||--|| EMBEDDING : has
    
    PIPELINE ||--o{ PIPELINE_RUN : executed_as
    PIPELINE_RUN ||--o{ ARTIFACT : produces
    EXPERIMENT ||--o{ PIPELINE_RUN : tracks
    
    INFERENCE_REQUEST ||--|| INFERENCE_RESPONSE : produces

Aggregate Roots

| Aggregate | Root Entity  | Child Entities       |
|-----------|--------------|----------------------|
| Chat      | Conversation | ChatMessage          |
| Voice     | VoiceRequest | VoiceResponse        |
| RAG       | Document     | Chunk, Embedding     |
| Workflow  | PipelineRun  | Artifact             |
| User      | User         | Session, Preferences |
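An aggregate root mediates all changes to its children. A minimal sketch for the Chat aggregate, where ChatMessage objects are only created through the Conversation root (class shapes are illustrative, not the real models):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ChatMessage:
    role: str
    content: str

@dataclass
class Conversation:
    id: str
    messages: list = field(default_factory=list)

    def add_message(self, role: str, content: str) -> ChatMessage:
        # The root enforces the role constraint before any child is created.
        if role not in ("user", "assistant", "system"):
            raise ValueError(f"invalid role: {role!r}")
        msg = ChatMessage(role, content)  # frozen => immutable once created
        self.messages.append(msg)
        return msg
```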

Event Flow

Chat Event Stream

UserLogin
  └─► SessionCreated
        └─► MessageReceived
              ├─► RAGQueryExecuted (optional)
              ├─► InferenceRequested
              └─► ResponseGenerated
                    └─► MessageStored

Voice Event Stream

VoiceRequestReceived
  └─► TranscriptionStarted
        └─► TranscriptionCompleted
              └─► RAGQueryExecuted (optional)
                    └─► LLMInferenceStarted
                          └─► LLMResponseGenerated
                                └─► TTSSynthesisStarted
                                      └─► AudioResponseReady

Workflow Event Stream

PipelineTriggerReceived
  └─► PipelineRunCreated
        └─► StepStarted (repeated)
              └─► StepCompleted (repeated)
                    └─► ArtifactProduced (repeated)
                          └─► PipelineRunCompleted

Data Retention

| Entity                | Retention                      | Storage                |
|-----------------------|--------------------------------|------------------------|
| ChatMessage           | 30 days                        | JetStream → PostgreSQL |
| VoiceRequest/Response | 1 hour (audio), 30 days (text) | JetStream → PostgreSQL |
| Chunk/Embedding       | Permanent                      | Milvus                 |
| PipelineRun           | Permanent                      | PostgreSQL             |
| Artifact              | Permanent                      | MinIO                  |
| Session               | 7 days                         | Valkey                 |

Invariants

Chat Context

  • A ChatMessage must belong to exactly one Conversation
  • A Conversation must have at least one ChatMessage
  • Messages are immutable once created

Voice Context

  • A VoiceResponse must have a corresponding VoiceRequest
  • Audio format must be one of: wav, webm, mp3
  • Transcription cannot be empty for valid audio

RAG Context

  • A Chunk must belong to exactly one Document
  • Embedding dimensions must match the embedding model (1024 for BGE-large)
  • A Document must have at least one Chunk
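The dimension invariant is cheap to enforce at write time. A sketch, where the constant and function names are illustrative, not from the codebase; 1024 matches the BGE-large size given in the Chunk schema:

```python
# Illustrative guard for the embedding-dimension invariant.
BGE_LARGE_DIM = 1024

def validate_embedding(values, expected_dim: int = BGE_LARGE_DIM) -> None:
    """Raise if the vector does not match the expected model dimensions."""
    if len(values) != expected_dim:
        raise ValueError(
            f"embedding has {len(values)} dimensions, expected {expected_dim}"
        )
```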

Workflow Context

  • A PipelineRun must reference a valid Pipeline
  • Artifacts must have valid S3 URIs
  • Run status transitions: pending → running → (succeeded|failed)
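The status-transition invariant can be expressed as a small state machine. This transition table is a sketch of the invariant above, not the workflow engines' actual logic:

```python
# Legal run-status transitions: pending -> running -> (succeeded | failed).
TRANSITIONS = {
    "pending": {"running"},
    "running": {"succeeded", "failed"},
    "succeeded": set(),  # terminal
    "failed": set(),     # terminal
}

def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```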

Value Objects

# Immutable value objects
from dataclasses import dataclass

@dataclass(frozen=True)
class MessageContent:
    text: str
    tokens: int

@dataclass(frozen=True)  
class AudioData:
    data: bytes
    format: str
    duration_ms: int
    sample_rate: int

@dataclass(frozen=True)
class EmbeddingVector:
    values: tuple[float, ...]
    model: str
    dimensions: int

@dataclass(frozen=True)
class RAGContext:
    chunks: tuple[str, ...]
    scores: tuple[float, ...]
    query: str
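frozen=True is what makes these value objects immutable: attribute assignment after construction raises. A quick demonstration, with EmbeddingVector restated so the snippet is self-contained:

```python
import dataclasses
from dataclasses import dataclass

# Restated from the value objects above for a self-contained example.
@dataclass(frozen=True)
class EmbeddingVector:
    values: tuple
    model: str
    dimensions: int

vec = EmbeddingVector(values=(0.1, 0.2), model="bge-large", dimensions=2)
try:
    vec.model = "other"  # frozen=True turns attribute assignment into an error
    mutation_blocked = False
except dataclasses.FrozenInstanceError:
    mutation_blocked = True
```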