updating to match everything in my homelab.
@@ -4,11 +4,32 @@ This directory contains additional architecture diagrams beyond the main C4 diag

 ## Available Diagrams

-| File | Description |
-|------|-------------|
-| [gpu-allocation.mmd](gpu-allocation.mmd) | GPU workload distribution |
-| [data-flow-chat.mmd](data-flow-chat.mmd) | Chat request data flow |
-| [data-flow-voice.mmd](data-flow-voice.mmd) | Voice request data flow |
+| File | Description | Related ADR |
+|------|-------------|-------------|
+| [gpu-allocation.mmd](gpu-allocation.mmd) | GPU workload distribution | ADR-0005 |
+| [data-flow-chat.mmd](data-flow-chat.mmd) | Chat request data flow | ADR-0003 |
+| [data-flow-voice.mmd](data-flow-voice.mmd) | Voice request data flow | ADR-0003 |
+| [gitops-flux.mmd](gitops-flux.mmd) | GitOps reconciliation loop | ADR-0006 |
+| [dual-workflow-engines.mmd](dual-workflow-engines.mmd) | Argo vs Kubeflow decision flow | ADR-0009 |
+| [kuberay-unified-backend.mmd](kuberay-unified-backend.mmd) | RayService endpoints and GPU allocation | ADR-0011 |
+| [secrets-management.mmd](secrets-management.mmd) | SOPS bootstrap vs Vault runtime | ADR-0017 |
+| [security-policy-enforcement.mmd](security-policy-enforcement.mmd) | Gatekeeper admission + Trivy scanning | ADR-0018 |
+| [handler-deployment.mmd](handler-deployment.mmd) | Ray cluster platform layers | ADR-0019 |
+| [internal-registry.mmd](internal-registry.mmd) | Internal vs external registry paths | ADR-0020 |
+| [notification-architecture.mmd](notification-architecture.mmd) | ntfy hub with sources and consumers | ADR-0021 |
+| [ntfy-discord-bridge.mmd](ntfy-discord-bridge.mmd) | ntfy to Discord message flow | ADR-0022 |
+| [ray-repository-structure.mmd](ray-repository-structure.mmd) | Ray package build and loading | ADR-0024 |
+| [observability-stack.mmd](observability-stack.mmd) | Prometheus + ClickStack telemetry flow | ADR-0025 |
+| [storage-strategy.mmd](storage-strategy.mmd) | Longhorn + NFS dual-tier storage | ADR-0026 |
+| [database-strategy.mmd](database-strategy.mmd) | CloudNativePG cluster management | ADR-0027 |
+| [authentik-sso.mmd](authentik-sso.mmd) | Authentik authentication flow | ADR-0028 |
+| [user-registration-workflow.mmd](user-registration-workflow.mmd) | User registration and approval | ADR-0029 |
+| [velero-backup.mmd](velero-backup.mmd) | Velero backup and restore flow | ADR-0032 |
+| [analytics-lakehouse.mmd](analytics-lakehouse.mmd) | Data analytics lakehouse architecture | ADR-0033 |
+| [volcano-scheduling.mmd](volcano-scheduling.mmd) | Volcano batch scheduler and queues | ADR-0034 |
+| [cluster-topology.mmd](cluster-topology.mmd) | Node topology (x86/ARM64/GPU) | ADR-0035 |
+| [renovate-workflow.mmd](renovate-workflow.mmd) | Renovate dependency update cycle | ADR-0036 |
+| [node-naming.mmd](node-naming.mmd) | D&D-themed node naming conventions | ADR-0037 |

 ## Rendering Diagrams

85
diagrams/analytics-lakehouse.mmd
Normal file
@@ -0,0 +1,85 @@
%% Data Analytics Lakehouse Architecture
%% Related: ADR-0033

flowchart TB
    subgraph Ingestion["Data Ingestion"]
        Kafka["Kafka<br/>Event Streams"]
        APIs["REST APIs<br/>Batch Loads"]
        Files["File Drops<br/>S3/NFS"]
    end

    subgraph Processing["Processing Layer"]
        subgraph Batch["Batch Processing"]
            Spark["Apache Spark<br/>spark-operator"]
        end
        subgraph Stream["Stream Processing"]
            Flink["Apache Flink<br/>flink-operator"]
        end
        subgraph Realtime["Real-time"]
            RisingWave["RisingWave<br/>Streaming SQL"]
        end
    end

    subgraph Catalog["Lakehouse Catalog"]
        Nessie["Nessie<br/>Git-like Versioning"]
        Iceberg["Apache Iceberg<br/>Table Format"]
    end

    subgraph Storage["Storage Layer"]
        S3["S3 (MinIO)<br/>Object Storage"]
        Parquet["Parquet Files<br/>Columnar Format"]
    end

    subgraph Query["Query Layer"]
        Trino["Trino<br/>Distributed SQL"]
    end

    subgraph Serve["Serving Layer"]
        Grafana["Grafana<br/>Dashboards"]
        Jupyter["JupyterHub<br/>Notebooks"]
        Apps["Applications<br/>REST APIs"]
    end

    subgraph Metadata["Metadata Store"]
        PostgreSQL["CloudNativePG<br/>analytics-db"]
    end

    Kafka --> Flink
    Kafka --> RisingWave
    APIs --> Spark
    Files --> Spark

    Spark --> Nessie
    Flink --> Nessie
    RisingWave --> Nessie

    Nessie --> Iceberg
    Iceberg --> S3
    S3 --> Parquet

    Nessie --> PostgreSQL

    Trino --> Nessie
    Trino --> Iceberg

    Trino --> Grafana
    Trino --> Jupyter
    Trino --> Apps

    classDef ingest fill:#4a5568,stroke:#718096,color:#fff
    classDef batch fill:#3182ce,stroke:#2b6cb0,color:#fff
    classDef stream fill:#38a169,stroke:#2f855a,color:#fff
    classDef catalog fill:#d69e2e,stroke:#b7791f,color:#fff
    classDef storage fill:#718096,stroke:#4a5568,color:#fff
    classDef query fill:#805ad5,stroke:#6b46c1,color:#fff
    classDef serve fill:#e53e3e,stroke:#c53030,color:#fff
    classDef meta fill:#319795,stroke:#2c7a7b,color:#fff

    class Kafka,APIs,Files ingest
    class Spark batch
    class Flink,RisingWave stream
    class Nessie,Iceberg catalog
    class S3,Parquet storage
    class Trino query
    class Grafana,Jupyter,Apps serve
    class PostgreSQL meta
84
diagrams/authentik-sso.mmd
Normal file
@@ -0,0 +1,84 @@
```plaintext
%% Authentik SSO Strategy (ADR-0028)
%% Flowchart showing authentication flow stages

flowchart TB
    subgraph user["👤 User"]
        browser["Browser"]
    end

    subgraph ingress["🌐 Ingress"]
        traefik["Envoy Gateway"]
    end

    subgraph apps["📦 Applications"]
        direction LR
        oidc_app["OIDC Apps<br/>Gitea, Grafana,<br/>ArgoCD, Affine"]
        proxy_app["Proxy Apps<br/>MLflow, Kubeflow"]
    end

    subgraph authentik["🔐 Authentik"]
        direction TB

        subgraph components["Components"]
            server["Server<br/>(API)"]
            worker["Worker<br/>(Tasks)"]
            outpost["Outpost<br/>(Proxy Auth)"]
        end

        subgraph flow["Authentication Flow"]
            direction LR
            f1["1️⃣ Login<br/>Stage"]
            f2["2️⃣ Username<br/>Identification"]
            f3["3️⃣ Password<br/>Validation"]
            f4["4️⃣ MFA<br/>Challenge"]
            f5["5️⃣ Session<br/>Created"]
        end

        subgraph providers["Providers"]
            oidc_prov["OIDC Provider"]
            proxy_prov["Proxy Provider"]
        end
    end

    subgraph storage["💾 Storage"]
        redis["Redis<br/>(Cache)"]
        postgres["PostgreSQL<br/>(CNPG)"]
    end

    %% User flow
    browser --> traefik
    traefik --> apps

    %% OIDC flow
    oidc_app -->|"Redirect to auth"| server
    server --> flow
    f1 --> f2 --> f3 --> f4 --> f5
    flow --> oidc_prov
    oidc_prov -->|"JWT token"| oidc_app

    %% Proxy flow
    proxy_app -->|"Forward auth"| outpost
    outpost --> server
    server --> flow
    proxy_prov --> outpost

    %% Storage
    server --> redis
    server --> postgres

    classDef user fill:#3498db,color:white
    classDef ingress fill:#f39c12,color:black
    classDef app fill:#27ae60,color:white
    classDef authentik fill:#9b59b6,color:white
    classDef storage fill:#e74c3c,color:white
    classDef flow fill:#1abc9c,color:white

    class browser user
    class traefik ingress
    class oidc_app,proxy_app app
    class server,worker,outpost,oidc_prov,proxy_prov authentik
    class redis,postgres storage
    class f1,f2,f3,f4,f5 flow

```
66
diagrams/cluster-topology.mmd
Normal file
@@ -0,0 +1,66 @@
%% Cluster Node Topology
%% Related: ADR-0035, ADR-0011, ADR-0037

flowchart TB
    subgraph Cluster["Homelab Kubernetes Cluster (14 nodes)"]
        subgraph ControlPlane["👑 Control Plane (Companions of the Hall)"]
            Bruenor["bruenor<br/>Intel N100"]
            Catti["catti<br/>Intel N100"]
            Storm["storm<br/>Intel N100"]
        end

        subgraph GPUNodes["🧙 Wizards (GPU Workers)"]
            Khelben["khelben<br/>Radeon 8060S 64GB<br/>🎮 Primary AI"]
            Elminster["elminster<br/>RTX 2070 8GB<br/>🎮 CUDA"]
            Drizzt["drizzt<br/>Radeon 680M<br/>🎮 ROCm"]
            Danilo["danilo<br/>Intel Arc A770<br/>🎮 Intel"]
            Regis["regis<br/>NVIDIA GPU<br/>🎮 CUDA"]
        end

        subgraph CPUNodes["⚔️ Fighters (CPU Workers)"]
            Wulfgar["wulfgar<br/>Intel x86_64"]
        end

        subgraph ARMWorkers["🗡️ Rogues (ARM64 Raspberry Pi)"]
            Durnan["durnan<br/>Pi 4 8GB"]
            Elaith["elaith<br/>Pi 4 8GB"]
            Jarlaxle["jarlaxle<br/>Pi 4 8GB"]
            Mirt["mirt<br/>Pi 4 8GB"]
            Volo["volo<br/>Pi 4 8GB"]
        end
    end

    subgraph Workloads["Workload Placement"]
        AIInference["AI Inference<br/>→ Khelben"]
        MLTraining["ML Training<br/>→ GPU Nodes"]
        EdgeServices["Lightweight Services<br/>→ ARM64"]
        General["General Workloads<br/>→ CPU + ARM64"]
    end

    subgraph Storage["Storage Affinity"]
        Longhorn["Longhorn<br/>x86_64 only"]
        NFS["NFS<br/>All nodes"]
    end

    AIInference -.-> Khelben
    MLTraining -.-> GPUNodes
    EdgeServices -.-> ARMWorkers
    General -.-> CPUNodes
    General -.-> ARMWorkers

    Longhorn -.->|Excluded| ARMWorkers
    NFS --> Cluster

    classDef control fill:#2563eb,stroke:#1d4ed8,color:#fff
    classDef gpu fill:#7c3aed,stroke:#5b21b6,color:#fff
    classDef cpu fill:#dc2626,stroke:#b91c1c,color:#fff
    classDef arm fill:#059669,stroke:#047857,color:#fff
    classDef workload fill:#9f7aea,stroke:#805ad5,color:#fff
    classDef storage fill:#ed8936,stroke:#dd6b20,color:#fff

    class Bruenor,Catti,Storm control
    class Khelben,Elminster,Drizzt,Danilo,Regis gpu
    class Wulfgar cpu
    class Durnan,Elaith,Jarlaxle,Mirt,Volo arm
    class AIInference,MLTraining,EdgeServices,General workload
    class Longhorn,NFS storage
96
diagrams/database-strategy.mmd
Normal file
@@ -0,0 +1,96 @@
```plaintext
%% Database Strategy with CloudNativePG (ADR-0027)
%% C4 Component diagram showing CNPG operator and clusters

flowchart TB
    subgraph operator["🎛️ CNPG Operator"]
        cnpg["CloudNativePG<br/>Controller<br/>(cnpg-system)"]
    end

    subgraph clusters["📊 PostgreSQL Clusters"]
        direction LR

        subgraph gitea_pg["gitea-pg"]
            direction TB
            g_primary["🔵 Primary"]
            g_replica1["⚪ Replica"]
            g_replica2["⚪ Replica"]
            g_bouncer["🔗 PgBouncer"]
        end

        subgraph authentik_db["authentik-db"]
            direction TB
            a_primary["🔵 Primary"]
            a_replica1["⚪ Replica"]
            a_replica2["⚪ Replica"]
            a_bouncer["🔗 PgBouncer"]
        end

        subgraph companions_db["companions-db"]
            direction TB
            c_primary["🔵 Primary"]
            c_replica1["⚪ Replica"]
            c_replica2["⚪ Replica"]
            c_bouncer["🔗 PgBouncer"]
        end

        subgraph mlflow_db["mlflow-db"]
            direction TB
            m_primary["🔵 Primary"]
        end
    end

    subgraph storage["💾 Storage"]
        longhorn["Longhorn PVCs<br/>(NVMe/SSD)"]
        s3["S3 Backups<br/>(barman)"]
    end

    subgraph services["🔌 Service Discovery"]
        direction TB
        rw["-rw (read-write)"]
        ro["-ro (read-only)"]
        pooler["-pooler-rw<br/>(PgBouncer)"]
    end

    subgraph apps["📦 Applications"]
        gitea["Gitea"]
        authentik["Authentik"]
        companions["Companions"]
        mlflow["MLflow"]
    end

    %% Operator manages clusters
    cnpg -->|"Manages"| clusters

    %% Storage connections
    clusters --> longhorn
    clusters -->|"WAL archiving"| s3

    %% Service routing
    g_bouncer --> rw
    a_bouncer --> rw
    c_bouncer --> rw
    g_replica1 --> ro
    g_replica2 --> ro

    %% App connections
    gitea -->|"gitea-pg-pooler-rw"| g_bouncer
    authentik -->|"authentik-db-pooler-rw"| a_bouncer
    companions -->|"companions-db-pooler-rw"| c_bouncer
    mlflow -->|"mlflow-db-rw"| m_primary

    classDef operator fill:#e74c3c,color:white
    classDef primary fill:#3498db,color:white
    classDef replica fill:#95a5a6,color:white
    classDef bouncer fill:#9b59b6,color:white
    classDef storage fill:#27ae60,color:white
    classDef app fill:#f39c12,color:black

    class cnpg operator
    class g_primary,a_primary,c_primary,m_primary primary
    class g_replica1,g_replica2,a_replica1,a_replica2,c_replica1,c_replica2 replica
    class g_bouncer,a_bouncer,c_bouncer bouncer
    class longhorn,s3 storage
    class gitea,authentik,companions,mlflow app

```
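The service-discovery suffixes in the database diagram (`-rw`, `-ro`, `-pooler-rw`) follow a mechanical pattern, so a client can derive its connection host from the cluster name. A minimal sketch, assuming only the suffixes shown in the diagram; the function name `service_host` and the `databases` namespace are hypothetical and do not come from this commit.

```python
# Sketch only: derive an in-cluster service host from the suffixes in the diagram.
# The namespace default ("databases") is an illustrative assumption.
SUFFIXES = {
    "rw": "-rw",             # read-write service (primary)
    "ro": "-ro",             # read-only service (replicas)
    "pooler": "-pooler-rw",  # PgBouncer pooler in front of the primary
}


def service_host(cluster: str, role: str, namespace: str = "databases") -> str:
    """Return '<cluster><suffix>.<namespace>.svc' for a known role."""
    if role not in SUFFIXES:
        raise ValueError(f"unknown role: {role!r}")
    return f"{cluster}{SUFFIXES[role]}.{namespace}.svc"
```

With these assumptions, `service_host("gitea-pg", "pooler")` yields the `gitea-pg-pooler-rw` name used by Gitea in the diagram, while MLflow, which has no pooler, would use `service_host("mlflow-db", "rw")`.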
73
diagrams/dual-workflow-engines.mmd
Normal file
@@ -0,0 +1,73 @@
```plaintext
%% Dual Workflow Engine Strategy (ADR-0009)
%% Flowchart showing Argo vs Kubeflow decision and integration

flowchart TB
    subgraph trigger["🎯 Workflow Triggers"]
        nats["NATS Event"]
        api["API Call"]
        schedule["Cron Schedule"]
    end

    subgraph decision["❓ Which Engine?"]
        question{{"Workflow Type?"}}
    end

    subgraph kubeflow["🔬 Kubeflow Pipelines"]
        direction TB
        kfp_train["ML Training<br/>✅ Component caching"]
        kfp_eval["Model Evaluation<br/>✅ Metric tracking"]
        kfp_exp["Experiment Comparison<br/>✅ MLflow integration"]
    end

    subgraph argo["⚡ Argo Workflows"]
        direction TB
        argo_dag["Complex DAG<br/>✅ Advanced control flow"]
        argo_batch["Batch Processing<br/>✅ Parallelization"]
        argo_ingest["Document Ingestion<br/>✅ Simple steps"]
    end

    subgraph hybrid["🔗 Hybrid Pattern"]
        direction TB
        argo_orch["Argo Orchestrates"]
        kfp_step["KFP via API"]
        argo_orch --> kfp_step
    end

    subgraph integration["📡 Integration Layer"]
        direction TB
        events["Argo Events<br/>EventSource + Sensor"]
    end

    %% Flow from triggers
    nats --> events
    api --> decision
    schedule --> events
    events --> decision

    %% Decision branches
    question -->|"ML training<br/>with caching"| kubeflow
    question -->|"Complex DAG<br/>batch jobs"| argo
    question -->|"ML + complex<br/>orchestration"| hybrid

    %% Kubeflow use cases
    kfp_train --> kfp_eval
    kfp_eval --> kfp_exp

    %% Argo use cases
    argo_dag --> argo_batch
    argo_batch --> argo_ingest

    classDef trigger fill:#f39c12,color:black
    classDef kubeflow fill:#4a90d9,color:white
    classDef argo fill:#ef6c00,color:white
    classDef hybrid fill:#8e44ad,color:white
    classDef integration fill:#27ae60,color:white

    class nats,api,schedule trigger
    class kfp_train,kfp_eval,kfp_exp kubeflow
    class argo_dag,argo_batch,argo_ingest argo
    class argo_orch,kfp_step hybrid
    class events integration

```
57
diagrams/gitops-flux.mmd
Normal file
@@ -0,0 +1,57 @@
```plaintext
%% GitOps Reconciliation Loop (ADR-0006)
%% Flowchart showing Flux CD GitOps workflow

flowchart TB
    subgraph git["📂 Git Repositories"]
        direction TB
        homelab["homelab-k8s2<br/>(cluster config)"]
        apps["Application Repos<br/>(argo, kubeflow, etc.)"]
    end

    subgraph flux["⚙️ Flux Controllers"]
        direction TB
        source["Source Controller<br/>📥 Fetches repos"]
        kustomize["Kustomize Controller<br/>🔧 Applies manifests"]
        helm["Helm Controller<br/>📦 Manages charts"]
        notification["Notification Controller<br/>📢 Alerts"]
    end

    subgraph k8s["☸️ Kubernetes Cluster"]
        direction TB
        secrets["🔐 SOPS Secrets<br/>(Age decrypted)"]
        resources["📋 Deployed Resources<br/>(Pods, Services, etc.)"]
        drift["🔄 Drift Detection"]
    end

    subgraph notify["📱 Notifications"]
        ntfy["ntfy<br/>(push alerts)"]
    end

    %% GitOps flow
    homelab -->|"GitRepository CR"| source
    apps -->|"GitRepository CR"| source
    source -->|"Fetches every 5m"| kustomize
    source -->|"Fetches charts"| helm

    kustomize -->|"Decrypts with Age"| secrets
    kustomize -->|"kubectl apply"| resources
    helm -->|"helm upgrade"| resources

    resources -->|"Actual state"| drift
    drift -->|"Compares to Git"| kustomize
    drift -->|"Auto-corrects"| resources

    notification -->|"Success/failure"| ntfy

    classDef repo fill:#f5a623,color:black
    classDef controller fill:#4a90d9,color:white
    classDef cluster fill:#50c878,color:white
    classDef alert fill:#9b59b6,color:white

    class homelab,apps repo
    class source,kustomize,helm,notification controller
    class secrets,resources,drift cluster
    class ntfy alert

```
67
diagrams/handler-deployment.mmd
Normal file
@@ -0,0 +1,67 @@
```plaintext
%% Handler Deployment Strategy (ADR-0019)
%% C4 Component diagram showing platform layers with Ray cluster

flowchart TB
    subgraph platform["🏗️ Platform Layer"]
        direction LR
        kubeflow["📊 Kubeflow<br/>Pipelines"]
        kserve["🎯 KServe<br/>(visibility)"]
        mlflow["📈 MLflow<br/>(registry)"]
    end

    subgraph ray["⚡ Ray Cluster"]
        direction TB

        subgraph gpu_apps["🎮 GPU Inference (Workers)"]
            direction LR
            llm["/llm<br/>vLLM<br/>🟢 khelben 0.95 GPU"]
            whisper["/whisper<br/>Whisper<br/>🟡 elminster 0.5 GPU"]
            tts["/tts<br/>XTTS<br/>🟡 elminster 0.5 GPU"]
            embeddings["/embeddings<br/>BGE<br/>🔴 drizzt 0.8 GPU"]
            reranker["/reranker<br/>BGE<br/>🔵 danilo 0.8 GPU"]
        end

        subgraph cpu_apps["🖥️ CPU Handlers (Head Node)"]
            direction LR
            chat["/chat<br/>ChatHandler<br/>0 GPU"]
            voice["/voice<br/>VoiceHandler<br/>0 GPU"]
        end
    end

    subgraph support["🔧 Supporting Services"]
        direction LR
        nats["📨 NATS<br/>(events)"]
        milvus["🔍 Milvus<br/>(vectors)"]
        valkey["💾 Valkey<br/>(cache)"]
    end

    subgraph pypi["📦 Package Registry"]
        gitea_pypi["Gitea PyPI<br/>• handler-base<br/>• chat-handler<br/>• voice-assistant"]
    end

    %% Connections
    kubeflow --> ray
    kserve --> ray
    mlflow --> ray

    cpu_apps -->|"Ray internal calls"| gpu_apps
    cpu_apps --> nats
    cpu_apps --> milvus
    cpu_apps --> valkey

    gitea_pypi -->|"pip install<br/>runtime_env"| cpu_apps

    classDef platform fill:#9b59b6,color:white
    classDef gpu fill:#e74c3c,color:white
    classDef cpu fill:#3498db,color:white
    classDef support fill:#27ae60,color:white
    classDef registry fill:#f39c12,color:black

    class kubeflow,kserve,mlflow platform
    class llm,whisper,tts,embeddings,reranker gpu
    class chat,voice cpu
    class nats,milvus,valkey support
    class gitea_pypi registry

```
53
diagrams/internal-registry.mmd
Normal file
@@ -0,0 +1,53 @@
```plaintext
%% Internal Registry for CI/CD (ADR-0020)
%% Flowchart showing dual-path for external vs internal access

flowchart TB
    subgraph external["🌐 External Access"]
        internet["Internet"]
        cloudflare["☁️ Cloudflare<br/>⚠️ 100MB upload limit"]
        external_url["git.daviestechlabs.io"]
    end

    subgraph internal["🏠 Internal Access"]
        internal_url["registry.lab.daviestechlabs.io<br/>✅ No upload limits"]
    end

    subgraph gitea["📦 Gitea Instance"]
        direction TB
        git_server["Git Server"]
        docker_registry["Docker Registry"]
        pypi_registry["PyPI Registry"]
    end

    subgraph runners["🏃 CI/CD Runners"]
        gitea_runner["Gitea Actions Runner<br/>(in-cluster)"]
    end

    subgraph operations["📋 Operations"]
        small_ops["Small Operations<br/>• git clone/push<br/>• pip install<br/>• docker pull"]
        large_ops["Large Uploads<br/>• docker push (20GB+)<br/>• pypi upload"]
    end

    %% External path (limited)
    internet --> cloudflare
    cloudflare -->|"100MB limit"| external_url
    external_url --> gitea
    small_ops --> cloudflare

    %% Internal path (unlimited)
    gitea_runner -->|"Direct"| internal_url
    internal_url --> gitea
    large_ops --> internal_url

    classDef external fill:#e74c3c,color:white
    classDef internal fill:#27ae60,color:white
    classDef gitea fill:#f39c12,color:black
    classDef runner fill:#3498db,color:white

    class internet,cloudflare,external_url external
    class internal_url internal
    class git_server,docker_registry,pypi_registry gitea
    class gitea_runner runner

```
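The dual-path rule in the registry diagram (small operations may go through Cloudflare, large uploads must use the internal hostname) can be expressed as a simple size check. A minimal sketch, assuming the 100MB figure from the diagram is interpreted as 100 MiB; the function name `pick_registry_host` is hypothetical, and only the two hostnames come from the diagram.

```python
# Sketch only: choose a Gitea hostname based on upload size.
# Treating "100MB" as 100 MiB is an assumption, not stated in the commit.
CLOUDFLARE_UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024

EXTERNAL_HOST = "git.daviestechlabs.io"           # proxied through Cloudflare
INTERNAL_HOST = "registry.lab.daviestechlabs.io"  # direct, no upload limit


def pick_registry_host(upload_bytes: int) -> str:
    """Return the hostname an upload of the given size should target."""
    if upload_bytes < 0:
        raise ValueError("upload_bytes must be non-negative")
    if upload_bytes > CLOUDFLARE_UPLOAD_LIMIT_BYTES:
        return INTERNAL_HOST
    return EXTERNAL_HOST
```

An in-cluster runner would likely skip the check and always use the internal hostname, as the "Direct" edge in the diagram suggests; the size check matters mainly for clients that can reach both paths.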
77
diagrams/kuberay-unified-backend.mmd
Normal file
@@ -0,0 +1,77 @@
```plaintext
%% KubeRay Unified GPU Backend (ADR-0011)
%% C4 Component diagram showing RayService endpoints and GPU allocation

flowchart TB
    subgraph clients["🔌 Clients"]
        chat["Chat Handler"]
        voice["Voice Handler"]
    end

    subgraph rayservice["⚡ KubeRay RayService"]
        endpoint["ai-inference-serve-svc:8000"]

        subgraph deployments["Ray Serve Deployments"]
            direction TB

            subgraph strixhalo["🟢 khelben (Strix Halo 64GB)"]
                llm["/llm<br/>vLLM 70B<br/>0.95 GPU"]
            end

            subgraph rtx2070["🟡 elminster (RTX 2070 8GB)"]
                whisper["/whisper<br/>Whisper v3<br/>0.5 GPU"]
                tts["/tts<br/>XTTS<br/>0.5 GPU"]
            end

            subgraph radeon680m["🔴 drizzt (Radeon 680M 12GB)"]
                embeddings["/embeddings<br/>BGE-Large<br/>0.8 GPU"]
            end

            subgraph intelarc["🔵 danilo (Intel Arc)"]
                reranker["/reranker<br/>BGE-Reranker<br/>0.8 GPU"]
            end
        end
    end

    subgraph kserve["🎯 KServe Compatibility Layer"]
        direction TB
        svc1["whisper-predictor.ai-ml"]
        svc2["tts-predictor.ai-ml"]
        svc3["llm-predictor.ai-ml"]
        svc4["embeddings-predictor.ai-ml"]
        svc5["reranker-predictor.ai-ml"]
    end

    %% Client connections
    chat --> endpoint
    voice --> endpoint

    %% Path routing
    endpoint --> llm
    endpoint --> whisper
    endpoint --> tts
    endpoint --> embeddings
    endpoint --> reranker

    %% KServe aliases
    svc1 -->|"ExternalName"| endpoint
    svc2 -->|"ExternalName"| endpoint
    svc3 -->|"ExternalName"| endpoint
    svc4 -->|"ExternalName"| endpoint
    svc5 -->|"ExternalName"| endpoint

    classDef client fill:#3498db,color:white
    classDef endpoint fill:#9b59b6,color:white
    classDef amd fill:#ED1C24,color:white
    classDef nvidia fill:#76B900,color:white
    classDef intel fill:#0071C5,color:white
    classDef kserve fill:#f39c12,color:black

    class chat,voice client
    class endpoint endpoint
    class llm,embeddings amd
    class whisper,tts nvidia
    class reranker intel
    class svc1,svc2,svc3,svc4,svc5 kserve

```
64
diagrams/node-naming.mmd
Normal file
@@ -0,0 +1,64 @@
%% Node Naming Conventions - D&D Theme
%% Related: ADR-0037

flowchart TB
    subgraph Cluster["Homelab Kubernetes Cluster (14 nodes)"]
        subgraph ControlPlane["👑 Control Plane (Companions of the Hall)"]
            Bruenor["bruenor<br/>Intel N100<br/><i>Dwarf King</i>"]
            Catti["catti<br/>Intel N100<br/><i>Catti-brie</i>"]
            Storm["storm<br/>Intel N100<br/><i>Storm Silverhand</i>"]
        end

        subgraph Wizards["🧙 Wizards (GPU Spellcasters)"]
            Khelben["khelben<br/>Radeon 8060S 64GB<br/><i>The Blackstaff</i>"]
            Elminster["elminster<br/>RTX 2070 8GB<br/><i>Sage of Shadowdale</i>"]
            Drizzt["drizzt<br/>Radeon 680M<br/><i>Ranger-Mage</i>"]
            Danilo["danilo<br/>Intel Arc A770<br/><i>Bard-Wizard</i>"]
            Regis["regis<br/>NVIDIA GPU<br/><i>Halfling Spellthief</i>"]
        end

        subgraph Rogues["🗡️ Rogues (ARM64 Edge Nodes)"]
            Durnan["durnan<br/>Pi 4 8GB<br/><i>Yawning Portal</i>"]
            Elaith["elaith<br/>Pi 4 8GB<br/><i>The Serpent</i>"]
            Jarlaxle["jarlaxle<br/>Pi 4 8GB<br/><i>Bregan D'aerthe</i>"]
            Mirt["mirt<br/>Pi 4 8GB<br/><i>Old Wolf</i>"]
            Volo["volo<br/>Pi 4 8GB<br/><i>Famous Author</i>"]
        end

        subgraph Fighters["⚔️ Fighters (x86 CPU Workers)"]
            Wulfgar["wulfgar<br/>Intel x86_64<br/><i>Barbarian of Icewind Dale</i>"]
        end
    end

    subgraph Infrastructure["🏰 Locations (Off-Cluster Infrastructure)"]
        Candlekeep["📚 candlekeep<br/>Synology NAS<br/>nfs-default<br/><i>Library Fortress</i>"]
        Neverwinter["❄️ neverwinter<br/>TrueNAS Scale (SSD)<br/>nfs-fast<br/><i>Jewel of the North</i>"]
        Waterdeep["🏙️ waterdeep<br/>Mac Mini<br/>Dev Workstation<br/><i>City of Splendors</i>"]
    end

    subgraph Workloads["Workload Routing"]
        AI["AI/ML Inference"] --> Wizards
        Edge["Edge Services"] --> Rogues
        Compute["General Compute"] --> Fighters
        Storage["Storage I/O"] --> Infrastructure
    end

    ControlPlane -.->|"etcd"| ControlPlane
    Wizards -.->|"Fast Storage"| Neverwinter
    Wizards -.->|"Backups"| Candlekeep
    Rogues -.->|"NFS Mounts"| Candlekeep
    Fighters -.->|"NFS Mounts"| Candlekeep

    classDef control fill:#2563eb,stroke:#1d4ed8,color:#fff
    classDef wizard fill:#7c3aed,stroke:#5b21b6,color:#fff
    classDef rogue fill:#059669,stroke:#047857,color:#fff
    classDef fighter fill:#dc2626,stroke:#b91c1c,color:#fff
    classDef location fill:#d97706,stroke:#b45309,color:#fff
    classDef workload fill:#4b5563,stroke:#374151,color:#fff

    class Bruenor,Catti,Storm control
    class Khelben,Elminster,Drizzt,Danilo,Regis wizard
    class Durnan,Elaith,Jarlaxle,Mirt,Volo rogue
    class Wulfgar fighter
    class Candlekeep,Neverwinter,Waterdeep location
    class AI,Edge,Compute,Storage workload
63
diagrams/notification-architecture.mmd
Normal file
@@ -0,0 +1,63 @@
```plaintext
%% Notification Architecture (ADR-0021)
%% C4 Component diagram showing notification sources and hub

flowchart LR
    subgraph sources["📤 Notification Sources"]
        direction TB
        ci["🔧 Gitea Actions<br/>CI/CD builds"]
        alertmanager["🔔 Alertmanager<br/>Prometheus alerts"]
        gatus["❤️ Gatus<br/>Health monitoring"]
        flux["🔄 Flux<br/>GitOps events"]
    end

    subgraph hub["📡 Central Hub"]
        ntfy["📢 ntfy<br/>Notification Server"]
    end

    subgraph topics["🏷️ Topics"]
        direction TB
        t_ci["gitea-ci"]
        t_alerts["alertmanager-alerts"]
        t_gatus["gatus"]
        t_flux["flux"]
        t_deploy["deployments"]
    end

    subgraph consumers["📱 Consumers"]
        direction TB
        mobile["📱 ntfy App<br/>(iOS/Android)"]
        bridge["🌉 ntfy-discord<br/>Bridge"]
        discord["💬 Discord<br/>Webhooks"]
    end

    %% Source to hub
    ci -->|"POST"| ntfy
    alertmanager -->|"webhook"| ntfy
    gatus -->|"webhook"| ntfy
    flux -->|"notification-controller"| ntfy

    %% Hub to topics
    ntfy --> topics

    %% Topics to consumers
    t_ci --> mobile
    t_alerts --> mobile
    t_gatus --> mobile
    t_flux --> mobile
    t_deploy --> mobile

    topics --> bridge
    bridge --> discord

    classDef source fill:#3498db,color:white
    classDef hub fill:#e74c3c,color:white
    classDef topic fill:#9b59b6,color:white
    classDef consumer fill:#27ae60,color:white

    class ci,alertmanager,gatus,flux source
    class ntfy hub
    class t_ci,t_alerts,t_gatus,t_flux,t_deploy topic
    class mobile,bridge,discord consumer

```
45
diagrams/ntfy-discord-bridge.mmd
Normal file
@@ -0,0 +1,45 @@
```plaintext
%% ntfy-Discord Bridge (ADR-0022)
%% Sequence diagram showing message flow and transformation

sequenceDiagram
    autonumber
    participant S as Notification Source<br/>(CI/Alertmanager)
    participant N as ntfy<br/>Notification Hub
    participant B as ntfy-discord<br/>Go Bridge
    participant D as Discord<br/>Webhook

    Note over S,N: Events published to ntfy topics

    S->>N: POST /gitea-ci<br/>{title, message, priority}

    Note over N,B: SSE subscription for real-time

    N-->>B: SSE JSON stream<br/>{topic, message, priority, tags}

    Note over B: Message transformation

    rect rgb(240, 240, 240)
        B->>B: Map priority to embed color<br/>urgent=red, high=orange<br/>default=blue, low=gray
        B->>B: Format as Discord embed<br/>{embeds: [{title, description, color}]}
    end

    B->>D: POST webhook URL<br/>Discord embed format

    Note over B: Hot-reload support

    rect rgb(230, 245, 230)
        B->>B: fsnotify watches secrets
        B->>B: Reload config without restart
    end

    Note over B,D: Retry with exponential backoff

    alt Webhook fails
        B-->>B: Retry (2s, 4s, 8s...)
        B->>D: Retry POST
    end

    D-->>D: Display in channel

```
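The transformation and retry steps in the sequence diagram are small enough to sketch directly. The diagram describes the bridge as a Go service; the sketch below uses Python purely for illustration and is not the bridge's actual code. The hex color values and the names `PRIORITY_COLORS`, `to_discord_payload`, and `backoff_delays` are assumptions; only the priority-to-color names, the embed shape, and the 2s, 4s, 8s retry sequence come from the diagram.

```python
# Sketch only: priority-to-color mapping, Discord embed formatting, and backoff delays.
# Hex values are illustrative picks for "red", "orange", "blue", and "gray".
PRIORITY_COLORS = {
    "urgent": 0xE74C3C,   # red
    "high": 0xE67E22,     # orange
    "default": 0x3498DB,  # blue
    "low": 0x95A5A6,      # gray
}


def to_discord_payload(title: str, message: str, priority: str = "default") -> dict:
    """Build a webhook body of the form {embeds: [{title, description, color}]}."""
    # Unknown priorities fall back to the default color (an assumption).
    color = PRIORITY_COLORS.get(priority, PRIORITY_COLORS["default"])
    return {"embeds": [{"title": title, "description": message, "color": color}]}


def backoff_delays(attempts: int, base_seconds: float = 2.0) -> list[float]:
    """Return exponential delays: base, 2*base, 4*base, ... for each attempt."""
    return [base_seconds * (2 ** i) for i in range(attempts)]
```

For example, `backoff_delays(3)` gives the 2, 4, 8 second sequence shown in the diagram; a real bridge would probably also cap the delay and the number of attempts, which the diagram does not specify.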
72
diagrams/observability-stack.mmd
Normal file
@@ -0,0 +1,72 @@
|
||||
```plaintext
%% Observability Stack Architecture (ADR-0025)
%% C4 Component diagram showing telemetry flow

flowchart TB
    subgraph apps["📦 Applications"]
        direction LR
        go["Go Apps<br/>(OTEL SDK)"]
        python["Python Apps<br/>(OTEL SDK)"]
        node["Node.js Apps<br/>(OTEL SDK)"]
        java["Java Apps<br/>(OTEL SDK)"]
    end

    subgraph collection["📡 Telemetry Collection"]
        otel["OpenTelemetry<br/>Collector<br/>━━━━━━━━<br/>OTLP gRPC :4317<br/>OTLP HTTP :4318"]
    end

    subgraph storage["💾 Storage Layer"]
        direction LR

        subgraph metrics_store["Metrics"]
            prometheus["📊 Prometheus<br/>14d retention<br/>50GB"]
        end

        subgraph logs_traces["Logs & Traces"]
            clickstack["📋 ClickStack<br/>(ClickHouse)"]
        end
    end

    subgraph visualization["📈 Visualization"]
        grafana["🎨 Grafana<br/>Dashboards<br/>& Exploration"]
    end

    subgraph alerting["🔔 Alerting Pipeline"]
        alertmanager["⚠️ Alertmanager"]
        ntfy["📱 ntfy<br/>(Push)"]
        discord["💬 Discord"]
    end

    %% App to collector
    go -->|"OTLP"| otel
    python -->|"OTLP"| otel
    node -->|"OTLP"| otel
    java -->|"OTLP"| otel

    %% Collector to storage
    otel -->|"Metrics"| prometheus
    otel -->|"Logs"| clickstack
    otel -->|"Traces"| clickstack

    %% Storage to visualization
    prometheus --> grafana
    clickstack --> grafana

    %% Alerting flow
    prometheus -->|"PrometheusRules"| alertmanager
    alertmanager --> ntfy
    ntfy --> discord

    classDef app fill:#3498db,color:white
    classDef otel fill:#e74c3c,color:white
    classDef storage fill:#27ae60,color:white
    classDef viz fill:#9b59b6,color:white
    classDef alert fill:#f39c12,color:black

    class go,python,node,java app
    class otel otel
    class prometheus,clickstack storage
    class grafana viz
    class alertmanager,ntfy,discord alert
```
66
diagrams/ray-repository-structure.mmd
Normal file
@@ -0,0 +1,66 @@
```plaintext
%% Ray Repository Structure (ADR-0024)
%% Flowchart showing build and dynamic loading flow

flowchart TB
    subgraph repos["📁 Repositories"]
        direction LR
        kuberay["kuberay-images<br/>🐳 Docker images<br/>(infrequent updates)"]
        rayserve["ray-serve<br/>📦 PyPI package<br/>(frequent updates)"]
    end

    subgraph ci["🔧 CI/CD Pipelines"]
        direction LR
        build_images["Build Docker<br/>nvidia, rdna2,<br/>strixhalo, intel"]
        build_pypi["Build wheel<br/>uv build"]
    end

    subgraph registries["📦 Registries"]
        direction LR
        container_reg["🐳 Container Registry<br/>registry.lab.daviestechlabs.io"]
        pypi_reg["📦 PyPI Registry<br/>git.daviestechlabs.io/pypi"]
    end

    subgraph ray["⚡ Ray Cluster"]
        direction TB
        head["🧠 Head Node"]
        workers["🖥️ Worker Nodes<br/>(GPU-specific)"]

        subgraph runtime["🔄 Runtime Loading"]
            pull_image["docker pull<br/>ray-worker-*"]
            pip_install["pip install ray-serve<br/>runtime_env"]
        end

        serve_apps["Ray Serve Apps<br/>/llm, /whisper, etc."]
    end

    subgraph k8s["☸️ Kubernetes"]
        manifests["RayService CR<br/>(homelab-k8s2)"]
    end

    %% Build flows
    kuberay --> build_images
    rayserve --> build_pypi
    build_images --> container_reg
    build_pypi --> pypi_reg

    %% Deployment flow
    manifests --> ray
    container_reg --> pull_image
    pull_image --> workers
    pypi_reg --> pip_install
    pip_install --> serve_apps

    classDef repo fill:#3498db,color:white
    classDef ci fill:#f39c12,color:black
    classDef registry fill:#9b59b6,color:white
    classDef ray fill:#27ae60,color:white
    classDef k8s fill:#e74c3c,color:white

    class kuberay,rayserve repo
    class build_images,build_pypi ci
    class container_reg,pypi_reg registry
    class head,workers,pull_image,pip_install,serve_apps ray
    class manifests k8s
```
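
The "pip install ray-serve / runtime_env" step above relies on Ray's `runtime_env` mechanism, which accepts a `pip` list of requirement strings. A minimal sketch of building such a dictionary follows. The helper name `build_runtime_env` and the version pin are hypothetical, the package name is simply the one shown in the diagram, and pointing pip at the private PyPI registry is not covered here.

```python
from typing import Optional


def build_runtime_env(package: str, version: Optional[str] = None) -> dict:
    """Build a Ray runtime_env dict that installs one package via pip.

    Hypothetical helper: the result is a plain dict, so it can be
    inspected or tested without importing Ray.
    """
    # Pin the version when one is given so every worker installs the same build.
    requirement = f"{package}=={version}" if version else package
    return {"pip": [requirement]}


if __name__ == "__main__":
    # "ray-serve" is the package name from the diagram; the pin is made up.
    print(build_runtime_env("ray-serve", "0.1.0"))
```

Because the package is installed at runtime rather than baked into the image, the worker images only need rebuilding when the base dependencies change, which matches the "infrequent updates" versus "frequent updates" split in the diagram.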
86
diagrams/renovate-workflow.mmd
Normal file
@@ -0,0 +1,86 @@
%% Renovate Dependency Update Workflow
%% Related: ADR-0036

flowchart TB
    subgraph Schedule["Schedule"]
        Cron["CronJob<br/>Every 8 hours"]
    end

    subgraph Renovate["Renovate (ci-cd namespace)"]
        Job["Renovate Job"]

        subgraph Scan["Repository Scan"]
            Discover["Autodiscover<br/>Gitea Repos"]
            Parse["Parse Dependencies<br/>40+ managers"]
            Compare["Compare Versions<br/>Check registries"]
        end
    end

    subgraph Registries["Version Sources"]
        DockerHub["Docker Hub"]
        GHCR["GHCR"]
        PyPI["PyPI"]
        GoProxy["Go Proxy"]
        Helm["Helm Repos"]
    end

    subgraph Gitea["Gitea Repositories"]
        subgraph Repos["Scanned Repos"]
            K8s["homelab-k8s2"]
            Handler["chat-handler"]
            KubeRay["kuberay-images"]
            More["...20+ repos"]
        end

        subgraph PRs["Generated PRs"]
            Grouped["Grouped PR<br/>all-non-major"]
            Security["Security PR<br/>CVE fixes"]
            Major["Major PR<br/>breaking changes"]
        end

        Dashboard["Dependency Dashboard<br/>Issue #1"]
    end

    subgraph Merge["Merge Strategy"]
        AutoMerge["Auto-merge<br/>patch + minor"]
        Review["Manual Review<br/>major updates"]
    end

    Cron --> Job
    Job --> Discover
    Discover --> Parse
    Parse --> Compare

    Compare --> DockerHub
    Compare --> GHCR
    Compare --> PyPI
    Compare --> GoProxy
    Compare --> Helm

    Discover --> K8s
    Discover --> Handler
    Discover --> KubeRay
    Discover --> More

    Compare --> Grouped
    Compare --> Security
    Compare --> Major
    Job --> Dashboard

    Grouped --> AutoMerge
    Security --> AutoMerge
    Major --> Review

    classDef schedule fill:#4a5568,stroke:#718096,color:#fff
    classDef renovate fill:#667eea,stroke:#5a67d8,color:#fff
    classDef registry fill:#ed8936,stroke:#dd6b20,color:#fff
    classDef repo fill:#38a169,stroke:#2f855a,color:#fff
    classDef pr fill:#9f7aea,stroke:#805ad5,color:#fff
    classDef merge fill:#e53e3e,stroke:#c53030,color:#fff

    class Cron schedule
    class Job,Discover,Parse,Compare renovate
    class DockerHub,GHCR,PyPI,GoProxy,Helm registry
    class K8s,Handler,KubeRay,More repo
    class Grouped,Security,Major,Dashboard pr
    class AutoMerge,Review merge
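
The merge strategy in the workflow above (patch and minor updates auto-merge, major updates wait for review, security PRs auto-merge) can be sketched as a small decision function. This is not Renovate's implementation: the function names are hypothetical, and the sketch assumes plain three-part numeric versions such as `1.2.3`.

```python
def update_type(current: str, new: str) -> str:
    """Classify a version bump as 'major', 'minor' or 'patch'.

    Assumes both versions are three dot-separated integers.
    """
    cur = [int(part) for part in current.split(".")]
    nxt = [int(part) for part in new.split(".")]
    if nxt[0] != cur[0]:
        return "major"
    if nxt[1] != cur[1]:
        return "minor"
    return "patch"


def merge_action(current: str, new: str, is_security: bool = False) -> str:
    """Mirror the diagram: security and non-major PRs auto-merge."""
    if is_security:
        return "auto-merge"
    if update_type(current, new) == "major":
        return "manual-review"
    return "auto-merge"


if __name__ == "__main__":
    print(merge_action("1.2.3", "1.3.0"))
    print(merge_action("1.2.3", "2.0.0"))
```

In practice this policy lives in Renovate's configuration rather than in code; the sketch only makes the branching in the diagram explicit.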
51
diagrams/secrets-management.mmd
Normal file
@@ -0,0 +1,51 @@
```plaintext
%% Secrets Management Strategy (ADR-0017)
%% Flowchart showing dual secret paths: SOPS bootstrap vs Vault runtime

flowchart TB
    subgraph bootstrap["🚀 Bootstrap Secrets (Git-encrypted)"]
        direction TB
        sops_files["*.sops.yaml<br/>📄 Encrypted in Git"]
        age_key["🔑 Age Key<br/>(backed up externally)"]
        sops_dec["SOPS Decryption"]
        flux_dec["Flux Controller"]
        bs_secrets["🔐 Bootstrap Secrets<br/>• Talos machine secrets<br/>• GitHub deploy key<br/>• Initial Vault unseal"]
    end

    subgraph runtime["⚙️ Runtime Secrets (Vault-managed)"]
        direction TB
        vault["🏦 HashiCorp Vault<br/>HA (3 replicas) + Raft"]
        eso["External Secrets<br/>Operator"]
        app_secrets["🔑 Application Secrets<br/>• Database credentials<br/>• API keys<br/>• OAuth secrets"]
    end

    subgraph apps["📦 Applications"]
        direction TB
        pods["Workload Pods"]
    end

    %% Bootstrap flow
    sops_files -->|"Commit to Git"| flux_dec
    age_key -->|"Decrypts"| sops_dec
    flux_dec --> sops_dec
    sops_dec -->|"Creates"| bs_secrets

    %% Runtime flow
    vault -->|"ExternalSecret CR"| eso
    eso -->|"Syncs to"| app_secrets

    %% Consumption
    bs_secrets -->|"Mounted"| pods
    app_secrets -->|"Mounted"| pods

    classDef bootstrap fill:#3498db,color:white
    classDef vault fill:#27ae60,color:white
    classDef secrets fill:#e74c3c,color:white
    classDef app fill:#9b59b6,color:white

    class sops_files,age_key,sops_dec,flux_dec bootstrap
    class vault,eso vault
    class bs_secrets,app_secrets secrets
    class pods app
```
81
diagrams/security-policy-enforcement.mmd
Normal file
@@ -0,0 +1,81 @@
```plaintext
%% Security Policy Enforcement (ADR-0018)
%% Flowchart showing admission control and vulnerability scanning

flowchart TB
    subgraph deploy["🚀 Deployment Sources"]
        kubectl["kubectl"]
        flux["Flux CD"]
    end

    subgraph admission["🛡️ Admission Control"]
        api["Kubernetes<br/>API Server"]
        gatekeeper["Gatekeeper (OPA)<br/>⚖️ Policy Validation"]
    end

    subgraph policies["📋 Policies"]
        direction TB
        p1["No privileged containers"]
        p2["Required labels"]
        p3["Resource limits"]
        p4["Image registry whitelist"]
    end

    subgraph enforcement["🎯 Enforcement Modes"]
        warn["⚠️ warn<br/>(log only)"]
        dryrun["📊 dryrun<br/>(audit)"]
        deny["🚫 deny<br/>(block)"]
    end

    subgraph workloads["☸️ Running Workloads"]
        pods["Pods<br/>Deployments<br/>StatefulSets"]
    end

    subgraph scanning["🔍 Continuous Scanning"]
        trivy["Trivy Operator"]
        reports["VulnerabilityReports<br/>(CRDs)"]
    end

    subgraph observability["📈 Observability"]
        prometheus["Prometheus<br/>📊 Metrics"]
        grafana["Grafana<br/>📉 Dashboards"]
        alertmanager["Alertmanager<br/>🔔 Alerts"]
        ntfy["ntfy<br/>📱 Notifications"]
    end

    %% Admission flow
    kubectl --> api
    flux --> api
    api -->|"Intercepts"| gatekeeper
    gatekeeper -->|"Evaluates"| policies
    policies --> enforcement
    warn -->|"Allows"| workloads
    dryrun -->|"Allows"| workloads
    deny -->|"Blocks"| api
    enforcement -->|"Violations"| prometheus

    %% Scanning flow
    workloads -->|"Scans images"| trivy
    trivy -->|"Creates"| reports
    reports -->|"Exports"| prometheus

    %% Observability flow
    prometheus --> grafana
    prometheus --> alertmanager
    alertmanager --> ntfy

    classDef source fill:#f39c12,color:black
    classDef admission fill:#3498db,color:white
    classDef policy fill:#9b59b6,color:white
    classDef workload fill:#27ae60,color:white
    classDef scan fill:#e74c3c,color:white
    classDef observe fill:#1abc9c,color:white

    class kubectl,flux source
    class api,gatekeeper admission
    class p1,p2,p3,p4,warn,dryrun,deny policy
    class pods workload
    class trivy,reports scan
    class prometheus,grafana,alertmanager,ntfy observe
```
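
The three enforcement modes in the diagram differ only in what happens once a policy is violated: `warn` and `dryrun` still admit the workload, `deny` blocks it, and all three surface the violation for metrics. A minimal sketch of that decision follows. It models the diagram and is not Gatekeeper's code; the `Decision` type and `decide` function are hypothetical names.

```python
from dataclasses import dataclass

# The three modes shown in the diagram.
ENFORCEMENT_ACTIONS = {"warn", "dryrun", "deny"}


@dataclass(frozen=True)
class Decision:
    admitted: bool  # does the request reach the cluster?
    reported: bool  # is a violation recorded for metrics and alerts?


def decide(violated: bool, enforcement_action: str) -> Decision:
    """Return the admission outcome for one policy evaluation."""
    if enforcement_action not in ENFORCEMENT_ACTIONS:
        raise ValueError(f"unknown enforcement action: {enforcement_action}")
    if not violated:
        # Compliant requests are admitted and nothing is reported.
        return Decision(admitted=True, reported=False)
    # Only 'deny' blocks; every mode reports the violation.
    return Decision(admitted=enforcement_action != "deny", reported=True)


if __name__ == "__main__":
    for action in ("warn", "dryrun", "deny"):
        print(action, decide(True, action))
```

Rolling a new policy out as `dryrun` or `warn` first and switching to `deny` once the violation count reaches zero is a common way to use these modes without breaking existing workloads.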
67
diagrams/storage-strategy.mmd
Normal file
@@ -0,0 +1,67 @@
```plaintext
%% Tiered Storage Strategy (ADR-0026)
%% C4 Component diagram showing Longhorn + NFS dual-tier

flowchart TB
    subgraph tier1["🚀 TIER 1: LONGHORN (Fast Distributed Block)"]
        direction TB

        subgraph nodes["Cluster Nodes"]
            direction LR
            khelben["🖥️ khelben<br/>/var/mnt/longhorn<br/>NVMe"]
            mystra["🖥️ mystra<br/>/var/mnt/longhorn<br/>SSD"]
            selune["🖥️ selune<br/>/var/mnt/longhorn<br/>SSD"]
        end

        longhorn_mgr["⚙️ Longhorn Manager<br/>(Schedules 2-3 replicas)"]

        subgraph longhorn_pvcs["Performance Workloads"]
            direction LR
            pg["🐘 PostgreSQL"]
            vault["🔐 Vault"]
            prom["📊 Prometheus"]
            click["📋 ClickHouse"]
        end
    end

    subgraph tier2["💾 TIER 2: NFS-SLOW (High-Capacity Bulk)"]
        direction TB

        nas["🗄️ candlekeep.lab.daviestechlabs.io<br/>External NAS<br/>/kubernetes"]

        nfs_csi["📂 NFS CSI Driver"]

        subgraph nfs_pvcs["Bulk Storage Workloads"]
            direction LR
            jellyfin["🎬 Jellyfin<br/>(1TB+ media)"]
            nextcloud["☁️ Nextcloud"]
            immich["📷 Immich"]
            kavita["📚 Kavita"]
            mlflow["📈 MLflow<br/>Artifacts"]
            ray_models["🤖 Ray<br/>Model Weights"]
        end
    end

    %% Tier 1 connections
    nodes --> longhorn_mgr
    longhorn_mgr --> longhorn_pvcs

    %% Tier 2 connections
    nas --> nfs_csi
    nfs_csi --> nfs_pvcs

    classDef tier1_node fill:#3498db,color:white
    classDef tier1_mgr fill:#2980b9,color:white
    classDef tier1_pvc fill:#1abc9c,color:white
    classDef tier2_nas fill:#e74c3c,color:white
    classDef tier2_csi fill:#c0392b,color:white
    classDef tier2_pvc fill:#f39c12,color:black

    class khelben,mystra,selune tier1_node
    class longhorn_mgr tier1_mgr
    class pg,vault,prom,click tier1_pvc
    class nas tier2_nas
    class nfs_csi tier2_csi
    class jellyfin,nextcloud,immich,kavita,mlflow,ray_models tier2_pvc
```
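
The tier assignment in the diagram amounts to a lookup from workload to storage tier. The sketch below encodes exactly the workloads shown. The tier labels `longhorn` and `nfs-slow` come from the subgraph titles; whether they match the cluster's actual StorageClass names is an assumption, and `storage_tier` is a hypothetical helper.

```python
# Workload-to-tier mapping as drawn in the diagram.
# Tier labels are taken from the subgraph titles and may differ from
# the real StorageClass names (assumption).
TIER_BY_WORKLOAD = {
    # Tier 1: fast distributed block storage
    "postgresql": "longhorn",
    "vault": "longhorn",
    "prometheus": "longhorn",
    "clickhouse": "longhorn",
    # Tier 2: high-capacity bulk storage
    "jellyfin": "nfs-slow",
    "nextcloud": "nfs-slow",
    "immich": "nfs-slow",
    "kavita": "nfs-slow",
    "mlflow": "nfs-slow",
    "ray-models": "nfs-slow",
}


def storage_tier(workload: str) -> str:
    """Return the storage tier for a workload named in the diagram."""
    key = workload.strip().lower()
    if key not in TIER_BY_WORKLOAD:
        # Refuse to guess: new workloads need an explicit decision.
        raise KeyError(f"no storage tier defined for {workload!r}")
    return TIER_BY_WORKLOAD[key]


if __name__ == "__main__":
    print(storage_tier("PostgreSQL"))
    print(storage_tier("jellyfin"))
```

Raising on unknown workloads instead of defaulting to a tier keeps latency-sensitive databases from landing on NFS by accident.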
93
diagrams/user-registration-workflow.mmd
Normal file
@@ -0,0 +1,93 @@
```plaintext
%% User Registration and Approval Workflow (ADR-0029)
%% Flowchart showing registration, approval, and access control

flowchart TB
    subgraph registration["📝 Registration Flow"]
        direction TB
        request["👤 User Requests<br/>Account"]
        form["📋 Enrollment<br/>Form"]
        created["✅ Account<br/>Created"]
        pending["⏳ pending-approval<br/>Group"]
    end

    subgraph approval["✋ Admin Approval"]
        direction TB
        notify["📧 Admin<br/>Notification"]
        review["👁️ Admin<br/>Reviews"]
        decision{{"Decision"}}
    end

    subgraph groups["👥 Group Assignment"]
        direction LR
        reject["❌ Rejected"]
        guests["🎫 homelab-guests<br/>Limited access"]
        users["👥 homelab-users<br/>Full access"]
        admins["👑 homelab-admins<br/>Admin access"]
    end

    subgraph access["🔓 Application Access"]
        direction TB

        subgraph admin_apps["Admin Apps"]
            authentik_admin["Authentik Admin"]
            gitea["Gitea"]
            flux_ui["Flux UI"]
        end

        subgraph user_apps["User Apps"]
            affine["Affine"]
            immich["Immich"]
            nextcloud["Nextcloud"]
            vaultwarden["Vaultwarden"]
        end

        subgraph guest_apps["Guest Apps"]
            kavita["Kavita"]
        end

        subgraph no_access["No Access"]
            profile["Authentik Profile<br/>(only)"]
        end
    end

    %% Registration flow
    request --> form
    form --> created
    created --> pending
    pending --> notify

    %% Approval flow
    notify --> review
    review --> decision
    decision -->|"Reject"| reject
    decision -->|"Basic"| guests
    decision -->|"Full"| users
    decision -->|"Admin"| admins

    %% Access mapping
    reject --> profile
    guests --> guest_apps
    users --> user_apps
    users --> guest_apps
    admins --> admin_apps
    admins --> user_apps
    admins --> guest_apps

    classDef registration fill:#3498db,color:white
    classDef approval fill:#f39c12,color:black
    classDef group fill:#9b59b6,color:white
    classDef admin fill:#e74c3c,color:white
    classDef user fill:#27ae60,color:white
    classDef guest fill:#1abc9c,color:white
    classDef none fill:#95a5a6,color:white

    class request,form,created,pending registration
    class notify,review approval
    class reject,guests,users,admins group
    class authentik_admin,gitea,flux_ui admin
    class affine,immich,nextcloud,vaultwarden user
    class kavita guest
    class profile none
```
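
The access mapping at the bottom of the diagram is cumulative: admins inherit the user and guest apps, and users inherit the guest apps. The sketch below encodes that mapping with the group and app names from the diagram. The function name is hypothetical, and treating any group without a mapping (for example `pending-approval`) as profile-only is an assumption; the diagram shows that outcome only for rejected accounts.

```python
# App tiers and group membership as drawn in the diagram.
APPS_BY_TIER = {
    "admin": ["Authentik Admin", "Gitea", "Flux UI"],
    "user": ["Affine", "Immich", "Nextcloud", "Vaultwarden"],
    "guest": ["Kavita"],
}

# Higher groups include every lower tier.
TIERS_BY_GROUP = {
    "homelab-admins": ["admin", "user", "guest"],
    "homelab-users": ["user", "guest"],
    "homelab-guests": ["guest"],
}

PROFILE_ONLY = ["Authentik Profile"]


def accessible_apps(group: str) -> list:
    """Return the apps a member of `group` can reach."""
    tiers = TIERS_BY_GROUP.get(group)
    if tiers is None:
        # Assumption: unmapped groups only see their own profile.
        return list(PROFILE_ONLY)
    return [app for tier in tiers for app in APPS_BY_TIER[tier]]


if __name__ == "__main__":
    for name in ("homelab-guests", "homelab-users", "homelab-admins"):
        print(name, accessible_apps(name))
```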
60
diagrams/velero-backup.mmd
Normal file
@@ -0,0 +1,60 @@
%% Velero Backup Architecture
%% Related: ADR-0032

flowchart TB
    subgraph Schedule["Backup Schedule"]
        Nightly["Nightly Backup<br/>2:00 AM"]
        Hourly["Hourly Snapshots<br/>Critical Namespaces"]
    end

    subgraph Velero["Velero (velero namespace)"]
        Server["Velero Server"]
        NodeAgent["Node Agent<br/>(DaemonSet)"]
    end

    subgraph Sources["Backup Sources"]
        PVs["Persistent Volumes<br/>(Longhorn)"]
        Resources["Kubernetes Resources<br/>(Secrets, ConfigMaps)"]
        DBs["Database Dumps<br/>(Pre-backup hooks)"]
    end

    subgraph Targets["Backup Destinations"]
        subgraph Primary["Primary: S3"]
            MinIO["MinIO<br/>On-premises S3"]
        end
        subgraph Secondary["Secondary: NFS"]
            NAS["Synology NAS<br/>Long-term retention"]
        end
    end

    subgraph Restore["Restore Options"]
        Full["Full Cluster Restore"]
        Namespace["Namespace Restore"]
        Selective["Selective Resource Restore"]
    end

    Nightly --> Server
    Hourly --> Server
    Server --> NodeAgent
    NodeAgent --> PVs
    Server --> Resources
    Server --> DBs

    Server --> MinIO
    MinIO -.->|Replicated| NAS

    Server --> Full
    Server --> Namespace
    Server --> Selective

    classDef schedule fill:#4a5568,stroke:#718096,color:#fff
    classDef velero fill:#667eea,stroke:#5a67d8,color:#fff
    classDef source fill:#48bb78,stroke:#38a169,color:#fff
    classDef target fill:#ed8936,stroke:#dd6b20,color:#fff
    classDef restore fill:#9f7aea,stroke:#805ad5,color:#fff

    class Nightly,Hourly schedule
    class Server,NodeAgent velero
    class PVs,Resources,DBs source
    class MinIO,NAS target
    class Full,Namespace,Selective restore
81
diagrams/volcano-scheduling.mmd
Normal file
@@ -0,0 +1,81 @@
%% Volcano Batch Scheduling Architecture
%% Related: ADR-0034

flowchart TB
    subgraph Submissions["Workload Submissions"]
        KFP["Kubeflow Pipelines"]
        Argo["Argo Workflows"]
        Spark["Spark Jobs"]
        Ray["Ray Jobs"]
    end

    subgraph Volcano["Volcano Scheduler"]
        Admission["Admission Controller"]
        Scheduler["Volcano Scheduler"]
        Controller["Job Controller"]

        subgraph Plugins["Scheduling Plugins"]
            Gang["Gang Scheduling"]
            Priority["Priority"]
            DRF["Dominant Resource Fairness"]
            Binpack["Bin Packing"]
        end
    end

    subgraph Queues["Resource Queues"]
        MLQueue["ml-training<br/>weight: 4"]
        InferQueue["inference<br/>weight: 3"]
        BatchQueue["batch-jobs<br/>weight: 2"]
        DefaultQueue["default<br/>weight: 1"]
    end

    subgraph Resources["Cluster Resources"]
        subgraph GPUs["GPU Nodes"]
            Khelben["khelben<br/>Strix Halo 64GB"]
            Elminster["elminster<br/>RTX 2070"]
            Drizzt["drizzt<br/>RDNA2 680M"]
            Danilo["danilo<br/>Intel Arc"]
        end
        subgraph CPU["CPU Nodes"]
            Workers["9 x86_64 Workers"]
            ARM["5 ARM64 Workers"]
        end
    end

    KFP --> Admission
    Argo --> Admission
    Spark --> Admission
    Ray --> Admission

    Admission --> Scheduler
    Scheduler --> Controller

    Scheduler --> Gang
    Scheduler --> Priority
    Scheduler --> DRF
    Scheduler --> Binpack

    Controller --> MLQueue
    Controller --> InferQueue
    Controller --> BatchQueue
    Controller --> DefaultQueue

    MLQueue --> GPUs
    InferQueue --> GPUs
    BatchQueue --> GPUs
    BatchQueue --> CPU
    DefaultQueue --> CPU

    classDef submit fill:#4a5568,stroke:#718096,color:#fff
    classDef volcano fill:#667eea,stroke:#5a67d8,color:#fff
    classDef plugin fill:#9f7aea,stroke:#805ad5,color:#fff
    classDef queue fill:#ed8936,stroke:#dd6b20,color:#fff
    classDef gpu fill:#e53e3e,stroke:#c53030,color:#fff
    classDef cpu fill:#38a169,stroke:#2f855a,color:#fff

    class KFP,Argo,Spark,Ray submit
    class Admission,Scheduler,Controller volcano
    class Gang,Priority,DRF,Binpack plugin
    class MLQueue,InferQueue,BatchQueue,DefaultQueue queue
    class Khelben,Elminster,Drizzt,Danilo gpu
    class Workers,ARM cpu
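
The queue weights in the diagram (4, 3, 2, 1) describe relative shares of a contended resource. As a rough illustration, the sketch below splits a capacity in proportion to those weights. It is a simplification under stated assumptions: the real scheduler also accounts for what each queue actually requests and for any per-queue caps, and `fair_shares` is a hypothetical helper, not a Volcano API.

```python
# Queue names and weights as shown in the diagram.
QUEUE_WEIGHTS = {
    "ml-training": 4,
    "inference": 3,
    "batch-jobs": 2,
    "default": 1,
}


def fair_shares(weights: dict, capacity: float) -> dict:
    """Split `capacity` across queues in proportion to their weights.

    Simplified model: assumes every queue is saturated and uncapped.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: capacity * weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Capacity of 100 makes the result read as percentages.
    for queue, share in fair_shares(QUEUE_WEIGHTS, 100).items():
        print(f"{queue}: {share:.0f}%")
```

With these weights, `ml-training` gets four tenths of a fully contended resource and `default` one tenth; when a queue is idle, its unused share is available to the others.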