# ADR-0021: Notification Architecture

## Status

Proposed

## Context

The homelab infrastructure generates notifications from multiple sources:

1. **CI/CD pipelines** (Gitea Actions) - build success/failure
2. **Alertmanager** - Prometheus alerts for critical/warning conditions
3. **Gatus** - Service health monitoring
4. **Flux** - GitOps reconciliation events

Currently, ntfy serves as the primary notification hub, but there are several issues:

- **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
- **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
- **Discord integration desire**: Team wants notifications forwarded to Discord for visibility

## Decision

### 1. ntfy as the Notification Hub

ntfy will serve as the central notification aggregation point. All internal services publish to ntfy topics via the internal Kubernetes service URL:

```
http://ntfy-svc.observability.svc.cluster.local/
```

This keeps ntfy auth-protected externally while allowing internal services to publish freely.

### 2. Standardized Topics

| Topic | Source | Description |
|-------|--------|-------------|
| `gitea-ci` | Gitea Actions | CI/CD build notifications |
| `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
| `gatus` | Gatus | Service health status changes |
| `flux` | Flux | GitOps reconciliation events |

### 3. Alertmanager Integration

Alertmanager is configured to forward alerts to ntfy using the built-in `tpl=alertmanager` template:

```yaml
receivers:
  - name: ntfy-critical
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=urgent&tags=rotating_light"
        sendResolved: true
  - name: ntfy-warning
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=high&tags=warning"
        sendResolved: true
```

Routes direct alerts based on severity:

- `severity=critical` → `ntfy-critical` receiver
- `severity=warning` → `ntfy-warning` receiver

### 4. Discord Integration (Future)

Discord integration will be implemented as a dedicated bridge service that:

1. **Subscribes** to ntfy topics via SSE/WebSocket
2. **Transforms** ntfy message format to Discord embed format
3. **Forwards** to Discord webhook URL (stored in Vault at `kv/data/discord`)

#### Design Options

**Option A: Sidecar Container (Simple)**

- Alpine container with curl/jq
- Subscribes to ntfy JSON stream
- Transforms and POSTs to Discord
- Pros: Simple, no custom code
- Cons: Shell script fragility, limited error handling

**Option B: Dedicated Python Service (Recommended)**

- Small Python service using `httpx` or `aiohttp`
- Proper reconnection logic and error handling
- Configurable topic-to-channel mapping
- Health endpoint for monitoring
- Pros: Robust, testable, maintainable
- Cons: Requires building/publishing container image

**Option C: ntfy Actions (Limited)**

- Configure ntfy server with `upstream-base-url` or actions
- Pros: Built into ntfy
- Cons: ntfy doesn't natively support Discord webhook format

#### Recommended Architecture (Option B)

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ CI/Alertmanager │────▶│       ntfy       │────▶│  ntfy App   │
│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
└─────────────────┘     │       hub)       │     └─────────────┘
                        └────────┬─────────┘
                                 │ SSE subscribe
                                 ▼
                        ┌──────────────────┐     ┌─────────────┐
                        │  ntfy-discord-   │────▶│   Discord   │
                        │      bridge      │     │  (webhook)  │
                        └──────────────────┘     └─────────────┘
```

The bridge service would be a new repo (`ntfy-discord-bridge`) following the same patterns as other Python services in the homelab.

## Consequences

### Positive

- **Single source of truth**: All notifications flow through ntfy
- **Auth protection maintained**: External ntfy access requires Authentik auth
- **Flexible routing**: Can subscribe to specific topics per destination
- **Separation of concerns**: Discord bridge is independent, can be disabled without affecting ntfy

### Negative

- **Additional service**: Discord bridge adds operational overhead
- **Latency**: Two-hop delivery (source → ntfy → Discord) adds minimal latency

### Neutral

- Topic naming must be documented and followed consistently
- Discord webhook URL must be maintained in Vault

## Implementation Checklist

- [x] Standardize CI notifications to `gitea-ci` topic
- [x] Configure Alertmanager → ntfy for critical/warning alerts
- [ ] Create `ntfy-discord-bridge` repository
- [ ] Implement bridge service with proper error handling
- [ ] Add ExternalSecret for Discord webhook from Vault
- [ ] Deploy bridge to observability namespace
- [ ] Document topic-to-Discord-channel mapping

## Related

- ADR-0015: CI Notifications and Semantic Versioning
- ADR-0020: Internal Registry for CI/CD
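
## Appendix: Bridge Transform Sketch

The "Transforms" step of the Discord bridge (Option B) could look roughly like the sketch below. It is not the bridge's actual implementation, which does not exist yet. It assumes the bridge reads ntfy's JSON stream one line at a time and builds a Discord webhook payload with a single embed. The function name `ntfy_to_discord` and the `PRIORITY_COLORS` mapping are hypothetical and chosen only for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical mapping from ntfy priority (1 = min, 5 = max/urgent)
# to a Discord embed colour. The colour values are an assumption.
PRIORITY_COLORS = {
    1: 0x95A5A6,  # grey
    2: 0x95A5A6,  # grey
    3: 0x3498DB,  # blue (default priority)
    4: 0xE67E22,  # orange
    5: 0xE74C3C,  # red
}


def ntfy_to_discord(line: str) -> Optional[dict]:
    """Convert one line of the ntfy JSON stream into a Discord webhook payload.

    Returns None for events that carry no notification, such as the
    "open" and "keepalive" events ntfy sends on a subscription.
    """
    event = json.loads(line)
    if event.get("event") != "message":
        return None

    priority = event.get("priority", 3)
    embed = {
        # Fall back to the topic name when the publisher set no title.
        "title": event.get("title") or event.get("topic", "ntfy"),
        "description": event.get("message", ""),
        "color": PRIORITY_COLORS.get(priority, PRIORITY_COLORS[3]),
        "footer": {"text": "topic: " + event.get("topic", "unknown")},
    }
    if "time" in event:
        # ntfy reports Unix seconds; Discord expects an ISO 8601 timestamp.
        embed["timestamp"] = datetime.fromtimestamp(
            event["time"], tz=timezone.utc
        ).isoformat()
    return {"embeds": [embed]}
```

In a real bridge, the returned dictionary would be sent as the JSON body of a POST to the Discord webhook URL, for example with `httpx`. Keeping the transform a pure function means it can be unit-tested without a running ntfy or Discord endpoint.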