diff --git a/docs/adr/ADR-0021-notification-architecture.md b/docs/adr/ADR-0021-notification-architecture.md
new file mode 100644
index 0000000..1411fa2
--- /dev/null
+++ b/docs/adr/ADR-0021-notification-architecture.md
@@ -0,0 +1,143 @@
+# ADR-0021: Notification Architecture
+
+## Status
+
+Proposed
+
+## Context
+
+The homelab infrastructure generates notifications from multiple sources:
+
+1. **CI/CD pipelines** (Gitea Actions) - build success/failure
+2. **Alertmanager** - Prometheus alerts for critical/warning conditions
+3. **Gatus** - Service health monitoring
+4. **Flux** - GitOps reconciliation events
+
+Currently, ntfy serves as the primary notification hub, but there are several issues:
+
+- **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
+- **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
+- **Discord integration desire**: Team wants notifications forwarded to Discord for visibility
+
+## Decision
+
+### 1. ntfy as the Notification Hub
+
+ntfy will serve as the central notification aggregation point. All internal services publish to ntfy topics via the internal Kubernetes service URL:
+
+```
+http://ntfy-svc.observability.svc.cluster.local/
+```
+
+This keeps ntfy auth-protected externally while allowing internal services to publish freely.
+
+### 2. Standardized Topics
+
+| Topic | Source | Description |
+|-------|--------|-------------|
+| `gitea-ci` | Gitea Actions | CI/CD build notifications |
+| `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
+| `gatus` | Gatus | Service health status changes |
+| `flux` | Flux | GitOps reconciliation events |
+
+### 3. Alertmanager Integration
+
+Alertmanager is configured to forward alerts to ntfy using the built-in `tpl=alertmanager` template:
+
+```yaml
+receivers:
+  - name: ntfy-critical
+    webhookConfigs:
+      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=urgent&tags=rotating_light"
+        sendResolved: true
+  - name: ntfy-warning
+    webhookConfigs:
+      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=high&tags=warning"
+        sendResolved: true
+```
+
+Routes direct alerts based on severity:
+- `severity=critical` → `ntfy-critical` receiver
+- `severity=warning` → `ntfy-warning` receiver
+
+### 4. Discord Integration (Future)
+
+Discord integration will be implemented as a dedicated bridge service that:
+
+1. **Subscribes** to ntfy topics via SSE/WebSocket
+2. **Transforms** ntfy message format to Discord embed format
+3. **Forwards** to Discord webhook URL (stored in Vault at `kv/data/discord`)
+
+#### Design Options
+
+**Option A: Sidecar Container (Simple)**
+- Alpine container with curl/jq
+- Subscribes to ntfy JSON stream
+- Transforms and POSTs to Discord
+- Pros: Simple, no custom code
+- Cons: Shell script fragility, limited error handling
+
+**Option B: Dedicated Python Service (Recommended)**
+- Small Python service using `httpx` or `aiohttp`
+- Proper reconnection logic and error handling
+- Configurable topic-to-channel mapping
+- Health endpoint for monitoring
+- Pros: Robust, testable, maintainable
+- Cons: Requires building/publishing container image
+
+**Option C: ntfy Actions (Limited)**
+- Configure ntfy server with `upstream-base-url` or actions
+- Pros: Built into ntfy
+- Cons: ntfy doesn't natively support Discord webhook format
+
+#### Recommended Architecture (Option B)
+
+```
+┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
+│ CI/Alertmanager │────▶│       ntfy       │────▶│  ntfy App   │
+│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
+└─────────────────┘     │      hub)        │     └─────────────┘
+                        └────────┬─────────┘
+                                 │ SSE subscribe
+                                 ▼
+                        ┌──────────────────┐     ┌─────────────┐
+                        │  ntfy-discord-   │────▶│   Discord   │
+                        │     bridge       │     │  (webhook)  │
+                        └──────────────────┘     └─────────────┘
+```
+
+The bridge service would be a new repo (`ntfy-discord-bridge`) following the same patterns as other Python services in the homelab.
+
+## Consequences
+
+### Positive
+
+- **Single source of truth**: All notifications flow through ntfy
+- **Auth protection maintained**: External ntfy access requires Authentik auth
+- **Flexible routing**: Can subscribe to specific topics per destination
+- **Separation of concerns**: Discord bridge is independent, can be disabled without affecting ntfy
+
+### Negative
+
+- **Additional service**: Discord bridge adds operational overhead
+- **Latency**: Two-hop delivery (source → ntfy → Discord) adds minimal latency
+
+### Neutral
+
+- Topic naming must be documented and followed consistently
+- Discord webhook URL must be maintained in Vault
+
+## Implementation Checklist
+
+- [x] Standardize CI notifications to `gitea-ci` topic
+- [x] Configure Alertmanager → ntfy for critical/warning alerts
+- [ ] Create `ntfy-discord-bridge` repository
+- [ ] Implement bridge service with proper error handling
+- [ ] Add ExternalSecret for Discord webhook from Vault
+- [ ] Deploy bridge to observability namespace
+- [ ] Document topic-to-Discord-channel mapping
+
+## Related
+
+- ADR-0015: CI Notifications and Semantic Versioning
+- ADR-0020: Internal Registry for CI/CD
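
The "Transforms" step of the Option B bridge (section 4) could look roughly like the sketch below. It is not taken from the ADR or from any existing `ntfy-discord-bridge` code: the function name `ntfy_to_discord_payload` and the `PRIORITY_COLORS` mapping are hypothetical, and the sketch assumes the bridge reads ntfy's JSON stream, where each event carries `event`, `topic`, `message`, and optionally `title`, `priority`, and `time`. Subscribing, reconnecting, and the HTTP POST to the webhook are deliberately left out.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Hypothetical colour scheme keyed by ntfy priority (1 = min, 3 = default, 5 = max/urgent).
PRIORITY_COLORS = {1: 0x95A5A6, 2: 0x95A5A6, 3: 0x3498DB, 4: 0xE67E22, 5: 0xE74C3C}


def ntfy_to_discord_payload(msg: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map one ntfy stream event to a Discord webhook payload.

    Returns None for non-message events such as "open" and "keepalive",
    which ntfy also emits on a subscription stream.
    """
    if msg.get("event") != "message":
        return None

    topic = msg.get("topic", "unknown")
    priority = msg.get("priority", 3)  # the field may be absent for default priority

    embed: Dict[str, Any] = {
        # Discord caps embed titles at 256 characters and descriptions at 4096.
        "title": (msg.get("title") or topic)[:256],
        "description": msg.get("message", "")[:4096],
        "color": PRIORITY_COLORS.get(priority, PRIORITY_COLORS[3]),
        "footer": {"text": f"ntfy/{topic}"},
    }
    if "time" in msg:
        # ntfy reports Unix seconds; Discord expects an ISO 8601 timestamp.
        embed["timestamp"] = datetime.fromtimestamp(
            msg["time"], tz=timezone.utc
        ).isoformat()
    return {"embeds": [embed]}
```

Keeping the mapping a pure function (dict in, dict out) lets it be unit-tested without a running ntfy server or a real Discord webhook, which fits the "testable" argument made for Option B.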