From e85deaa642c99ec54a69152cdd4804a2d5adfac9 Mon Sep 17 00:00:00 2001 From: "Billy D." Date: Mon, 2 Feb 2026 11:58:52 -0500 Subject: [PATCH] docs(adr): finalize ADR-0021 and add ADR-0022 ADR-0021 (Accepted): - ntfy as central notification hub - Alertmanager integration for critical/warning alerts - Service readiness notifications via Flux notification-controller - Standardized topic naming ADR-0022 (Proposed): - ntfy-discord-bridge Python service design - SSE subscription with reconnection logic - Message transformation to Discord embeds - Priority/tag to color/emoji mapping - Kubernetes deployment with ExternalSecret for webhook --- .../adr/ADR-0021-notification-architecture.md | 98 ++++--- docs/adr/ADR-0022-ntfy-discord-bridge.md | 241 ++++++++++++++++++ 2 files changed, 284 insertions(+), 55 deletions(-) create mode 100644 docs/adr/ADR-0022-ntfy-discord-bridge.md diff --git a/docs/adr/ADR-0021-notification-architecture.md b/docs/adr/ADR-0021-notification-architecture.md index 1411fa2..6c10783 100644 --- a/docs/adr/ADR-0021-notification-architecture.md +++ b/docs/adr/ADR-0021-notification-architecture.md @@ -2,7 +2,7 @@ ## Status -Proposed +Accepted ## Context @@ -12,12 +12,13 @@ The homelab infrastructure generates notifications from multiple sources: 2. **Alertmanager** - Prometheus alerts for critical/warning conditions 3. **Gatus** - Service health monitoring 4. **Flux** - GitOps reconciliation events +5. 
**Service readiness** - Notifications when deployments complete successfully Currently, ntfy serves as the primary notification hub, but there are several issues: - **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci` - **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism -- **Discord integration desire**: Team wants notifications forwarded to Discord for visibility +- **No service readiness notifications**: No visibility when services come online after deployment ## Decision @@ -39,6 +40,7 @@ This keeps ntfy auth-protected externally while allowing internal services to pu | `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts | | `gatus` | Gatus | Service health status changes | | `flux` | Flux | GitOps reconciliation events | +| `deployments` | Flux/Argo | Service deployment completions | ### 3. Alertmanager Integration @@ -60,53 +62,43 @@ Routes direct alerts based on severity: - `severity=critical` → `ntfy-critical` receiver - `severity=warning` → `ntfy-warning` receiver -### 4. Discord Integration (Future) +### 4. Service Readiness Notifications -Discord integration will be implemented as a dedicated bridge service that: +To provide visibility when services are fully operational after deployment: -1. **Subscribes** to ntfy topics via SSE/WebSocket -2. **Transforms** ntfy message format to Discord embed format -3. 
**Forwards** to Discord webhook URL (stored in Vault at `kv/data/discord`) +**Option A: Flux Notification Controller** +Configure Flux's notification-controller to send alerts when Kustomizations/HelmReleases succeed: -#### Design Options - -**Option A: Sidecar Container (Simple)** -- Alpine container with curl/jq -- Subscribes to ntfy JSON stream -- Transforms and POSTs to Discord -- Pros: Simple, no custom code -- Cons: Shell script fragility, limited error handling - -**Option B: Dedicated Python Service (Recommended)** -- Small Python service using `httpx` or `aiohttp` -- Proper reconnection logic and error handling -- Configurable topic-to-channel mapping -- Health endpoint for monitoring -- Pros: Robust, testable, maintainable -- Cons: Requires building/publishing container image - -**Option C: ntfy Actions (Limited)** -- Configure ntfy server with `upstream-base-url` or actions -- Pros: Built into ntfy -- Cons: ntfy doesn't natively support Discord webhook format - -#### Recommended Architecture (Option B) - -``` -┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐ -│ CI/Alertmanager │────▶│ ntfy │────▶│ ntfy App │ -│ Gatus/Flux │ │ (notification │ │ (mobile) │ -└─────────────────┘ │ hub) │ └─────────────┘ - └────────┬─────────┘ - │ SSE subscribe - ▼ - ┌──────────────────┐ ┌─────────────┐ - │ ntfy-discord- │────▶│ Discord │ - │ bridge │ │ (webhook) │ - └──────────────────┘ └─────────────┘ +```yaml +apiVersion: notification.toolkit.fluxcd.io/v1beta3 +kind: Provider +metadata: + name: ntfy-deployments +spec: + type: generic-hmac # or generic + address: http://ntfy-svc.observability.svc.cluster.local/deployments +--- +apiVersion: notification.toolkit.fluxcd.io/v1beta3 +kind: Alert +metadata: + name: deployment-success +spec: + providerRef: + name: ntfy-deployments + eventSeverity: info + eventSources: + - kind: Kustomization + name: '*' + - kind: HelmRelease + name: '*' + inclusionList: + - ".*succeeded.*" ``` -The bridge service would be a new repo 
(`ntfy-discord-bridge`) following the same patterns as other Python services in the homelab. +**Option B: Argo Workflows Post-Deploy Hook** +For Argo-managed deployments, add a notification step at workflow completion. + +**Recommendation**: Use Flux Notification Controller (Option A) as it's already part of the GitOps stack and provides native integration. ## Consequences @@ -114,30 +106,26 @@ The bridge service would be a new repo (`ntfy-discord-bridge`) following the sam - **Single source of truth**: All notifications flow through ntfy - **Auth protection maintained**: External ntfy access requires Authentik auth -- **Flexible routing**: Can subscribe to specific topics per destination -- **Separation of concerns**: Discord bridge is independent, can be disabled without affecting ntfy +- **Deployment visibility**: Know when services are ready without watching logs +- **Consistent topic naming**: All sources follow documented conventions ### Negative -- **Additional service**: Discord bridge adds operational overhead -- **Latency**: Two-hop delivery (source → ntfy → Discord) adds minimal latency +- **Configuration overhead**: Each notification source requires explicit configuration ### Neutral - Topic naming must be documented and followed consistently -- Discord webhook URL must be maintained in Vault +- Future Discord integration addressed in ADR-0022 ## Implementation Checklist - [x] Standardize CI notifications to `gitea-ci` topic - [x] Configure Alertmanager → ntfy for critical/warning alerts -- [ ] Create `ntfy-discord-bridge` repository -- [ ] Implement bridge service with proper error handling -- [ ] Add ExternalSecret for Discord webhook from Vault -- [ ] Deploy bridge to observability namespace -- [ ] Document topic-to-Discord-channel mapping +- [ ] Configure Flux notification-controller for deployment notifications +- [ ] Add `deployments` topic subscription to ntfy app ## Related - ADR-0015: CI Notifications and Semantic Versioning -- ADR-0020: 
Internal Registry for CI/CD +- ADR-0022: ntfy-Discord Bridge Service diff --git a/docs/adr/ADR-0022-ntfy-discord-bridge.md b/docs/adr/ADR-0022-ntfy-discord-bridge.md new file mode 100644 index 0000000..4f502ae --- /dev/null +++ b/docs/adr/ADR-0022-ntfy-discord-bridge.md @@ -0,0 +1,241 @@ +# ADR-0022: ntfy-Discord Bridge Service + +## Status + +Proposed + +## Context + +Per ADR-0021, ntfy serves as the central notification hub for the homelab. However, Discord is used for team collaboration and visibility, requiring notifications to be forwarded there as well. + +ntfy does not natively support Discord webhook format. Discord expects a specific JSON structure with embeds, while ntfy uses its own message format. A bridge service is needed to: + +1. Subscribe to ntfy topics +2. Transform messages to Discord embed format +3. Forward to Discord webhooks + +## Decision + +### Architecture + +A dedicated Python microservice (`ntfy-discord-bridge`) will bridge ntfy to Discord: + +``` +┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐ +│ CI/Alertmanager │────▶│ ntfy │────▶│ ntfy App │ +│ Gatus/Flux │ │ (notification │ │ (mobile) │ +└─────────────────┘ │ hub) │ └─────────────┘ + └────────┬─────────┘ + │ SSE/JSON stream + ▼ + ┌──────────────────┐ ┌─────────────┐ + │ ntfy-discord- │────▶│ Discord │ + │ bridge │ │ Webhook │ + └──────────────────┘ └─────────────┘ +``` + +### Service Design + +**Repository**: `ntfy-discord-bridge` + +**Technology Stack**: +- Python 3.12+ +- `httpx` for async HTTP (SSE subscription + Discord POST) +- `pydantic` for configuration validation +- `structlog` for structured logging +- Poetry/uv for dependency management + +**Core Features**: + +1. **SSE Subscription**: Connect to ntfy's JSON stream endpoint for real-time messages +2. **Automatic Reconnection**: Exponential backoff on connection failures +3. **Message Transformation**: Convert ntfy format to Discord embed format +4. 
**Priority Mapping**: Map ntfy priorities to Discord embed colors +5. **Topic Routing**: Configure which topics go to which Discord channels/webhooks +6. **Health Endpoint**: `/health` for Kubernetes probes + +### Configuration + +Environment variables or ConfigMap: + +```yaml +NTFY_URL: "http://ntfy-svc.observability.svc.cluster.local" +DISCORD_WEBHOOK_URL: "${DISCORD_WEBHOOK_URL}" # From Vault via ExternalSecret + +# Topic routing (optional - defaults to single webhook) +TOPIC_WEBHOOKS: | + gitea-ci: ${DISCORD_CI_WEBHOOK} + alertmanager-alerts: ${DISCORD_ALERTS_WEBHOOK} + deployments: ${DISCORD_DEPLOYMENTS_WEBHOOK} + +# Topics to subscribe to (comma-separated) +NTFY_TOPICS: "gitea-ci,alertmanager-alerts,deployments,gatus" +``` + +### Message Transformation + +ntfy message: +```json +{ + "id": "abc123", + "topic": "gitea-ci", + "title": "Build succeeded", + "message": "ray-serve-apps published to PyPI", + "priority": 3, + "tags": ["package", "white_check_mark"], + "time": 1770050091 +} +``` + +Discord embed: +```json +{ + "embeds": [{ + "title": "✅ Build succeeded", + "description": "ray-serve-apps published to PyPI", + "color": 3066993, + "fields": [ + {"name": "Topic", "value": "gitea-ci", "inline": true}, + {"name": "Tags", "value": "package", "inline": true} + ], + "timestamp": "2026-02-02T16:34:51Z", + "footer": {"text": "ntfy"} + }] +} +``` + +**Priority → Color Mapping**: +| Priority | Name | Discord Color | +|----------|------|---------------| +| 5 | Max/Urgent | 🔴 Red (15158332) | +| 4 | High | 🟠 Orange (15105570) | +| 3 | Default | 🟢 Green (3066993) | +| 2 | Low | ⚪ Gray (9807270) | +| 1 | Min | ⚪ Light Gray (12370112) | + +**Tag → Emoji Mapping**: +Common ntfy tags are converted to Discord-friendly emojis in the title: +- `white_check_mark` / `heavy_check_mark` → ✅ +- `x` / `skull` → ❌ +- `warning` → ⚠️ +- `rotating_light` → 🚨 +- `rocket` → 🚀 +- `package` → 📦 + +### Kubernetes Deployment + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + 
name: ntfy-discord-bridge + namespace: observability +spec: + replicas: 1 + selector: + matchLabels: + app: ntfy-discord-bridge + template: + spec: + containers: + - name: bridge + image: registry.daviestechlabs.io/ntfy-discord-bridge:latest + env: + - name: NTFY_URL + value: "http://ntfy-svc.observability.svc.cluster.local" + - name: NTFY_TOPICS + value: "gitea-ci,alertmanager-alerts,deployments" + - name: DISCORD_WEBHOOK_URL + valueFrom: + secretKeyRef: + name: discord-webhook-secret + key: webhook-url + ports: + - containerPort: 8080 + name: health + livenessProbe: + httpGet: + path: /health + port: health + initialDelaySeconds: 5 + periodSeconds: 30 + resources: + limits: + cpu: 100m + memory: 128Mi + requests: + cpu: 10m + memory: 64Mi +``` + +### Secret Management + +Discord webhook URL stored in Vault at `kv/data/discord`: + +```yaml +apiVersion: external-secrets.io/v1beta1 +kind: ExternalSecret +metadata: + name: discord-webhook-secret + namespace: observability +spec: + refreshInterval: 1h + secretStoreRef: + name: vault + kind: ClusterSecretStore + target: + name: discord-webhook-secret + data: + - secretKey: webhook-url + remoteRef: + key: kv/data/discord + property: webhook_url +``` + +### Error Handling + +1. **Connection Loss**: Exponential backoff (1s, 2s, 4s, ... max 60s) +2. **Discord Rate Limits**: Respect `Retry-After` header, queue messages +3. **Invalid Messages**: Log and skip, don't crash +4. **Webhook Errors**: Log error, continue processing other messages + +## Consequences + +### Positive + +- **Robust**: Proper reconnection and error handling vs shell script +- **Testable**: Python service with unit tests +- **Observable**: Structured logging, health endpoints +- **Flexible**: Easy to add topic routing, filtering, rate limiting +- **Consistent**: Follows same patterns as other Python services (handler-base, etc.) 
+ +### Negative + +- **Operational Overhead**: Another service to maintain +- **Build Pipeline**: Requires CI/CD for container image +- **Latency**: Adds ~100ms to notification delivery + +### Neutral + +- Webhook URL must be maintained in Vault +- Service logs should be monitored for errors + +## Implementation Checklist + +- [ ] Create `ntfy-discord-bridge` repository +- [ ] Implement core bridge logic with httpx +- [ ] Add reconnection with exponential backoff +- [ ] Implement message transformation +- [ ] Add health endpoint +- [ ] Write unit tests +- [ ] Create Dockerfile +- [ ] Set up CI/CD pipeline (Gitea Actions) +- [ ] Add ExternalSecret for Discord webhook +- [ ] Create Kubernetes manifests +- [ ] Deploy to observability namespace +- [ ] Verify notifications flowing to Discord + +## Related + +- ADR-0021: Notification Architecture +- ADR-0015: CI Notifications and Semantic Versioning
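
Reviewer sketch (not part of the patch): the Message Transformation and Error Handling sections of ADR-0022 could be prototyped roughly as below. This is a minimal, standard-library-only illustration, not the bridge's actual implementation. The names `to_discord_payload` and `backoff_delay` are hypothetical, and two behaviours are assumptions read off the ADR's example rather than stated rules: the title emoji is taken from the first matching tag in the order the ADR lists its tag mapping, and the remaining tags are shown in the `Tags` field.

```python
from datetime import datetime, timezone

# Color values copied from the ADR's priority table.
PRIORITY_COLORS = {5: 15158332, 4: 15105570, 3: 3066993, 2: 9807270, 1: 12370112}

# Tag -> emoji pairs from the ADR. Dict order doubles as precedence (assumption).
TAG_EMOJI = {
    "white_check_mark": "✅",
    "heavy_check_mark": "✅",
    "x": "❌",
    "skull": "❌",
    "warning": "⚠️",
    "rotating_light": "🚨",
    "rocket": "🚀",
    "package": "📦",
}


def to_discord_payload(msg: dict) -> dict:
    """Convert one ntfy JSON message into a Discord webhook payload."""
    tags = msg.get("tags") or []
    # Pick the highest-precedence mapped tag for the title prefix, if any.
    title_tag = next((tag for tag in TAG_EMOJI if tag in tags), None)
    title = msg.get("title") or msg.get("topic", "")
    if title_tag is not None:
        title = f"{TAG_EMOJI[title_tag]} {title}"

    fields = [{"name": "Topic", "value": msg.get("topic", ""), "inline": True}]
    remaining = [tag for tag in tags if tag != title_tag]
    if remaining:
        fields.append({"name": "Tags", "value": ", ".join(remaining), "inline": True})

    # ntfy reports Unix seconds; Discord expects an ISO 8601 timestamp.
    stamp = datetime.fromtimestamp(msg["time"], tz=timezone.utc)
    priority = msg.get("priority", 3)  # ntfy omits priority when it is the default
    return {
        "embeds": [{
            "title": title,
            "description": msg.get("message", ""),
            "color": PRIORITY_COLORS.get(priority, PRIORITY_COLORS[3]),
            "fields": fields,
            "timestamp": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "footer": {"text": "ntfy"},
        }]
    }


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before reconnect number `attempt` (0-based): 1, 2, 4, ... capped at 60."""
    return min(cap, base * (2.0 ** attempt))
```

Keeping the transformation a pure function of the ntfy message would let the unit tests in the implementation checklist cover it without a network connection; the SSE loop and the Discord POST (planned with `httpx`) can then wrap it and call `backoff_delay` between reconnect attempts.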