docs(adr): finalize ADR-0021 and add ADR-0022

ADR-0021 (Accepted):
- ntfy as central notification hub
- Alertmanager integration for critical/warning alerts
- Service readiness notifications via Flux notification-controller
- Standardized topic naming

ADR-0022 (Proposed):
- ntfy-discord-bridge Python service design
- SSE subscription with reconnection logic
- Message transformation to Discord embeds
- Priority/tag to color/emoji mapping
- Kubernetes deployment with ExternalSecret for webhook
2026-02-02 11:58:52 -05:00
parent 7b77d6c29f
commit e85deaa642
2 changed files with 284 additions and 55 deletions

@@ -2,7 +2,7 @@
 ## Status
-Proposed
+Accepted
 ## Context
@@ -12,12 +12,13 @@ The homelab infrastructure generates notifications from multiple sources:
 2. **Alertmanager** - Prometheus alerts for critical/warning conditions
 3. **Gatus** - Service health monitoring
 4. **Flux** - GitOps reconciliation events
+5. **Service readiness** - Notifications when deployments complete successfully
 Currently, ntfy serves as the primary notification hub, but there are several issues:
 - **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
 - **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
-- **Discord integration desire**: Team wants notifications forwarded to Discord for visibility
+- **No service readiness notifications**: No visibility when services come online after deployment
 ## Decision
@@ -39,6 +40,7 @@ This keeps ntfy auth-protected externally while allowing internal services to pu
 | `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
 | `gatus` | Gatus | Service health status changes |
 | `flux` | Flux | GitOps reconciliation events |
+| `deployments` | Flux/Argo | Service deployment completions |
 ### 3. Alertmanager Integration
@@ -60,53 +62,43 @@ Routes direct alerts based on severity:
 - `severity=critical` → `ntfy-critical` receiver
 - `severity=warning` → `ntfy-warning` receiver
-### 4. Discord Integration (Future)
-Discord integration will be implemented as a dedicated bridge service that:
-1. **Subscribes** to ntfy topics via SSE/WebSocket
-2. **Transforms** ntfy message format to Discord embed format
-3. **Forwards** to Discord webhook URL (stored in Vault at `kv/data/discord`)
-#### Design Options
-**Option A: Sidecar Container (Simple)**
-- Alpine container with curl/jq
-- Subscribes to ntfy JSON stream
-- Transforms and POSTs to Discord
-- Pros: Simple, no custom code
-- Cons: Shell script fragility, limited error handling
-**Option B: Dedicated Python Service (Recommended)**
-- Small Python service using `httpx` or `aiohttp`
-- Proper reconnection logic and error handling
-- Configurable topic-to-channel mapping
-- Health endpoint for monitoring
-- Pros: Robust, testable, maintainable
-- Cons: Requires building/publishing container image
-**Option C: ntfy Actions (Limited)**
-- Configure ntfy server with `upstream-base-url` or actions
-- Pros: Built into ntfy
-- Cons: ntfy doesn't natively support Discord webhook format
-#### Recommended Architecture (Option B)
-```
-┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
-│ CI/Alertmanager │────▶│      ntfy        │────▶│  ntfy App   │
-│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
-└─────────────────┘     │      hub)        │     └─────────────┘
-                        └────────┬─────────┘
-                                 │ SSE subscribe
-                        ┌──────────────────┐     ┌─────────────┐
-                        │  ntfy-discord-   │────▶│   Discord   │
-                        │     bridge       │     │  (webhook)  │
-                        └──────────────────┘     └─────────────┘
+### 4. Service Readiness Notifications
+To provide visibility when services are fully operational after deployment:
+**Option A: Flux Notification Controller**
+Configure Flux's notification-controller to send alerts when Kustomizations/HelmReleases succeed:
+```yaml
+apiVersion: notification.toolkit.fluxcd.io/v1beta3
+kind: Provider
+metadata:
+  name: ntfy-deployments
+spec:
+  type: generic-hmac  # or generic
+  address: http://ntfy-svc.observability.svc.cluster.local/deployments
+---
+apiVersion: notification.toolkit.fluxcd.io/v1beta3
+kind: Alert
+metadata:
+  name: deployment-success
+spec:
+  providerRef:
+    name: ntfy-deployments
+  eventSeverity: info
+  eventSources:
+    - kind: Kustomization
+      name: '*'
+    - kind: HelmRelease
+      name: '*'
+  inclusionList:
+    - ".*succeeded.*"
 ```
-The bridge service would be a new repo (`ntfy-discord-bridge`) following the same patterns as other Python services in the homelab.
+**Option B: Argo Workflows Post-Deploy Hook**
+For Argo-managed deployments, add a notification step at workflow completion.
+**Recommendation**: Use Flux Notification Controller (Option A) as it's already part of the GitOps stack and provides native integration.
 ## Consequences
@@ -114,30 +106,26 @@ The bridge service would be a new repo (`ntfy-discord-bridge`) following the sam
 - **Single source of truth**: All notifications flow through ntfy
 - **Auth protection maintained**: External ntfy access requires Authentik auth
-- **Flexible routing**: Can subscribe to specific topics per destination
-- **Separation of concerns**: Discord bridge is independent, can be disabled without affecting ntfy
+- **Deployment visibility**: Know when services are ready without watching logs
+- **Consistent topic naming**: All sources follow documented conventions
 ### Negative
-- **Additional service**: Discord bridge adds operational overhead
-- **Latency**: Two-hop delivery (source → ntfy → Discord) adds minimal latency
+- **Configuration overhead**: Each notification source requires explicit configuration
 ### Neutral
 - Topic naming must be documented and followed consistently
-- Discord webhook URL must be maintained in Vault
+- Future Discord integration addressed in ADR-0022
 ## Implementation Checklist
 - [x] Standardize CI notifications to `gitea-ci` topic
 - [x] Configure Alertmanager → ntfy for critical/warning alerts
-- [ ] Create `ntfy-discord-bridge` repository
-- [ ] Implement bridge service with proper error handling
-- [ ] Add ExternalSecret for Discord webhook from Vault
-- [ ] Deploy bridge to observability namespace
-- [ ] Document topic-to-Discord-channel mapping
+- [ ] Configure Flux notification-controller for deployment notifications
+- [ ] Add `deployments` topic subscription to ntfy app
 ## Related
 - ADR-0015: CI Notifications and Semantic Versioning
-- ADR-0020: Internal Registry for CI/CD
+- ADR-0022: ntfy-Discord Bridge Service

@@ -0,0 +1,241 @@
# ADR-0022: ntfy-Discord Bridge Service
## Status
Proposed
## Context
Per ADR-0021, ntfy serves as the central notification hub for the homelab. However, Discord is used for team collaboration and visibility, requiring notifications to be forwarded there as well.
ntfy does not natively support Discord webhook format. Discord expects a specific JSON structure with embeds, while ntfy uses its own message format. A bridge service is needed to:
1. Subscribe to ntfy topics
2. Transform messages to Discord embed format
3. Forward to Discord webhooks
## Decision
### Architecture
A dedicated Python microservice (`ntfy-discord-bridge`) will bridge ntfy to Discord:
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ CI/Alertmanager │────▶│      ntfy        │────▶│  ntfy App   │
│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
└─────────────────┘     │      hub)        │     └─────────────┘
                        └────────┬─────────┘
                                 │ SSE/JSON stream
                        ┌──────────────────┐     ┌─────────────┐
                        │  ntfy-discord-   │────▶│   Discord   │
                        │     bridge       │     │   Webhook   │
                        └──────────────────┘     └─────────────┘
```
### Service Design
**Repository**: `ntfy-discord-bridge`
**Technology Stack**:
- Python 3.12+
- `httpx` for async HTTP (SSE subscription + Discord POST)
- `pydantic` for configuration validation
- `structlog` for structured logging
- Poetry/uv for dependency management
**Core Features**:
1. **SSE Subscription**: Connect to ntfy's JSON stream endpoint for real-time messages
2. **Automatic Reconnection**: Exponential backoff on connection failures
3. **Message Transformation**: Convert ntfy format to Discord embed format
4. **Priority Mapping**: Map ntfy priorities to Discord embed colors
5. **Topic Routing**: Configure which topics go to which Discord channels/webhooks
6. **Health Endpoint**: `/health` for Kubernetes probes
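As a rough illustration of the subscription feature, the sketch below filters lines from ntfy's JSON stream, which delivers one JSON object per line and marks real notifications with `"event": "message"` (other events such as `open` and `keepalive` are connection housekeeping). The function name `parse_stream_line` and the sample lines are hypothetical; this is not the bridge's actual code, and the HTTP client is left out so the logic can be tested on its own.

```python
import json
from typing import Optional


def parse_stream_line(line: str) -> Optional[dict]:
    """Return the decoded ntfy message for one stream line, or None to skip it.

    Hypothetical helper: skips blank lines, undecodable lines, and
    non-message events (e.g. "open", "keepalive").
    """
    line = line.strip()
    if not line:
        return None
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        # Invalid input is skipped rather than allowed to crash the loop.
        return None
    if not isinstance(event, dict) or event.get("event") != "message":
        return None
    return event


# Sample stream lines (made up for illustration).
lines = [
    '{"id":"a1","time":1770050000,"event":"open","topic":"gitea-ci"}',
    '{"id":"a2","time":1770050045,"event":"keepalive","topic":"gitea-ci"}',
    '{"id":"abc123","time":1770050091,"event":"message","topic":"gitea-ci",'
    '"title":"Build succeeded","message":"ray-serve-apps published to PyPI"}',
    "not json",
]
messages = [m for m in map(parse_stream_line, lines) if m is not None]
```

In a real service each line would come from a streaming HTTP response, and every returned message would be handed to the transformation step.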
### Configuration
Environment variables or ConfigMap:
```yaml
NTFY_URL: "http://ntfy-svc.observability.svc.cluster.local"
DISCORD_WEBHOOK_URL: "${DISCORD_WEBHOOK_URL}" # From Vault via ExternalSecret
# Topic routing (optional - defaults to single webhook)
TOPIC_WEBHOOKS: |
  gitea-ci: ${DISCORD_CI_WEBHOOK}
  alertmanager-alerts: ${DISCORD_ALERTS_WEBHOOK}
  deployments: ${DISCORD_DEPLOYMENTS_WEBHOOK}
# Topics to subscribe to (comma-separated)
NTFY_TOPICS: "gitea-ci,alertmanager-alerts,deployments,gatus"
```
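One possible way to read these settings is sketched below. It assumes `NTFY_TOPICS` is a comma-separated string and `TOPIC_WEBHOOKS` holds one `topic: url` pair per line, as in the example above. The helper names and the `discord.example` URLs are hypothetical placeholders.

```python
def parse_topics(raw: str) -> list[str]:
    """Split a comma-separated topic list, dropping empty entries."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def parse_topic_webhooks(raw: str) -> dict[str, str]:
    """Parse 'topic: url' lines into a routing table (hypothetical format)."""
    routes: dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so the colons in "https://" survive.
        topic, sep, url = line.partition(":")
        if sep and topic.strip() and url.strip():
            routes[topic.strip()] = url.strip()
    return routes


def webhook_for(topic: str, routes: dict[str, str], default: str) -> str:
    """Pick the per-topic webhook, falling back to the single default."""
    return routes.get(topic, default)


topics = parse_topics("gitea-ci,alertmanager-alerts,deployments,gatus")
routes = parse_topic_webhooks(
    "gitea-ci: https://discord.example/ci\n"
    "alertmanager-alerts: https://discord.example/alerts\n"
)
default_url = "https://discord.example/default"
```

With this table, `gatus` has no dedicated entry and falls back to the default webhook, which matches the "defaults to single webhook" behaviour described in the comment above.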
### Message Transformation
ntfy message:
```json
{
  "id": "abc123",
  "topic": "gitea-ci",
  "title": "Build succeeded",
  "message": "ray-serve-apps published to PyPI",
  "priority": 3,
  "tags": ["package", "white_check_mark"],
  "time": 1770050091
}
```
Discord embed:
```json
{
  "embeds": [{
    "title": "✅ Build succeeded",
    "description": "ray-serve-apps published to PyPI",
    "color": 3066993,
    "fields": [
      {"name": "Topic", "value": "gitea-ci", "inline": true},
      {"name": "Tags", "value": "package", "inline": true}
    ],
    "timestamp": "2026-02-02T16:34:51Z",
    "footer": {"text": "ntfy"}
  }]
}
```
**Priority → Color Mapping**:
| Priority | Name | Discord Color |
|----------|------|---------------|
| 5 | Max/Urgent | 🔴 Red (15158332) |
| 4 | High | 🟠 Orange (15105570) |
| 3 | Default | 🟢 Green (3066993) |
| 2 | Low | ⚪ Gray (9807270) |
| 1 | Min | ⚪ Light Gray (12370112) |
**Tag → Emoji Mapping**:
Common ntfy tags are converted to Discord-friendly emojis in the title:
- `white_check_mark` / `heavy_check_mark` → ✅
- `x` / `skull` → ❌
- `warning` → ⚠️
- `rotating_light` → 🚨
- `rocket` → 🚀
- `package` → 📦
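The transformation described above could look roughly like the following sketch. The color values and tag names come from the tables in this section. The function name `to_discord_payload` is hypothetical, and so are two assumed rules: the first matching tag in mapping order becomes the title emoji, and the remaining tags go into a `Tags` field.

```python
from datetime import datetime, timezone

# Priority -> Discord embed color (decimal RGB), from the table above.
PRIORITY_COLORS = {5: 15158332, 4: 15105570, 3: 3066993, 2: 9807270, 1: 12370112}

# Tag -> emoji; dict order doubles as precedence (an assumption of this sketch).
TAG_EMOJIS = {
    "white_check_mark": "✅", "heavy_check_mark": "✅",
    "x": "❌", "skull": "❌",
    "warning": "⚠️", "rotating_light": "🚨",
    "rocket": "🚀", "package": "📦",
}


def to_discord_payload(msg: dict) -> dict:
    """Convert one ntfy message dict into a Discord webhook payload."""
    tags = list(msg.get("tags") or [])
    emoji_tag = next((t for t in TAG_EMOJIS if t in tags), None)

    title = msg.get("title") or msg.get("topic", "ntfy")
    if emoji_tag:
        title = f"{TAG_EMOJIS[emoji_tag]} {title}"

    fields = [{"name": "Topic", "value": msg.get("topic", "unknown"), "inline": True}]
    remaining = [t for t in tags if t != emoji_tag]
    if remaining:
        fields.append({"name": "Tags", "value": ", ".join(remaining), "inline": True})

    embed = {
        "title": title,
        "description": msg.get("message", ""),
        # Unknown or missing priorities fall back to the default (3) color.
        "color": PRIORITY_COLORS.get(msg.get("priority", 3), PRIORITY_COLORS[3]),
        "fields": fields,
        "footer": {"text": "ntfy"},
    }
    if "time" in msg:
        # ntfy sends Unix seconds; Discord expects an ISO 8601 timestamp.
        ts = datetime.fromtimestamp(msg["time"], tz=timezone.utc)
        embed["timestamp"] = ts.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"embeds": [embed]}


sample = {
    "id": "abc123", "topic": "gitea-ci", "title": "Build succeeded",
    "message": "ray-serve-apps published to PyPI", "priority": 3,
    "tags": ["package", "white_check_mark"], "time": 1770050091,
}
payload = to_discord_payload(sample)
```

Keeping the mappings as plain dictionaries means the tables above can be extended without touching the transformation logic.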
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ntfy-discord-bridge
  namespace: observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ntfy-discord-bridge
  template:
    metadata:
      # Pod labels must match spec.selector, or the Deployment is rejected
      labels:
        app: ntfy-discord-bridge
    spec:
      containers:
        - name: bridge
          image: registry.daviestechlabs.io/ntfy-discord-bridge:latest
          env:
            - name: NTFY_URL
              value: "http://ntfy-svc.observability.svc.cluster.local"
            - name: NTFY_TOPICS
              value: "gitea-ci,alertmanager-alerts,deployments"
            - name: DISCORD_WEBHOOK_URL
              valueFrom:
                secretKeyRef:
                  name: discord-webhook-secret
                  key: webhook-url
          ports:
            - containerPort: 8080
              name: health
          livenessProbe:
            httpGet:
              path: /health
              port: health
            initialDelaySeconds: 5
            periodSeconds: 30
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 10m
              memory: 64Mi
```
### Secret Management
Discord webhook URL stored in Vault at `kv/data/discord`:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: discord-webhook-secret
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: discord-webhook-secret
  data:
    - secretKey: webhook-url
      remoteRef:
        key: kv/data/discord
        property: webhook_url
```
### Error Handling
1. **Connection Loss**: Exponential backoff (1s, 2s, 4s, ... max 60s)
2. **Discord Rate Limits**: Respect `Retry-After` header, queue messages
3. **Invalid Messages**: Log and skip, don't crash
4. **Webhook Errors**: Log error, continue processing other messages
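The first two rules can be sketched as small pure functions. This is a minimal illustration, not the planned implementation: `backoff_delay` and `retry_after_seconds` are hypothetical names, and the 1-second fallback for a missing or malformed `Retry-After` value is an assumption.

```python
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, ... max 60s."""
    # Clamp the exponent so very long outages cannot overflow the float.
    return min(cap, base * (2 ** min(attempt, 32)))


def retry_after_seconds(header_value: Optional[str], default: float = 1.0) -> float:
    """Seconds to wait after a rate-limited response, from its Retry-After value.

    Falls back to `default` (an assumption of this sketch) when the header
    is missing or not a number.
    """
    try:
        return max(0.0, float(header_value))
    except (TypeError, ValueError):
        return default


delays = [backoff_delay(n) for n in range(8)]
```

A production loop would typically add random jitter to each delay so that restarts do not reconnect in lockstep, and would reset the attempt counter after a successful connection.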
## Consequences
### Positive
- **Robust**: Proper reconnection and error handling vs shell script
- **Testable**: Python service with unit tests
- **Observable**: Structured logging, health endpoints
- **Flexible**: Easy to add topic routing, filtering, rate limiting
- **Consistent**: Follows same patterns as other Python services (handler-base, etc.)
### Negative
- **Operational Overhead**: Another service to maintain
- **Build Pipeline**: Requires CI/CD for container image
- **Latency**: Adds ~100ms to notification delivery
### Neutral
- Webhook URL must be maintained in Vault
- Service logs should be monitored for errors
## Implementation Checklist
- [ ] Create `ntfy-discord-bridge` repository
- [ ] Implement core bridge logic with httpx
- [ ] Add reconnection with exponential backoff
- [ ] Implement message transformation
- [ ] Add health endpoint
- [ ] Write unit tests
- [ ] Create Dockerfile
- [ ] Set up CI/CD pipeline (Gitea Actions)
- [ ] Add ExternalSecret for Discord webhook
- [ ] Create Kubernetes manifests
- [ ] Deploy to observability namespace
- [ ] Verify notifications flowing to Discord
## Related
- ADR-0021: Notification Architecture
- ADR-0015: CI Notifications and Semantic Versioning