docs(adr): finalize ADR-0021 and add ADR-0022

ADR-0021 (Accepted):
- ntfy as central notification hub
- Alertmanager integration for critical/warning alerts
- Service readiness notifications via Flux notification-controller
- Standardized topic naming

ADR-0022 (Proposed):
- ntfy-discord-bridge Python service design
- SSE subscription with reconnection logic
- Message transformation to Discord embeds
- Priority/tag to color/emoji mapping
- Kubernetes deployment with ExternalSecret for webhook
2026-02-02 11:58:52 -05:00
parent 7b77d6c29f
commit e85deaa642
2 changed files with 284 additions and 55 deletions


@@ -2,7 +2,7 @@
## Status
-Proposed
+Accepted
## Context
@@ -12,12 +12,13 @@ The homelab infrastructure generates notifications from multiple sources:
2. **Alertmanager** - Prometheus alerts for critical/warning conditions
3. **Gatus** - Service health monitoring
4. **Flux** - GitOps reconciliation events
5. **Service readiness** - Notifications when deployments complete successfully
Currently, ntfy serves as the primary notification hub, but there are several issues:
- **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
- **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
- **Discord integration desire**: Team wants notifications forwarded to Discord for visibility
- **No service readiness notifications**: No visibility when services come online after deployment
## Decision
@@ -39,6 +40,7 @@ This keeps ntfy auth-protected externally while allowing internal services to pu
| `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
| `gatus` | Gatus | Service health status changes |
| `flux` | Flux | GitOps reconciliation events |
| `deployments` | Flux/Argo | Service deployment completions |
### 3. Alertmanager Integration
@@ -60,53 +62,43 @@ Routes direct alerts based on severity:
- `severity=critical` → `ntfy-critical` receiver
- `severity=warning` → `ntfy-warning` receiver
### 4. Discord Integration (Future)
### 4. Service Readiness Notifications
Discord integration will be implemented as a dedicated bridge service that:
To provide visibility when services are fully operational after deployment:
1. **Subscribes** to ntfy topics via SSE/WebSocket
2. **Transforms** ntfy message format to Discord embed format
3. **Forwards** to Discord webhook URL (stored in Vault at `kv/data/discord`)
**Option A: Flux Notification Controller**
Configure Flux's notification-controller to send alerts when Kustomizations/HelmReleases succeed:
#### Design Options
**Option A: Sidecar Container (Simple)**
- Alpine container with curl/jq
- Subscribes to ntfy JSON stream
- Transforms and POSTs to Discord
- Pros: Simple, no custom code
- Cons: Shell script fragility, limited error handling
**Option B: Dedicated Python Service (Recommended)**
- Small Python service using `httpx` or `aiohttp`
- Proper reconnection logic and error handling
- Configurable topic-to-channel mapping
- Health endpoint for monitoring
- Pros: Robust, testable, maintainable
- Cons: Requires building/publishing container image
**Option C: ntfy Actions (Limited)**
- Configure ntfy server with `upstream-base-url` or actions
- Pros: Built into ntfy
- Cons: ntfy doesn't natively support Discord webhook format
#### Recommended Architecture (Option B)
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ CI/Alertmanager │────▶│       ntfy       │────▶│  ntfy App   │
│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
└─────────────────┘     │       hub)       │     └─────────────┘
                        └────────┬─────────┘
                                 │ SSE subscribe
                        ┌──────────────────┐     ┌─────────────┐
                        │  ntfy-discord-   │────▶│   Discord   │
                        │      bridge      │     │  (webhook)  │
                        └──────────────────┘     └─────────────┘
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: ntfy-deployments
spec:
  type: generic-hmac # or generic
  address: http://ntfy-svc.observability.svc.cluster.local/deployments
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: deployment-success
spec:
  providerRef:
    name: ntfy-deployments
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
  inclusionList:
    - ".*succeeded.*"
```
The bridge service would be a new repo (`ntfy-discord-bridge`) following the same patterns as other Python services in the homelab.
**Option B: Argo Workflows Post-Deploy Hook**
For Argo-managed deployments, add a notification step at workflow completion.
**Recommendation**: Use Flux Notification Controller (Option A) as it's already part of the GitOps stack and provides native integration.
## Consequences
@@ -114,30 +106,26 @@ The bridge service would be a new repo (`ntfy-discord-bridge`) following the sam
- **Single source of truth**: All notifications flow through ntfy
- **Auth protection maintained**: External ntfy access requires Authentik auth
- **Flexible routing**: Can subscribe to specific topics per destination
- **Separation of concerns**: Discord bridge is independent, can be disabled without affecting ntfy
- **Deployment visibility**: Know when services are ready without watching logs
- **Consistent topic naming**: All sources follow documented conventions
### Negative
- **Additional service**: Discord bridge adds operational overhead
- **Latency**: Two-hop delivery (source → ntfy → Discord) adds minimal latency
- **Configuration overhead**: Each notification source requires explicit configuration
### Neutral
- Topic naming must be documented and followed consistently
- Discord webhook URL must be maintained in Vault
- Future Discord integration addressed in ADR-0022
## Implementation Checklist
- [x] Standardize CI notifications to `gitea-ci` topic
- [x] Configure Alertmanager → ntfy for critical/warning alerts
- [ ] Create `ntfy-discord-bridge` repository
- [ ] Implement bridge service with proper error handling
- [ ] Add ExternalSecret for Discord webhook from Vault
- [ ] Deploy bridge to observability namespace
- [ ] Document topic-to-Discord-channel mapping
- [ ] Configure Flux notification-controller for deployment notifications
- [ ] Add `deployments` topic subscription to ntfy app
## Related
- ADR-0015: CI Notifications and Semantic Versioning
- ADR-0020: Internal Registry for CI/CD
- ADR-0022: ntfy-Discord Bridge Service


@@ -0,0 +1,241 @@
# ADR-0022: ntfy-Discord Bridge Service
## Status
Proposed
## Context
Per ADR-0021, ntfy serves as the central notification hub for the homelab. However, Discord is used for team collaboration and visibility, requiring notifications to be forwarded there as well.
ntfy does not natively support Discord webhook format. Discord expects a specific JSON structure with embeds, while ntfy uses its own message format. A bridge service is needed to:
1. Subscribe to ntfy topics
2. Transform messages to Discord embed format
3. Forward to Discord webhooks
## Decision
### Architecture
A dedicated Python microservice (`ntfy-discord-bridge`) will bridge ntfy to Discord:
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ CI/Alertmanager │────▶│       ntfy       │────▶│  ntfy App   │
│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
└─────────────────┘     │       hub)       │     └─────────────┘
                        └────────┬─────────┘
                                 │ SSE/JSON stream
                        ┌──────────────────┐     ┌─────────────┐
                        │  ntfy-discord-   │────▶│   Discord   │
                        │      bridge      │     │   Webhook   │
                        └──────────────────┘     └─────────────┘
```
### Service Design
**Repository**: `ntfy-discord-bridge`
**Technology Stack**:
- Python 3.12+
- `httpx` for async HTTP (SSE subscription + Discord POST)
- `pydantic` for configuration validation
- `structlog` for structured logging
- Poetry/uv for dependency management
**Core Features**:
1. **SSE Subscription**: Connect to ntfy's JSON stream endpoint for real-time messages
2. **Automatic Reconnection**: Exponential backoff on connection failures
3. **Message Transformation**: Convert ntfy format to Discord embed format
4. **Priority Mapping**: Map ntfy priorities to Discord embed colors
5. **Topic Routing**: Configure which topics go to which Discord channels/webhooks
6. **Health Endpoint**: `/health` for Kubernetes probes
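The subscription side of these features can be sketched without any network code. The sketch below assumes ntfy's JSON stream endpoint (`/<topics>/json`), which emits one JSON object per line and marks real notifications with `"event": "message"`. The helper names `stream_url` and `parse_stream_line` are hypothetical and not part of the planned service.

```python
import json
from typing import Optional


def stream_url(base_url: str, topics: list[str]) -> str:
    """Build the JSON stream URL for one or more comma-joined topics."""
    return f"{base_url.rstrip('/')}/{','.join(topics)}/json"


def parse_stream_line(line: str) -> Optional[dict]:
    """Return the decoded notification, or None for lines the bridge should skip.

    Skipped lines: blanks, malformed JSON, and non-message events such as
    "open" and "keepalive" (assumed ntfy stream behavior).
    """
    line = line.strip()
    if not line:
        return None
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None  # invalid input is skipped rather than crashing the loop
    if not isinstance(event, dict) or event.get("event") != "message":
        return None
    return event
```

In a real service, each line would likely come from a streamed `httpx` response (for example by iterating `response.aiter_lines()`), with the reconnection logic wrapped around that loop.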
### Configuration
Environment variables or ConfigMap:
```yaml
NTFY_URL: "http://ntfy-svc.observability.svc.cluster.local"
DISCORD_WEBHOOK_URL: "${DISCORD_WEBHOOK_URL}" # From Vault via ExternalSecret
# Topic routing (optional - defaults to single webhook)
TOPIC_WEBHOOKS: |
  gitea-ci: ${DISCORD_CI_WEBHOOK}
  alertmanager-alerts: ${DISCORD_ALERTS_WEBHOOK}
  deployments: ${DISCORD_DEPLOYMENTS_WEBHOOK}
# Topics to subscribe to (comma-separated)
NTFY_TOPICS: "gitea-ci,alertmanager-alerts,deployments,gatus"
```
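One possible way to load these settings is sketched below. It uses a stdlib dataclass rather than the `pydantic` model the ADR plans, so that the example stays dependency-free. `BridgeConfig`, `parse_topic_webhooks`, and `load_config` are hypothetical names, and the fallback to `DISCORD_WEBHOOK_URL` for unrouted topics is an assumption based on the "defaults to single webhook" comment.

```python
import os
from dataclasses import dataclass, field


@dataclass
class BridgeConfig:
    ntfy_url: str
    topics: list[str]
    default_webhook: str
    topic_webhooks: dict[str, str] = field(default_factory=dict)

    def webhook_for(self, topic: str) -> str:
        """Per-topic webhook if configured, otherwise the default one."""
        return self.topic_webhooks.get(topic, self.default_webhook)


def parse_topic_webhooks(raw: str) -> dict[str, str]:
    """Parse 'topic: url' lines, splitting on the first colon only (URLs contain colons)."""
    mapping = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        topic, sep, url = line.partition(":")
        if sep and topic.strip() and url.strip():
            mapping[topic.strip()] = url.strip()
    return mapping


def load_config(env=os.environ) -> BridgeConfig:
    """Build the config from a mapping of environment variables."""
    return BridgeConfig(
        ntfy_url=env["NTFY_URL"].rstrip("/"),
        topics=[t.strip() for t in env["NTFY_TOPICS"].split(",") if t.strip()],
        default_webhook=env["DISCORD_WEBHOOK_URL"],
        topic_webhooks=parse_topic_webhooks(env.get("TOPIC_WEBHOOKS", "")),
    )
```

Passing the environment in as a plain mapping keeps the loader easy to unit test without touching the process environment.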
### Message Transformation
ntfy message:
```json
{
  "id": "abc123",
  "topic": "gitea-ci",
  "title": "Build succeeded",
  "message": "ray-serve-apps published to PyPI",
  "priority": 3,
  "tags": ["package", "white_check_mark"],
  "time": 1770050091
}
```
Discord embed:
```json
{
  "embeds": [{
    "title": "✅ Build succeeded",
    "description": "ray-serve-apps published to PyPI",
    "color": 3066993,
    "fields": [
      {"name": "Topic", "value": "gitea-ci", "inline": true},
      {"name": "Tags", "value": "package", "inline": true}
    ],
    "timestamp": "2026-02-02T16:34:51Z",
    "footer": {"text": "ntfy"}
  }]
}
```
**Priority → Color Mapping**:
| Priority | Name | Discord Color |
|----------|------|---------------|
| 5 | Max/Urgent | 🔴 Red (15158332) |
| 4 | High | 🟠 Orange (15105570) |
| 3 | Default | 🟢 Green (3066993) |
| 2 | Low | ⚪ Gray (9807270) |
| 1 | Min | ⚪ Light Gray (12370112) |
**Tag → Emoji Mapping**:
Common ntfy tags are converted to Discord-friendly emojis in the title:
- `white_check_mark` / `heavy_check_mark` → ✅
- `x` / `skull` → ❌
- `warning` → ⚠️
- `rotating_light` → 🚨
- `rocket` → 🚀
- `package` → 📦
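The transformation and the two mappings above could be combined roughly as follows. This is a sketch, not the planned implementation: `to_discord_payload` is a hypothetical name, and the ADR does not say which tag wins when several map to an emoji. The sketch assumes the order of the tag list above is the precedence, and that any remaining tags are listed in a `Tags` field.

```python
from datetime import datetime, timezone

# Color values copied from the priority table above.
PRIORITY_COLORS = {5: 15158332, 4: 15105570, 3: 3066993, 2: 9807270, 1: 12370112}
DEFAULT_PRIORITY = 3

# Tag names copied from the tag list above; dict order doubles as precedence (assumption).
TAG_EMOJI = {
    "white_check_mark": "✅",
    "heavy_check_mark": "✅",
    "x": "❌",
    "skull": "❌",
    "warning": "⚠️",
    "rotating_light": "🚨",
    "rocket": "🚀",
    "package": "📦",
}


def to_discord_payload(msg: dict) -> dict:
    """Convert one ntfy message dict into a Discord webhook payload."""
    tags = list(msg.get("tags") or [])
    # The highest-precedence mapped tag becomes the title emoji.
    title_tag = next((t for t in TAG_EMOJI if t in tags), None)
    title = msg.get("title") or msg.get("topic", "")
    if title_tag:
        title = f"{TAG_EMOJI[title_tag]} {title}"
    remaining = [t for t in tags if t != title_tag]

    fields = [{"name": "Topic", "value": msg.get("topic", ""), "inline": True}]
    if remaining:
        fields.append({"name": "Tags", "value": ", ".join(remaining), "inline": True})

    priority = msg.get("priority", DEFAULT_PRIORITY)
    embed = {
        "title": title,
        "description": msg.get("message", ""),
        # Unknown priorities fall back to the default color.
        "color": PRIORITY_COLORS.get(priority, PRIORITY_COLORS[DEFAULT_PRIORITY]),
        "fields": fields,
        "footer": {"text": "ntfy"},
    }
    if "time" in msg:
        # ntfy's "time" is Unix seconds; Discord expects an ISO 8601 timestamp (UTC here).
        ts = datetime.fromtimestamp(msg["time"], tz=timezone.utc)
        embed["timestamp"] = ts.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"embeds": [embed]}
```

Keeping the function pure (dict in, dict out) means it can be unit tested against the example messages without any HTTP mocking.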
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ntfy-discord-bridge
  namespace: observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ntfy-discord-bridge
  template:
    metadata:
      labels:
        app: ntfy-discord-bridge  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: bridge
          image: registry.daviestechlabs.io/ntfy-discord-bridge:latest
          env:
            - name: NTFY_URL
              value: "http://ntfy-svc.observability.svc.cluster.local"
            - name: NTFY_TOPICS
              value: "gitea-ci,alertmanager-alerts,deployments"
            - name: DISCORD_WEBHOOK_URL
              valueFrom:
                secretKeyRef:
                  name: discord-webhook-secret
                  key: webhook-url
          ports:
            - containerPort: 8080
              name: health
          livenessProbe:
            httpGet:
              path: /health
              port: health
            initialDelaySeconds: 5
            periodSeconds: 30
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 10m
              memory: 64Mi
```
### Secret Management
Discord webhook URL stored in Vault at `kv/data/discord`:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: discord-webhook-secret
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: discord-webhook-secret
  data:
    - secretKey: webhook-url
      remoteRef:
        key: kv/data/discord
        property: webhook_url
```
### Error Handling
1. **Connection Loss**: Exponential backoff (1s, 2s, 4s, ... max 60s)
2. **Discord Rate Limits**: Respect `Retry-After` header, queue messages
3. **Invalid Messages**: Log and skip, don't crash
4. **Webhook Errors**: Log error, continue processing other messages
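The first two rules can be expressed as small pure helpers, sketched below. `backoff_delays` and `retry_after_seconds` are hypothetical names. Treating only HTTP 429 as a rate limit and falling back to a one-second wait when the header is missing or unparsable are assumptions of this sketch, not decisions recorded in the ADR.

```python
from typing import Iterator, Mapping, Optional


def backoff_delays(base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield reconnect delays that double from `base` and never exceed `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def retry_after_seconds(
    status_code: int, headers: Mapping[str, str], default: float = 1.0
) -> Optional[float]:
    """Return how long to wait before re-sending, or None if no retry is needed.

    Header lookup is case-insensitive because HTTP header names are.
    """
    if status_code != 429:
        return None
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return max(float(lowered["retry-after"]), 0.0)
    except (KeyError, ValueError):
        return default
```

A reconnect loop would take one value from a fresh `backoff_delays()` generator per failed attempt and create a new generator after a successful connection, so the delay resets to one second.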
## Consequences
### Positive
- **Robust**: Proper reconnection and error handling vs shell script
- **Testable**: Python service with unit tests
- **Observable**: Structured logging, health endpoints
- **Flexible**: Easy to add topic routing, filtering, rate limiting
- **Consistent**: Follows same patterns as other Python services (handler-base, etc.)
### Negative
- **Operational Overhead**: Another service to maintain
- **Build Pipeline**: Requires CI/CD for container image
- **Latency**: Adds an estimated ~100ms to notification delivery
### Neutral
- Webhook URL must be maintained in Vault
- Service logs should be monitored for errors
## Implementation Checklist
- [ ] Create `ntfy-discord-bridge` repository
- [ ] Implement core bridge logic with httpx
- [ ] Add reconnection with exponential backoff
- [ ] Implement message transformation
- [ ] Add health endpoint
- [ ] Write unit tests
- [ ] Create Dockerfile
- [ ] Set up CI/CD pipeline (Gitea Actions)
- [ ] Add ExternalSecret for Discord webhook
- [ ] Create Kubernetes manifests
- [ ] Deploy to observability namespace
- [ ] Verify notifications flowing to Discord
## Related
- ADR-0021: Notification Architecture
- ADR-0015: CI Notifications and Semantic Versioning