homelab-design/docs/adr/ADR-0022-ntfy-discord-bridge.md
Billy D. 5e9f589311 docs(adr): update ADR-0022 to use Go with hot reload
- Switch from Python to Go for smaller images (~10MB vs ~150MB)
- Add fsnotify for hot reload of secrets without pod restart
- Update status from Proposed to Accepted
- Add Prometheus metrics endpoint
- Update resource limits (32Mi vs 128Mi)
- Mark repository creation as complete
2026-02-02 17:49:34 -05:00


# ADR-0022: ntfy-Discord Bridge Service
## Status
Accepted
## Context
Per ADR-0021, ntfy serves as the central notification hub for the homelab. However, Discord is used for team collaboration and visibility, requiring notifications to be forwarded there as well.
ntfy does not natively produce the Discord webhook format. Discord expects a specific JSON structure with embeds, while ntfy uses its own message format. A bridge service is needed to:
1. Subscribe to ntfy topics
2. Transform messages to Discord embed format
3. Forward to Discord webhooks
## Decision
### Architecture
A dedicated Go microservice (`ntfy-discord`) will bridge ntfy to Discord:
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ CI/Alertmanager │────▶│       ntfy       │────▶│  ntfy App   │
│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
└─────────────────┘     │       hub)       │     └─────────────┘
                        └────────┬─────────┘
                                 │ SSE/JSON stream
                        ┌──────────────────┐     ┌─────────────┐
                        │   ntfy-discord   │────▶│   Discord   │
                        │       (Go)       │     │   Webhook   │
                        └──────────────────┘     └─────────────┘
```
### Service Design
**Repository**: `ntfy-discord`
**Technology Stack**:
- Go 1.22+
- `fsnotify` for hot reload of secrets/config
- Standard library `net/http` for SSE subscription
- `slog` for structured logging
- Scratch/distroless base image (~10MB final image)
**Why Go over Python**:
- **Smaller images**: ~10MB vs ~150MB+ for Python
- **Cloud native**: Single static binary, no runtime dependencies
- **Memory efficient**: Lower RSS, ideal for always-on bridge
- **Concurrency**: Goroutines for SSE handling and webhook delivery
- **Compile-time safety**: Catch errors before deployment
**Core Features**:
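1. **Stream Subscription**: Connect to ntfy's streaming endpoint for real-time messages (ntfy exposes a newline-delimited JSON stream at `/<topic>/json` and Server-Sent Events at `/<topic>/sse`)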
2. **Automatic Reconnection**: Exponential backoff on connection failures
3. **Message Transformation**: Convert ntfy format to Discord embed format
4. **Priority Mapping**: Map ntfy priorities to Discord embed colors
5. **Topic Routing**: Configure which topics go to which Discord channels/webhooks
6. **Hot Reload**: Watch mounted secrets/configmaps with fsnotify, reload without restart
7. **Health Endpoint**: `/health` and `/ready` for Kubernetes probes
8. **Metrics**: Prometheus metrics at `/metrics`
### Hot Reload Implementation
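Kubernetes mounts Secret volumes as symlinked files and updates them atomically by swapping a `..data` symlink inside the mount directory. The bridge therefore uses `fsnotify` to watch the mount directory (not an individual file) for changes: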
```go
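// watchSecrets reloads config whenever the mounted secret directory changes.
// Imports: context, log/slog, github.com/fsnotify/fsnotify.
func (b *Bridge) watchSecrets(ctx context.Context, secretPath string) {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		slog.Error("create secret watcher", "error", err)
		return
	}
	defer watcher.Close()
	if err := watcher.Add(secretPath); err != nil {
		slog.Error("watch secrets", "path", secretPath, "error", err)
		return
	}
	for {
		select {
		case event, ok := <-watcher.Events:
			if !ok {
				return // watcher closed
			}
			if event.Has(fsnotify.Write) || event.Has(fsnotify.Create) {
				slog.Info("secret changed, reloading config")
				b.reloadConfig(secretPath)
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return
			}
			slog.Error("secret watcher error", "error", err)
		case <-ctx.Done():
			return
		}
	}
}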
```
This allows ExternalSecrets to rotate the Discord webhook URL without pod restarts.
### Configuration
Configuration via environment variables and mounted secrets:
```yaml
# Environment variables (ConfigMap)
NTFY_URL: "http://ntfy.observability.svc.cluster.local"
NTFY_TOPICS: "gitea-ci,alertmanager-alerts,flux-deployments,gatus"
LOG_LEVEL: "info"
METRICS_ENABLED: "true"
# Mounted secret (hot-reloadable)
/secrets/webhook-url # Single webhook for all topics (file name matches the ExternalSecret secretKey)
# OR for topic routing:
/secrets/topic-webhooks.yaml # YAML mapping topics to webhooks
```
Topic routing file (optional):
```yaml
gitea-ci: "https://discord.com/api/webhooks/xxx/ci"
alertmanager-alerts: "https://discord.com/api/webhooks/xxx/alerts"
flux-deployments: "https://discord.com/api/webhooks/xxx/deploys"
default: "https://discord.com/api/webhooks/xxx/general"
```
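Once the routing file is parsed into a map, the lookup is small. The sketch below assumes that the `default` key acts as a fallback for topics without their own entry; the ADR implies this but does not state it, and `resolveWebhook` is a hypothetical name.

```go
package main

import "fmt"

// resolveWebhook returns the webhook for a topic, falling back to the
// "default" entry. ok is false when neither entry exists.
// Assumption: "default" is a catch-all, as the routing file suggests.
func resolveWebhook(routes map[string]string, topic string) (hook string, ok bool) {
	if hook, ok = routes[topic]; ok {
		return hook, true
	}
	hook, ok = routes["default"]
	return hook, ok
}

func main() {
	routes := map[string]string{
		"gitea-ci":            "https://discord.com/api/webhooks/xxx/ci",
		"alertmanager-alerts": "https://discord.com/api/webhooks/xxx/alerts",
		"flux-deployments":    "https://discord.com/api/webhooks/xxx/deploys",
		"default":             "https://discord.com/api/webhooks/xxx/general",
	}
	for _, topic := range []string{"gitea-ci", "gatus"} {
		hook, ok := resolveWebhook(routes, topic)
		fmt.Println(topic, hook, ok)
	}
}
```

Returning `ok` lets the caller log and skip a message when no route exists, rather than posting to an empty URL.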
### Message Transformation
ntfy message:
```json
{
  "id": "abc123",
  "topic": "gitea-ci",
  "title": "Build succeeded",
  "message": "ray-serve-apps published to PyPI",
  "priority": 3,
  "tags": ["package", "white_check_mark"],
  "time": 1770050091
}
```
Discord embed:
```json
{
  "embeds": [{
    "title": "✅ Build succeeded",
    "description": "ray-serve-apps published to PyPI",
    "color": 3066993,
    "fields": [
      {"name": "Topic", "value": "gitea-ci", "inline": true}
    ],
    "timestamp": "2026-02-02T16:34:51Z",
    "footer": {"text": "ntfy"}
  }]
}
```
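The mapping between the two payloads can be sketched as a pair of structs and one function. This is an illustration only: the type names and `toPayload` are hypothetical, and the emoji and color are passed in so that their lookup rules stay separate. The ntfy `time` field is Unix seconds, so it is converted to a UTC RFC 3339 string for the embed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ntfyMessage mirrors the ntfy fields used in the example above.
type ntfyMessage struct {
	ID       string   `json:"id"`
	Topic    string   `json:"topic"`
	Title    string   `json:"title"`
	Message  string   `json:"message"`
	Priority int      `json:"priority"`
	Tags     []string `json:"tags"`
	Time     int64    `json:"time"` // Unix seconds
}

type embedField struct {
	Name   string `json:"name"`
	Value  string `json:"value"`
	Inline bool   `json:"inline"`
}

type embedFooter struct {
	Text string `json:"text"`
}

type embed struct {
	Title       string       `json:"title"`
	Description string       `json:"description"`
	Color       int          `json:"color"`
	Fields      []embedField `json:"fields"`
	Timestamp   string       `json:"timestamp"`
	Footer      embedFooter  `json:"footer"`
}

type webhookPayload struct {
	Embeds []embed `json:"embeds"`
}

// toPayload builds a single-embed Discord payload from one ntfy message.
func toPayload(m ntfyMessage, emoji string, color int) webhookPayload {
	title := m.Title
	if emoji != "" {
		title = emoji + " " + title
	}
	return webhookPayload{Embeds: []embed{{
		Title:       title,
		Description: m.Message,
		Color:       color,
		Fields:      []embedField{{Name: "Topic", Value: m.Topic, Inline: true}},
		Timestamp:   time.Unix(m.Time, 0).UTC().Format(time.RFC3339),
		Footer:      embedFooter{Text: "ntfy"},
	}}}
}

func main() {
	raw := `{"id":"abc123","topic":"gitea-ci","title":"Build succeeded",
	"message":"ray-serve-apps published to PyPI","priority":3,
	"tags":["package","white_check_mark"],"time":1770050091}`
	var m ntfyMessage
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(toPayload(m, "✅", 3066993), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```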
**Priority → Color Mapping**:
| Priority | Name | Discord Color |
|----------|------|---------------|
| 5 | Max/Urgent | 🔴 Red (15158332) |
| 4 | High | 🟠 Orange (15105570) |
| 3 | Default | 🟢 Green (3066993) |
| 2 | Low | ⚪ Gray (9807270) |
| 1 | Min | ⚪ Light Gray (12370112) |
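The table translates directly into a lookup. In this sketch, `colorForPriority` is a hypothetical name, and treating a missing or out-of-range priority as the default (3) is an assumption rather than something the ADR states. Discord takes embed colors as decimal integers, which is why the table lists decimal values.

```go
package main

import "fmt"

// colorForPriority maps an ntfy priority (1-5) to a Discord embed color.
// Assumption: unknown or unset priorities use the default (priority 3) color.
func colorForPriority(priority int) int {
	switch priority {
	case 5:
		return 15158332 // 0xE74C3C
	case 4:
		return 15105570 // 0xE67E22
	case 2:
		return 9807270 // 0x95A5A6
	case 1:
		return 12370112 // 0xBCC0C0
	default:
		return 3066993 // 0x2ECC71
	}
}

func main() {
	for p := 0; p <= 5; p++ {
		c := colorForPriority(p)
		fmt.Printf("priority %d -> %d (#%06X)\n", p, c, c)
	}
}
```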
**Tag → Emoji Mapping**:
Common ntfy tags are converted to Discord-friendly emojis in the title:
- `white_check_mark` / `heavy_check_mark` → ✅
- `x` / `skull` → ❌
- `warning` → ⚠️
- `rotating_light` → 🚨
- `rocket` → 🚀
- `package` → 📦
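The ADR does not say what happens when a message carries several mapped tags. The example embed shows only ✅ for the tags `package` and `white_check_mark`, so the sketch below assumes a single emoji is chosen by precedence, in the order the mappings are listed above. That rule and the names `titleEmoji` and `decorateTitle` are hypothetical.

```go
package main

import "fmt"

// emojiRules lists tag-to-emoji mappings in precedence order.
// Assumption: the first rule with a matching tag wins.
var emojiRules = []struct{ tag, emoji string }{
	{"white_check_mark", "✅"},
	{"heavy_check_mark", "✅"},
	{"x", "❌"},
	{"skull", "❌"},
	{"warning", "⚠️"},
	{"rotating_light", "🚨"},
	{"rocket", "🚀"},
	{"package", "📦"},
}

// titleEmoji returns the emoji for the highest-precedence known tag, or "".
func titleEmoji(tags []string) string {
	present := make(map[string]bool, len(tags))
	for _, t := range tags {
		present[t] = true
	}
	for _, r := range emojiRules {
		if present[r.tag] {
			return r.emoji
		}
	}
	return ""
}

// decorateTitle prefixes the title with the chosen emoji, if any.
func decorateTitle(title string, tags []string) string {
	if e := titleEmoji(tags); e != "" {
		return e + " " + title
	}
	return title
}

func main() {
	fmt.Println(decorateTitle("Build succeeded", []string{"package", "white_check_mark"}))
	fmt.Println(decorateTitle("Deploy started", []string{"rocket"}))
	fmt.Println(decorateTitle("No known tags", []string{"custom"}))
}
```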
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ntfy-discord
  namespace: observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ntfy-discord
  template:
    metadata:
      labels:
        app: ntfy-discord
    spec:
      containers:
        - name: bridge
          image: gitea-http.gitea.svc.cluster.local:3000/daviestechlabs/ntfy-discord:latest
          env:
            - name: NTFY_URL
              value: "http://ntfy.observability.svc.cluster.local"
            - name: NTFY_TOPICS
              value: "gitea-ci,alertmanager-alerts,flux-deployments"
            - name: SECRETS_PATH
              value: "/secrets"
          ports:
            - containerPort: 8080
              name: http
          volumeMounts:
            - name: discord-secrets
              mountPath: /secrets
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            periodSeconds: 10
          resources:
            limits:
              cpu: 50m
              memory: 32Mi
            requests:
              cpu: 5m
              memory: 16Mi
      volumes:
        - name: discord-secrets
          secret:
            secretName: discord-webhook-secret
```
### Secret Management
Discord webhook URL stored in Vault at `kv/data/discord`:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: discord-webhook-secret
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: discord-webhook-secret
  data:
    - secretKey: webhook-url
      remoteRef:
        key: kv/data/discord
        property: webhook_url
```
When ExternalSecrets refreshes and updates the secret, the bridge detects the file change and reloads without restart.
### Error Handling
1. **Connection Loss**: Exponential backoff (1s, 2s, 4s, ... max 60s)
2. **Discord Rate Limits**: Respect `Retry-After` header, queue messages
3. **Invalid Messages**: Log and skip, don't crash
4. **Webhook Errors**: Log error, continue processing other messages
5. **Config Reload Errors**: Log error, keep using previous config
## Consequences
### Positive
- **Tiny footprint**: ~10MB image, 16Mi memory request (32Mi limit)
- **Hot reload**: Secrets update without pod restart
- **Robust**: Proper reconnection and error handling
- **Observable**: Structured logging, Prometheus metrics, health endpoints
- **Fast startup**: <100ms cold start
- **Cloud native**: Static binary, distroless image
### Negative
- **Go learning curve**: Different patterns than Python services
- **Operational overhead**: Another service to maintain
- **Latency**: Adds ~50-100ms to notification delivery
### Neutral
- Webhook URL must be maintained in Vault
- Service logs should be monitored for errors
## Implementation Checklist
- [x] Create `ntfy-discord` repository
- [ ] Implement core bridge logic
- [ ] Add SSE client with reconnection
- [ ] Implement message transformation
- [ ] Add fsnotify hot reload for secrets
- [ ] Add health/ready/metrics endpoints
- [ ] Write unit tests
- [ ] Create multi-stage Dockerfile (scratch base)
- [ ] Set up CI/CD pipeline (Gitea Actions)
- [ ] Add ExternalSecret for Discord webhook
- [ ] Create Kubernetes manifests
- [ ] Deploy to observability namespace
- [ ] Verify notifications flowing to Discord
## Related
- ADR-0021: Notification Architecture
- ADR-0015: CI Notifications and Semantic Versioning