docs(adr): finalize ADR-0021 and add ADR-0022

ADR-0021 (Accepted):
- ntfy as central notification hub
- Alertmanager integration for critical/warning alerts
- Service readiness notifications via Flux notification-controller
- Standardized topic naming

ADR-0022 (Proposed):
- ntfy-discord-bridge Python service design
- SSE subscription with reconnection logic
- Message transformation to Discord embeds
- Priority/tag to color/emoji mapping
- Kubernetes deployment with ExternalSecret for webhook

@@ -2,7 +2,7 @@
 
 ## Status
 
-Proposed
+Accepted
 
 ## Context
 
@@ -12,12 +12,13 @@ The homelab infrastructure generates notifications from multiple sources:
 2. **Alertmanager** - Prometheus alerts for critical/warning conditions
 3. **Gatus** - Service health monitoring
 4. **Flux** - GitOps reconciliation events
+5. **Service readiness** - Notifications when deployments complete successfully
 
 Currently, ntfy serves as the primary notification hub, but there are several issues:
 
 - **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
 - **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
-- **Discord integration desire**: Team wants notifications forwarded to Discord for visibility
+- **No service readiness notifications**: No visibility when services come online after deployment
 
 ## Decision
 
@@ -39,6 +40,7 @@ This keeps ntfy auth-protected externally while allowing internal services to pu
 | `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
 | `gatus` | Gatus | Service health status changes |
 | `flux` | Flux | GitOps reconciliation events |
+| `deployments` | Flux/Argo | Service deployment completions |
 
 ### 3. Alertmanager Integration
 
@@ -60,53 +62,43 @@ Routes direct alerts based on severity:
 - `severity=critical` → `ntfy-critical` receiver
 - `severity=warning` → `ntfy-warning` receiver
 
-### 4. Discord Integration (Future)
+### 4. Service Readiness Notifications
 
-Discord integration will be implemented as a dedicated bridge service that:
+To provide visibility when services are fully operational after deployment:
 
-1. **Subscribes** to ntfy topics via SSE/WebSocket
-2. **Transforms** ntfy message format to Discord embed format
-3. **Forwards** to Discord webhook URL (stored in Vault at `kv/data/discord`)
+**Option A: Flux Notification Controller**
+Configure Flux's notification-controller to send alerts when Kustomizations/HelmReleases succeed:
 
-#### Design Options
-
-**Option A: Sidecar Container (Simple)**
-- Alpine container with curl/jq
-- Subscribes to ntfy JSON stream
-- Transforms and POSTs to Discord
-- Pros: Simple, no custom code
-- Cons: Shell script fragility, limited error handling
-
-**Option B: Dedicated Python Service (Recommended)**
-- Small Python service using `httpx` or `aiohttp`
-- Proper reconnection logic and error handling
-- Configurable topic-to-channel mapping
-- Health endpoint for monitoring
-- Pros: Robust, testable, maintainable
-- Cons: Requires building/publishing container image
-
-**Option C: ntfy Actions (Limited)**
-- Configure ntfy server with `upstream-base-url` or actions
-- Pros: Built into ntfy
-- Cons: ntfy doesn't natively support Discord webhook format
-
-#### Recommended Architecture (Option B)
-
-```
-┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
-│ CI/Alertmanager │────▶│      ntfy        │────▶│  ntfy App   │
-│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
-└─────────────────┘     │      hub)        │     └─────────────┘
-                        └────────┬─────────┘
-                                 │ SSE subscribe
-                                 ▼
-                        ┌──────────────────┐     ┌─────────────┐
-                        │  ntfy-discord-   │────▶│   Discord   │
-                        │     bridge       │     │  (webhook)  │
-                        └──────────────────┘     └─────────────┘
+```yaml
+apiVersion: notification.toolkit.fluxcd.io/v1beta3
+kind: Provider
+metadata:
+  name: ntfy-deployments
+spec:
+  type: generic-hmac  # or generic
+  address: http://ntfy-svc.observability.svc.cluster.local/deployments
+---
+apiVersion: notification.toolkit.fluxcd.io/v1beta3
+kind: Alert
+metadata:
+  name: deployment-success
+spec:
+  providerRef:
+    name: ntfy-deployments
+  eventSeverity: info
+  eventSources:
+    - kind: Kustomization
+      name: '*'
+    - kind: HelmRelease
+      name: '*'
+  inclusionList:
+    - ".*succeeded.*"
 ```
 
-The bridge service would be a new repo (`ntfy-discord-bridge`) following the same patterns as other Python services in the homelab.
+**Option B: Argo Workflows Post-Deploy Hook**
+For Argo-managed deployments, add a notification step at workflow completion.
+
+**Recommendation**: Use Flux Notification Controller (Option A) as it's already part of the GitOps stack and provides native integration.
 
 ## Consequences
 
@@ -114,30 +106,26 @@ The bridge service would be a new repo (`ntfy-discord-bridge`) following the sam
 
 - **Single source of truth**: All notifications flow through ntfy
 - **Auth protection maintained**: External ntfy access requires Authentik auth
-- **Flexible routing**: Can subscribe to specific topics per destination
-- **Separation of concerns**: Discord bridge is independent, can be disabled without affecting ntfy
+- **Deployment visibility**: Know when services are ready without watching logs
+- **Consistent topic naming**: All sources follow documented conventions
 
 ### Negative
 
-- **Additional service**: Discord bridge adds operational overhead
-- **Latency**: Two-hop delivery (source → ntfy → Discord) adds minimal latency
+- **Configuration overhead**: Each notification source requires explicit configuration
 
 ### Neutral
 
 - Topic naming must be documented and followed consistently
-- Discord webhook URL must be maintained in Vault
+- Future Discord integration addressed in ADR-0022
 
 ## Implementation Checklist
 
 - [x] Standardize CI notifications to `gitea-ci` topic
 - [x] Configure Alertmanager → ntfy for critical/warning alerts
-- [ ] Create `ntfy-discord-bridge` repository
-- [ ] Implement bridge service with proper error handling
-- [ ] Add ExternalSecret for Discord webhook from Vault
-- [ ] Deploy bridge to observability namespace
-- [ ] Document topic-to-Discord-channel mapping
+- [ ] Configure Flux notification-controller for deployment notifications
+- [ ] Add `deployments` topic subscription to ntfy app
 
 ## Related
 
 - ADR-0015: CI Notifications and Semantic Versioning
-- ADR-0020: Internal Registry for CI/CD
+- ADR-0022: ntfy-Discord Bridge Service

241
docs/adr/ADR-0022-ntfy-discord-bridge.md
Normal file
@@ -0,0 +1,241 @@
# ADR-0022: ntfy-Discord Bridge Service

## Status

Proposed

## Context

Per ADR-0021, ntfy serves as the central notification hub for the homelab. However, Discord is used for team collaboration and visibility, requiring notifications to be forwarded there as well.

ntfy does not natively support Discord webhook format. Discord expects a specific JSON structure with embeds, while ntfy uses its own message format. A bridge service is needed to:

1. Subscribe to ntfy topics
2. Transform messages to Discord embed format
3. Forward to Discord webhooks

## Decision

### Architecture

A dedicated Python microservice (`ntfy-discord-bridge`) will bridge ntfy to Discord:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ CI/Alertmanager │────▶│      ntfy        │────▶│  ntfy App   │
│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
└─────────────────┘     │      hub)        │     └─────────────┘
                        └────────┬─────────┘
                                 │ SSE/JSON stream
                                 ▼
                        ┌──────────────────┐     ┌─────────────┐
                        │  ntfy-discord-   │────▶│   Discord   │
                        │     bridge       │     │   Webhook   │
                        └──────────────────┘     └─────────────┘
```

### Service Design

**Repository**: `ntfy-discord-bridge`

**Technology Stack**:
- Python 3.12+
- `httpx` for async HTTP (SSE subscription + Discord POST)
- `pydantic` for configuration validation
- `structlog` for structured logging
- Poetry/uv for dependency management

**Core Features**:

1. **SSE Subscription**: Connect to ntfy's JSON stream endpoint for real-time messages
2. **Automatic Reconnection**: Exponential backoff on connection failures
3. **Message Transformation**: Convert ntfy format to Discord embed format
4. **Priority Mapping**: Map ntfy priorities to Discord embed colors
5. **Topic Routing**: Configure which topics go to which Discord channels/webhooks
6. **Health Endpoint**: `/health` for Kubernetes probes

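The subscription step can be sketched with the standard library alone. This sketch assumes ntfy's JSON stream format, in which each line is one JSON object whose `event` field is `open`, `keepalive`, or `message`. The function name `iter_messages` is hypothetical and not part of the ADR; in the real service the lines would come from an `httpx` streaming response rather than a list.

```python
import json
from typing import Iterable, Iterator


def iter_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only real notifications from an ntfy JSON stream.

    Blank lines, malformed JSON, and non-message events (open, keepalive)
    are skipped so that one bad line never stops the stream.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # A real service would log this before skipping.
            continue
        if isinstance(event, dict) and event.get("event") == "message":
            yield event


if __name__ == "__main__":
    sample = [
        '{"event": "open", "topic": "gitea-ci"}',
        '{"event": "keepalive", "topic": "gitea-ci"}',
        '{"event": "message", "topic": "gitea-ci", "message": "Build succeeded"}',
    ]
    for msg in iter_messages(sample):
        print(msg["topic"], msg["message"])
```

Keeping the parser free of network code means it can be unit tested without a running ntfy server.
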
### Configuration

Environment variables or ConfigMap:

```yaml
NTFY_URL: "http://ntfy-svc.observability.svc.cluster.local"
DISCORD_WEBHOOK_URL: "${DISCORD_WEBHOOK_URL}"  # From Vault via ExternalSecret

# Topic routing (optional - defaults to single webhook)
TOPIC_WEBHOOKS: |
  gitea-ci: ${DISCORD_CI_WEBHOOK}
  alertmanager-alerts: ${DISCORD_ALERTS_WEBHOOK}
  deployments: ${DISCORD_DEPLOYMENTS_WEBHOOK}

# Topics to subscribe to (comma-separated)
NTFY_TOPICS: "gitea-ci,alertmanager-alerts,deployments,gatus"
```

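One way the service might read this configuration is sketched below. It assumes `TOPIC_WEBHOOKS` holds newline-separated `topic: url` pairs and that a topic without an entry falls back to `DISCORD_WEBHOOK_URL`. All helper names and the example URLs are hypothetical. The multi-topic subscribe path (`/topic1,topic2/json`) follows ntfy's subscription API.

```python
def parse_topics(raw: str) -> list[str]:
    """Split the comma-separated NTFY_TOPICS value, dropping empty entries."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def parse_topic_webhooks(raw: str) -> dict[str, str]:
    """Parse 'topic: url' lines into a routing table (assumed format)."""
    routes: dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, because URLs contain colons too.
        topic, sep, url = line.partition(":")
        if sep and url.strip():
            routes[topic.strip()] = url.strip()
    return routes


def webhook_for(topic: str, routes: dict[str, str], default: str) -> str:
    """Pick the per-topic webhook, falling back to the default webhook."""
    return routes.get(topic, default)


def subscribe_url(base_url: str, topics: list[str]) -> str:
    """Build the ntfy JSON stream URL for several topics at once."""
    return f"{base_url.rstrip('/')}/{','.join(topics)}/json"


if __name__ == "__main__":
    routes = parse_topic_webhooks("gitea-ci: https://discord.example/ci")
    topics = parse_topics("gitea-ci,gatus")
    print(subscribe_url("http://ntfy-svc.observability.svc.cluster.local", topics))
    print(webhook_for("gatus", routes, "https://discord.example/default"))
```
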
### Message Transformation

ntfy message:
```json
{
  "id": "abc123",
  "topic": "gitea-ci",
  "title": "Build succeeded",
  "message": "ray-serve-apps published to PyPI",
  "priority": 3,
  "tags": ["package", "white_check_mark"],
  "time": 1770050091
}
```

Discord embed:
```json
{
  "embeds": [{
    "title": "✅ Build succeeded",
    "description": "ray-serve-apps published to PyPI",
    "color": 3066993,
    "fields": [
      {"name": "Topic", "value": "gitea-ci", "inline": true},
      {"name": "Tags", "value": "package", "inline": true}
    ],
    "timestamp": "2026-02-02T16:34:51Z",
    "footer": {"text": "ntfy"}
  }]
}
```

**Priority → Color Mapping**:
| Priority | Name | Discord Color |
|----------|------|---------------|
| 5 | Max/Urgent | 🔴 Red (15158332) |
| 4 | High | 🟠 Orange (15105570) |
| 3 | Default | 🟢 Green (3066993) |
| 2 | Low | ⚪ Gray (9807270) |
| 1 | Min | ⚪ Light Gray (12370112) |

**Tag → Emoji Mapping**:
Common ntfy tags are converted to Discord-friendly emojis in the title:
- `white_check_mark` / `heavy_check_mark` → ✅
- `x` / `skull` → ❌
- `warning` → ⚠️
- `rotating_light` → 🚨
- `rocket` → 🚀
- `package` → 📦

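The transformation and both mappings can be expressed as one small pure function. The sketch below is illustrative only: the name `to_discord_payload` is hypothetical, and two rules are assumptions the ADR does not state. First, when several tags have an emoji, the one listed earliest in the mapping wins and the remaining tags go into the Tags field, which reproduces the example embed. Second, a message without a `priority` is treated as priority 3.

```python
from datetime import datetime, timezone

# Decimal Discord embed colors, copied from the priority table.
PRIORITY_COLORS = {5: 15158332, 4: 15105570, 3: 3066993, 2: 9807270, 1: 12370112}

# Insertion order doubles as precedence (an assumption of this sketch).
TAG_EMOJI = {
    "white_check_mark": "✅",
    "heavy_check_mark": "✅",
    "x": "❌",
    "skull": "❌",
    "warning": "⚠️",
    "rotating_light": "🚨",
    "rocket": "🚀",
    "package": "📦",
}


def to_discord_payload(msg: dict) -> dict:
    """Convert one ntfy message dict into a Discord webhook payload."""
    tags = list(msg.get("tags") or [])
    # First mapped tag (in mapping order) becomes the title emoji.
    chosen = next((t for t in TAG_EMOJI if t in tags), None)

    title = msg.get("title") or msg.get("topic", "ntfy")
    if chosen is not None:
        title = f"{TAG_EMOJI[chosen]} {title}"

    fields = [{"name": "Topic", "value": msg.get("topic", ""), "inline": True}]
    remaining = [t for t in tags if t != chosen]
    if remaining:
        fields.append({"name": "Tags", "value": ", ".join(remaining), "inline": True})

    embed = {
        "title": title,
        "description": msg.get("message", ""),
        "color": PRIORITY_COLORS.get(msg.get("priority", 3), PRIORITY_COLORS[3]),
        "fields": fields,
        "footer": {"text": "ntfy"},
    }
    if "time" in msg:
        # ntfy sends Unix seconds; Discord expects an ISO 8601 timestamp.
        embed["timestamp"] = datetime.fromtimestamp(
            msg["time"], tz=timezone.utc
        ).isoformat()
    return {"embeds": [embed]}
```

Because the function takes and returns plain dictionaries, it can be tested without contacting ntfy or Discord.
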
### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ntfy-discord-bridge
  namespace: observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ntfy-discord-bridge
  template:
    metadata:
      labels:
        app: ntfy-discord-bridge  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: bridge
          image: registry.daviestechlabs.io/ntfy-discord-bridge:latest
          env:
            - name: NTFY_URL
              value: "http://ntfy-svc.observability.svc.cluster.local"
            - name: NTFY_TOPICS
              value: "gitea-ci,alertmanager-alerts,deployments"
            - name: DISCORD_WEBHOOK_URL
              valueFrom:
                secretKeyRef:
                  name: discord-webhook-secret
                  key: webhook-url
          ports:
            - containerPort: 8080
              name: health
          livenessProbe:
            httpGet:
              path: /health
              port: health
            initialDelaySeconds: 5
            periodSeconds: 30
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 10m
              memory: 64Mi
```

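The liveness probe expects `GET /health` on port 8080. A minimal standard-library sketch of such an endpoint follows. The names `health_response`, `HealthHandler`, and the `stream_connected` flag are hypothetical, and the real service might serve this from its async stack instead.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def health_response(stream_connected: bool) -> tuple[int, bytes]:
    """Return the HTTP status and JSON body for the health check."""
    if stream_connected:
        return 200, b'{"status": "ok"}'
    return 503, b'{"status": "disconnected"}'


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical flag that the subscriber loop would update.
    stream_connected = True

    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(type(self).stream_connected)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Silence per-request logging; probes fire every 30 seconds.
        pass


def serve(port: int = 8080) -> None:
    """Block forever serving /health (run in a background thread)."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Returning 503 while the stream is down would cause Kubernetes to restart the pod during an ntfy outage, so a real deployment may prefer to report that state through a readiness probe and keep liveness unconditional.
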
### Secret Management

Discord webhook URL stored in Vault at `kv/data/discord`:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: discord-webhook-secret
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: discord-webhook-secret
  data:
    - secretKey: webhook-url
      remoteRef:
        key: kv/data/discord
        property: webhook_url
```

### Error Handling

1. **Connection Loss**: Exponential backoff (1s, 2s, 4s, ... max 60s)
2. **Discord Rate Limits**: Respect `Retry-After` header, queue messages
3. **Invalid Messages**: Log and skip, don't crash
4. **Webhook Errors**: Log error, continue processing other messages

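The first two rules reduce to simple arithmetic, sketched here under stated assumptions: the delay doubles from 1 second and is capped at 60 seconds, and `Retry-After` is read as a number of seconds. The function names and the 1-second fallback are hypothetical, and jitter is left out for brevity.

```python
from itertools import islice
from typing import Iterator, Mapping


def backoff_delays(base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield reconnect delays: base, 2*base, 4*base, ... capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read a numeric Retry-After header, falling back to `default`."""
    try:
        return max(0.0, float(headers.get("Retry-After", default)))
    except (TypeError, ValueError):
        return default


if __name__ == "__main__":
    print(list(islice(backoff_delays(), 8)))
    print(retry_after_seconds({"Retry-After": "2.5"}))
```

A caller would create a fresh `backoff_delays()` generator after each successful connection so that the next failure starts again from the base delay.
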
## Consequences

### Positive

- **Robust**: Proper reconnection and error handling vs shell script
- **Testable**: Python service with unit tests
- **Observable**: Structured logging, health endpoints
- **Flexible**: Easy to add topic routing, filtering, rate limiting
- **Consistent**: Follows same patterns as other Python services (handler-base, etc.)

### Negative

- **Operational Overhead**: Another service to maintain
- **Build Pipeline**: Requires CI/CD for container image
- **Latency**: Adds ~100ms to notification delivery

### Neutral

- Webhook URL must be maintained in Vault
- Service logs should be monitored for errors

## Implementation Checklist

- [ ] Create `ntfy-discord-bridge` repository
- [ ] Implement core bridge logic with httpx
- [ ] Add reconnection with exponential backoff
- [ ] Implement message transformation
- [ ] Add health endpoint
- [ ] Write unit tests
- [ ] Create Dockerfile
- [ ] Set up CI/CD pipeline (Gitea Actions)
- [ ] Add ExternalSecret for Discord webhook
- [ ] Create Kubernetes manifests
- [ ] Deploy to observability namespace
- [ ] Verify notifications flowing to Discord

## Related

- ADR-0021: Notification Architecture
- ADR-0015: CI Notifications and Semantic Versioning