# ADR-0021: Notification Architecture

## Status

Proposed

## Context

The homelab infrastructure generates notifications from multiple sources:

1. **CI/CD pipelines** (Gitea Actions) - build success/failure
2. **Alertmanager** - Prometheus alerts for critical/warning conditions
3. **Gatus** - Service health monitoring
4. **Flux** - GitOps reconciliation events

ntfy currently serves as the primary notification hub, but the setup has several issues:

- **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
- **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
- **No Discord forwarding**: The team wants notifications forwarded to Discord for visibility

## Decision

### 1. ntfy as the Notification Hub

ntfy will serve as the central notification aggregation point. All internal services publish to ntfy topics via the internal Kubernetes service URL:

```
http://ntfy-svc.observability.svc.cluster.local/<topic>
```

This keeps ntfy auth-protected externally while allowing internal services to publish freely.

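As a rough illustration of what "publishing to a topic" means for an in-cluster service, the sketch below builds a plain HTTP POST against the service URL above. The function names are hypothetical, and the `Title` and `Priority` headers are ntfy's standard publish headers; nothing here is taken from an existing homelab service.

```python
import urllib.request
from typing import Optional

# Base URL from this ADR; the helper names below are illustrative only.
NTFY_BASE_URL = "http://ntfy-svc.observability.svc.cluster.local"


def build_publish_request(
    topic: str,
    message: str,
    title: Optional[str] = None,
    priority: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that publishes to an ntfy topic."""
    headers = {}
    if title:
        headers["Title"] = title
    if priority:
        headers["Priority"] = priority
    return urllib.request.Request(
        url=f"{NTFY_BASE_URL}/{topic}",
        data=message.encode("utf-8"),  # the request body is the notification text
        headers=headers,
        method="POST",
    )


def publish(topic: str, message: str, **kwargs) -> int:
    """Send the notification and return the HTTP status code."""
    request = build_publish_request(topic, message, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A CI job could then call `publish("gitea-ci", "Build passed", title="CI")` from inside the cluster. Keeping request construction separate from sending makes the helper easy to test without a running ntfy instance.
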
### 2. Standardized Topics

| Topic | Source | Description |
|-------|--------|-------------|
| `gitea-ci` | Gitea Actions | CI/CD build notifications |
| `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
| `gatus` | Gatus | Service health status changes |
| `flux` | Flux | GitOps reconciliation events |

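One way to keep publishers from drifting back to ad-hoc names such as `builds` is to encode the table above as a small registry and reject anything else. This is only a sketch; the `Topic` enum and `resolve_topic` helper are hypothetical names, not part of any existing service.

```python
from enum import Enum


class Topic(str, Enum):
    """Standardized ntfy topics, mirroring the table in this ADR."""

    GITEA_CI = "gitea-ci"
    ALERTMANAGER_ALERTS = "alertmanager-alerts"
    GATUS = "gatus"
    FLUX = "flux"


def resolve_topic(name: str) -> Topic:
    """Return the matching standardized topic or raise ValueError."""
    try:
        return Topic(name)
    except ValueError:
        allowed = ", ".join(t.value for t in Topic)
        raise ValueError(
            f"unknown ntfy topic {name!r}; expected one of: {allowed}"
        ) from None
```

With this in place, a workflow still configured for `builds` fails fast with a message that lists the allowed topics.
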
### 3. Alertmanager Integration

Alertmanager forwards alerts to ntfy through webhook receivers. The `tpl=alertmanager` query parameter selects ntfy's built-in Alertmanager message template:

```yaml
receivers:
  # Critical alerts: delivered with ntfy priority "urgent"
  - name: ntfy-critical
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=urgent&tags=rotating_light"
        sendResolved: true
  # Warning alerts: delivered with ntfy priority "high"
  - name: ntfy-warning
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=high&tags=warning"
        sendResolved: true
```

Routes direct alerts based on severity:
- `severity=critical` → `ntfy-critical` receiver
- `severity=warning` → `ntfy-warning` receiver

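The severity routing and the receiver URLs follow a simple pattern, which the sketch below models in Python. It is not how Alertmanager evaluates its route tree; it only shows how the two webhook URLs can be derived from one table so they stay consistent. `SEVERITY_ROUTES`, `receiver_for`, and `webhook_url` are hypothetical names.

```python
from typing import Optional
from urllib.parse import urlencode

NTFY_BASE_URL = "http://ntfy-svc.observability.svc.cluster.local"

# severity -> (receiver name, ntfy priority, ntfy tag), copied from the receivers above
SEVERITY_ROUTES = {
    "critical": ("ntfy-critical", "urgent", "rotating_light"),
    "warning": ("ntfy-warning", "high", "warning"),
}


def receiver_for(labels: dict) -> Optional[str]:
    """Return the receiver name for an alert's labels, or None if unrouted."""
    route = SEVERITY_ROUTES.get(labels.get("severity", ""))
    return route[0] if route else None


def webhook_url(severity: str, topic: str = "alertmanager-alerts") -> str:
    """Build the ntfy webhook URL used by the receiver for this severity."""
    _, priority, tag = SEVERITY_ROUTES[severity]
    query = urlencode({"tpl": "alertmanager", "priority": priority, "tags": tag})
    return f"{NTFY_BASE_URL}/{topic}?{query}"
```

Generating both URLs from a single table avoids the two receivers diverging when, for example, the topic name changes.
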
### 4. Discord Integration (Future)

Discord integration will be implemented as a dedicated bridge service that:

1. **Subscribes** to ntfy topics via SSE/WebSocket
2. **Transforms** ntfy message format to Discord embed format
3. **Forwards** to Discord webhook URL (stored in Vault at `kv/data/discord`)

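The transform step is the core of the bridge, so a minimal sketch may help. It assumes the ntfy JSON message shape (`topic`, `title`, `message`, `priority`, `time`) and the Discord webhook payload with an `embeds` list; the colour mapping and the `ntfy_to_discord` name are illustrative choices, not a decided design.

```python
from datetime import datetime, timezone

# Hypothetical colour per ntfy priority (5 = urgent ... 1 = min).
PRIORITY_COLORS = {5: 0xE74C3C, 4: 0xE67E22, 3: 0x3498DB, 2: 0x95A5A6, 1: 0x95A5A6}


def ntfy_to_discord(msg: dict) -> dict:
    """Convert one ntfy message event into a Discord webhook payload."""
    priority = msg.get("priority", 3)  # ntfy omits the field for default priority
    embed = {
        # Discord limits embed titles to 256 and descriptions to 4096 characters.
        "title": (msg.get("title") or msg["topic"])[:256],
        "description": msg.get("message", "")[:4096],
        "color": PRIORITY_COLORS.get(priority, PRIORITY_COLORS[3]),
        "footer": {"text": f"ntfy topic: {msg['topic']}"},
    }
    if "time" in msg:
        # ntfy sends a Unix timestamp; Discord expects ISO 8601.
        embed["timestamp"] = datetime.fromtimestamp(
            msg["time"], tz=timezone.utc
        ).isoformat()
    return {"embeds": [embed]}
```

The resulting dictionary would be sent as the JSON body of a POST to the webhook URL read from Vault.
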
#### Design Options

**Option A: Sidecar Container (Simple)**
- Alpine container with curl/jq
- Subscribes to ntfy JSON stream
- Transforms and POSTs to Discord
- Pros: Simple, no custom code
- Cons: Shell script fragility, limited error handling

**Option B: Dedicated Python Service (Recommended)**
- Small Python service using `httpx` or `aiohttp`
- Proper reconnection logic and error handling
- Configurable topic-to-channel mapping
- Health endpoint for monitoring
- Pros: Robust, testable, maintainable
- Cons: Requires building/publishing container image

**Option C: ntfy Actions (Limited)**
- Configure ntfy server with `upstream-base-url` or actions
- Pros: Built into ntfy
- Cons: ntfy doesn't natively support Discord webhook format

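The "configurable topic-to-channel mapping" in Option B could be as simple as one environment variable per topic. The sketch below assumes a hypothetical naming scheme (`DISCORD_WEBHOOK_<TOPIC>`, with dashes written as underscores); the actual configuration format is still undecided.

```python
import os
from typing import Mapping, Optional


def load_channel_map(
    env: Mapping[str, str] = os.environ,
    prefix: str = "DISCORD_WEBHOOK_",
) -> dict:
    """Map ntfy topics to webhook URLs from variables like DISCORD_WEBHOOK_GITEA_CI."""
    mapping = {}
    for key, value in env.items():
        if key.startswith(prefix) and value:
            # GITEA_CI -> gitea-ci, matching the standardized topic names
            topic = key[len(prefix):].lower().replace("_", "-")
            mapping[topic] = value
    return mapping


def webhook_for(topic: str, mapping: Mapping[str, str],
                default: Optional[str] = None) -> Optional[str]:
    """Return the webhook URL for a topic, or the default if none is mapped."""
    return mapping.get(topic, default)
```

Environment variables fit naturally with an ExternalSecret that projects Vault keys into the bridge's pod, though a mounted config file would work equally well.
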
#### Recommended Architecture (Option B)

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ CI/Alertmanager │────▶│      ntfy        │────▶│  ntfy App   │
│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
└─────────────────┘     │      hub)        │     └─────────────┘
                        └────────┬─────────┘
                                 │ SSE subscribe
                                 ▼
                        ┌──────────────────┐     ┌─────────────┐
                        │  ntfy-discord-   │────▶│   Discord   │
                        │     bridge       │     │  (webhook)  │
                        └──────────────────┘     └─────────────┘
```

The bridge service would be a new repo (`ntfy-discord-bridge`) following the same patterns as other Python services in the homelab.

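To make the subscribe side of the bridge more concrete, here is a minimal sketch of the two pieces that do not depend on the HTTP client: filtering the event stream and spacing out reconnect attempts. It assumes ntfy's newline-delimited JSON stream (`/<topic>/json`), which carries the same events as the SSE endpoint; `iter_messages` and `backoff_delays` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


def iter_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only 'message' events from an ntfy JSON stream.

    Connection events such as 'open' and 'keepalive', blank lines,
    and malformed lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and event.get("event") == "message":
            yield event


def backoff_delays(base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield reconnect delays that double each attempt up to a cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)
```

In the service, the lines would come from a streaming HTTP response (for example via `httpx`), and the delay generator would be recreated after each successful connection so that the backoff starts again from the base value.
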
## Consequences

### Positive

- **Single source of truth**: All notifications flow through ntfy
- **Auth protection maintained**: External ntfy access requires Authentik auth
- **Flexible routing**: Can subscribe to specific topics per destination
- **Separation of concerns**: Discord bridge is independent, can be disabled without affecting ntfy

### Negative

- **Additional service**: Discord bridge adds operational overhead
- **Latency**: Two-hop delivery (source → ntfy → Discord) adds minimal latency

### Neutral

- Topic naming must be documented and followed consistently
- Discord webhook URL must be maintained in Vault

## Implementation Checklist

- [x] Standardize CI notifications to `gitea-ci` topic
- [x] Configure Alertmanager → ntfy for critical/warning alerts
- [ ] Create `ntfy-discord-bridge` repository
- [ ] Implement bridge service with proper error handling
- [ ] Add ExternalSecret for Discord webhook from Vault
- [ ] Deploy bridge to observability namespace
- [ ] Document topic-to-Discord-channel mapping

## Related

- ADR-0015: CI Notifications and Semantic Versioning
- ADR-0020: Internal Registry for CI/CD