homelab-design/docs/adr/ADR-0021-notification-architecture.md
Billy D. e85deaa642 docs(adr): finalize ADR-0021 and add ADR-0022
ADR-0021 (Accepted):
- ntfy as central notification hub
- Alertmanager integration for critical/warning alerts
- Service readiness notifications via Flux notification-controller
- Standardized topic naming

ADR-0022 (Proposed):
- ntfy-discord-bridge Python service design
- SSE subscription with reconnection logic
- Message transformation to Discord embeds
- Priority/tag to color/emoji mapping
- Kubernetes deployment with ExternalSecret for webhook
2026-02-02 11:58:52 -05:00


# ADR-0021: Notification Architecture
## Status
Accepted
## Context
The homelab infrastructure generates notifications from multiple sources:
1. **CI/CD pipelines** (Gitea Actions) - build success/failure
2. **Alertmanager** - Prometheus alerts for critical/warning conditions
3. **Gatus** - Service health monitoring
4. **Flux** - GitOps reconciliation events
5. **Service readiness** - Notifications when deployments complete successfully
Currently, ntfy serves as the primary notification hub, but there are several issues:
- **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
- **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
- **No service readiness notifications**: No visibility when services come online after deployment
## Decision
### 1. ntfy as the Notification Hub
ntfy will serve as the central notification aggregation point. All internal services publish to ntfy topics via the internal Kubernetes service URL:
```
http://ntfy-svc.observability.svc.cluster.local/<topic>
```
Because this URL is reachable only from inside the cluster, internal services can publish without credentials while external access to ntfy stays auth-protected.
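
As a minimal sketch of what publishing to the hub could look like from an internal service, the snippet below builds a plain HTTP POST against the internal URL. The `Title`, `Priority`, and `Tags` headers are standard ntfy publish headers; the function names `build_publish_request` and `publish` are hypothetical and not part of any existing homelab code.

```python
import urllib.request

# Internal service URL taken from this ADR.
NTFY_BASE_URL = "http://ntfy-svc.observability.svc.cluster.local"


def build_publish_request(topic, message, title=None, priority=None, tags=()):
    """Build (but do not send) an HTTP POST that publishes `message` to `topic`."""
    headers = {}
    if title:
        headers["Title"] = title
    if priority:
        headers["Priority"] = priority
    if tags:
        # ntfy accepts a comma-separated list of tags / emoji short codes.
        headers["Tags"] = ",".join(tags)
    return urllib.request.Request(
        f"{NTFY_BASE_URL}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def publish(topic, message, **kwargs):
    """Send the notification; assumes the caller runs inside the cluster network."""
    request = build_publish_request(topic, message, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Separating request construction from sending keeps the publish logic easy to check without network access; a caller inside the cluster would simply run `publish("deployments", "api is ready", tags=("white_check_mark",))`.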
### 2. Standardized Topics
| Topic | Source | Description |
|-------|--------|-------------|
| `gitea-ci` | Gitea Actions | CI/CD build notifications |
| `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
| `gatus` | Gatus | Service health status changes |
| `flux` | Flux | GitOps reconciliation events |
| `deployments` | Flux/Argo | Service deployment completions |
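
One way to keep publishers from drifting off these names (as happened with `builds` versus `gitea-ci`) is to resolve topics through a single lookup. The sketch below mirrors the table above; the lowercase source keys and the helper names are assumptions made for illustration.

```python
# Topic table from this ADR, keyed by publishing source.
# The lowercase source keys are hypothetical identifiers chosen for this sketch.
TOPIC_BY_SOURCE = {
    "gitea-actions": "gitea-ci",
    "alertmanager": "alertmanager-alerts",
    "gatus": "gatus",
    "flux": "flux",
    "deployments": "deployments",
}


def topic_for(source):
    """Return the standardized topic for a source, or raise if none is defined."""
    try:
        return TOPIC_BY_SOURCE[source]
    except KeyError:
        raise ValueError(f"no standardized ntfy topic for source {source!r}") from None


def is_standard_topic(topic):
    """True if `topic` is one of the topics documented in this ADR."""
    return topic in TOPIC_BY_SOURCE.values()
```

A check like `is_standard_topic("builds")` returning `False` would have flagged the original CI inconsistency.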
### 3. Alertmanager Integration
Alertmanager is configured to forward alerts to ntfy using the built-in `tpl=alertmanager` template:
```yaml
receivers:
  - name: ntfy-critical
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=urgent&tags=rotating_light"
        sendResolved: true
  - name: ntfy-warning
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=high&tags=warning"
        sendResolved: true
```
Routes direct alerts based on severity:
- `severity=critical` → `ntfy-critical` receiver
- `severity=warning` → `ntfy-warning` receiver
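
The severity-to-receiver mapping can be illustrated with a small Python model. This is only a sketch of the routing logic, not Alertmanager's own implementation; the `route` function and the `RECEIVERS` table are hypothetical names, while the receiver names, priorities, and tags come from the receiver configuration in this section.

```python
from urllib.parse import urlencode

NTFY_ALERT_URL = "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts"

# severity label -> (receiver name, ntfy priority, ntfy tag), as in the receivers config.
RECEIVERS = {
    "critical": ("ntfy-critical", "urgent", "rotating_light"),
    "warning": ("ntfy-warning", "high", "warning"),
}


def route(labels):
    """Return (receiver, webhook_url) for an alert's labels, or None if unrouted."""
    entry = RECEIVERS.get(labels.get("severity"))
    if entry is None:
        return None
    receiver, priority, tag = entry
    query = urlencode({"tpl": "alertmanager", "priority": priority, "tags": tag})
    return receiver, f"{NTFY_ALERT_URL}?{query}"
```

Alerts with any other severity (or none) fall through to `None` here; what happens to them in the real setup depends on the default route, which this ADR does not specify.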
### 4. Service Readiness Notifications
To provide visibility when services are fully operational after deployment:
**Option A: Flux Notification Controller**
Configure Flux's notification-controller to send alerts when Kustomizations/HelmReleases succeed:
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: ntfy-deployments
spec:
  type: generic-hmac # or generic
  address: http://ntfy-svc.observability.svc.cluster.local/deployments
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: deployment-success
spec:
  providerRef:
    name: ntfy-deployments
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
  inclusionList:
    - ".*succeeded.*"
```
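
The effect of the `inclusionList` pattern can be approximated as follows. Flux evaluates these patterns with Go regular expressions against the event message; the sketch uses Python's `re` module as a stand-in, which behaves the same for a simple pattern like this one. The helper name `is_included` is hypothetical, and the sample messages in the usage note are made up for illustration, so the actual event text emitted by each Flux controller should be checked before relying on the pattern.

```python
import re

# Pattern copied from the Alert spec above.
INCLUSION_LIST = [r".*succeeded.*"]


def is_included(message, patterns=INCLUSION_LIST):
    """True if the event message matches at least one inclusion pattern."""
    return any(re.search(pattern, message) for pattern in patterns)
```

With this pattern, a message such as `"Helm upgrade succeeded"` passes the filter, while one such as `"Reconciliation finished in 2s"` does not, so any success event that lacks the word "succeeded" would be dropped.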
**Option B: Argo Workflows Post-Deploy Hook**
For Argo-managed deployments, add a notification step at workflow completion.
**Recommendation**: Use Flux Notification Controller (Option A) as it's already part of the GitOps stack and provides native integration.
## Consequences
### Positive
- **Single source of truth**: All notifications flow through ntfy
- **Auth protection maintained**: External ntfy access requires Authentik auth
- **Deployment visibility**: Know when services are ready without watching logs
- **Consistent topic naming**: All sources follow documented conventions
### Negative
- **Configuration overhead**: Each notification source requires explicit configuration
### Neutral
- Topic naming must be documented and followed consistently
- Future Discord integration is addressed in ADR-0022
## Implementation Checklist
- [x] Standardize CI notifications to `gitea-ci` topic
- [x] Configure Alertmanager → ntfy for critical/warning alerts
- [ ] Configure Flux notification-controller for deployment notifications
- [ ] Add `deployments` topic subscription to ntfy app
## Related
- ADR-0015: CI Notifications and Semantic Versioning
- ADR-0022: ntfy-Discord Bridge Service