# Notification Architecture
* Status: accepted
* Date: 2026-02-04
* Deciders: Billy
* Technical Story: Unify notification delivery across CI, alerting, and monitoring systems
## Context
The homelab infrastructure generates notifications from multiple sources:
1. **CI/CD pipelines** (Gitea Actions) - build success/failure
2. **Alertmanager** - Prometheus alerts for critical/warning conditions
3. **Gatus** - Service health monitoring
4. **Flux** - GitOps reconciliation events
5. **Service readiness** - Notifications when deployments complete successfully
ntfy already serves as the primary notification hub, but three issues prompted this decision:
- **Topic inconsistency**: CI workflows posted to `builds`, while the documentation (ADR-0015) specified `gitea-ci`
- **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
- **No service readiness notifications**: There was no visibility into when services came online after deployment
## Decision
### 1. ntfy as the Notification Hub
ntfy will serve as the central notification aggregation point. All internal services publish to ntfy topics via the internal Kubernetes service URL:
```
http://ntfy-svc.observability.svc.cluster.local/<topic>
```
External access to ntfy stays behind Authentik authentication, while in-cluster services publish over the internal service URL without credentials.
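
As an illustration of publishing over the internal URL, the following is a minimal sketch of a Gitea Actions workflow that posts the build result to the `gitea-ci` topic. It assumes the runner executes inside the cluster (so the service DNS name resolves) and that `curl` is available in the job image; the workflow name, job name, and build command are hypothetical placeholders.

```yaml
name: build
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make build # hypothetical build command
      - name: Notify ntfy
        # always() sends the notification on both success and failure
        if: always()
        run: |
          curl -fsS \
            -H "Title: ${{ gitea.repository }}: build ${{ job.status }}" \
            -H "Tags: hammer_and_wrench" \
            -d "Commit ${{ gitea.sha }} finished with status ${{ job.status }}" \
            http://ntfy-svc.observability.svc.cluster.local/gitea-ci
```

`Title` and `Tags` are standard ntfy publish headers; a `Priority` header could be added for failed builds if they should stand out.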
### 2. Standardized Topics
| Topic | Source | Description |
|-------|--------|-------------|
| `gitea-ci` | Gitea Actions | CI/CD build notifications |
| `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
| `gatus` | Gatus | Service health status changes |
| `flux` | Flux | GitOps reconciliation events |
| `deployments` | Flux/Argo | Service deployment completions |
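
For the `gatus` row, Gatus ships an ntfy alerting provider, so the topic can be wired up in the Gatus configuration directly. This is a sketch only: the thresholds, priority, and the example endpoint are assumptions, not values taken from the existing deployment.

```yaml
alerting:
  ntfy:
    url: "http://ntfy-svc.observability.svc.cluster.local"
    topic: "gatus"
    priority: 3
    default-alert:
      failure-threshold: 3 # assumed: alert after 3 consecutive failures
      success-threshold: 2 # assumed: resolve after 2 consecutive successes
      send-on-resolved: true

endpoints:
  - name: example-service # hypothetical endpoint
    url: "https://example.org/health"
    interval: 1m
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: ntfy
```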
### 3. Alertmanager Integration
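Alertmanager forwards alerts to ntfy through webhook receivers. The `tpl=alertmanager` query parameter selects ntfy's built-in Alertmanager message template, which renders the webhook payload into a readable notification: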
```yaml
receivers:
  - name: ntfy-critical
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=urgent&tags=rotating_light"
        sendResolved: true
  - name: ntfy-warning
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=high&tags=warning"
        sendResolved: true
```
Routes direct alerts based on severity:
- `severity=critical` → `ntfy-critical` receiver
- `severity=warning` → `ntfy-warning` receiver
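
The severity routing above could be expressed roughly as follows. This sketch assumes the receivers live in a Prometheus Operator `AlertmanagerConfig` resource (suggested by the camelCase `webhookConfigs` keys); in a plain `alertmanager.yml` the same idea uses snake_case keys and string matchers such as `severity="critical"`. The fallback receiver on the top-level route is an assumption.

```yaml
route:
  # assumed default: anything without a matching severity goes to the warning receiver
  receiver: ntfy-warning
  routes:
    - receiver: ntfy-critical
      matchers:
        - name: severity
          value: critical
          matchType: "="
    - receiver: ntfy-warning
      matchers:
        - name: severity
          value: warning
          matchType: "="
```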
### 4. Service Readiness Notifications
To provide visibility when services are fully operational after deployment:
**Option A: Flux Notification Controller**
Configure Flux's notification-controller to send alerts when Kustomizations/HelmReleases succeed:
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: ntfy-deployments
spec:
  # generic-hmac signs the payload and requires a secretRef containing a token;
  # use generic to post unsigned events
  type: generic-hmac # or generic
  address: http://ntfy-svc.observability.svc.cluster.local/deployments
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: deployment-success
spec:
  providerRef:
    name: ntfy-deployments
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
  # only forward events whose message matches this pattern
  inclusionList:
    - ".*succeeded.*"
```
**Option B: Argo Workflows Post-Deploy Hook**
For Argo-managed deployments, add a notification step at workflow completion.
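
A minimal sketch of such a hook, using an Argo Workflows exit handler (`onExit`) that posts to the `deployments` topic. The workflow name, the `deploy` template body, and the container images are hypothetical placeholders; only the ntfy URL and topic come from this ADR.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: deploy- # hypothetical workflow name
spec:
  entrypoint: deploy
  # the exit handler runs after the workflow finishes, whether it succeeded or failed
  onExit: notify
  templates:
    - name: deploy
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["echo deploying"] # placeholder for the real deployment step
    - name: notify
      container:
        image: curlimages/curl:8.8.0 # assumed image; pin a tag you trust
        command: [sh, -c]
        args:
          - >-
            curl -fsS
            -H "Title: {{workflow.name}} {{workflow.status}}"
            -d "Workflow {{workflow.name}} finished with status {{workflow.status}}"
            http://ntfy-svc.observability.svc.cluster.local/deployments
```

Because the exit handler also fires on failure, the message includes `{{workflow.status}}` so that a failed deployment is not mistaken for a successful one.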
**Recommendation**: Use Flux Notification Controller (Option A) as it's already part of the GitOps stack and provides native integration.
## Consequences
### Positive
- **Single source of truth**: All notifications flow through ntfy
- **Auth protection maintained**: External ntfy access requires Authentik auth
- **Deployment visibility**: Know when services are ready without watching logs
- **Consistent topic naming**: All sources follow documented conventions
### Negative
- **Configuration overhead**: Each notification source requires explicit configuration
### Neutral
- Topic naming must be documented and followed consistently
- Future Discord integration is addressed in ADR-0022
## Implementation Checklist
- [x] Standardize CI notifications to `gitea-ci` topic
- [x] Configure Alertmanager → ntfy for critical/warning alerts
- [ ] Configure Flux notification-controller for deployment notifications
- [ ] Add `deployments` topic subscription to ntfy app
## Related
- ADR-0015: CI Notifications and Semantic Versioning
- ADR-0022: ntfy-Discord Bridge Service