# Notification Architecture

* Status: accepted
* Date: 2026-02-04
* Deciders: Billy
* Technical Story: Unify notification delivery across CI, alerting, and monitoring systems

## Context

The homelab infrastructure generates notifications from multiple sources:

1. **CI/CD pipelines** (Gitea Actions) - build success/failure
2. **Alertmanager** - Prometheus alerts for critical/warning conditions
3. **Gatus** - Service health monitoring
4. **Flux** - GitOps reconciliation events
5. **Service readiness** - Notifications when deployments complete successfully

Currently, ntfy serves as the primary notification hub, but there are several issues:

- **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
- **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
- **No service readiness notifications**: No visibility when services come online after deployment

## Decision

### 1. ntfy as the Notification Hub

ntfy will serve as the central notification aggregation point. All internal services publish to ntfy topics via the internal Kubernetes service URL:

```
http://ntfy-svc.observability.svc.cluster.local/<topic>
```

In-cluster publishers use this internal service URL and never pass through the external ingress. As a result, external ntfy access stays behind authentication, while internal services can publish freely.

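A one-off Job can check that a workload reaches ntfy over the internal URL. This is a minimal sketch under several assumptions:

- The Job name `ntfy-smoke-test`, the `smoke-test` topic, and the `curlimages/curl` image tag are illustrative.
- It assumes no NetworkPolicy blocks traffic into the `observability` namespace.

```yaml
# Hypothetical smoke-test Job: publishes one message to ntfy over the
# cluster-internal service URL, without going through the external ingress.
apiVersion: batch/v1
kind: Job
metadata:
  name: ntfy-smoke-test            # illustrative name
spec:
  backoffLimit: 0                  # do not retry; a failure should stay visible
  ttlSecondsAfterFinished: 300     # garbage-collect the Job 5 minutes after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: publish
          image: curlimages/curl:8.8.0   # pin to whichever tag is current
          command: ["curl"]
          args:
            - "--fail"                   # non-zero exit on HTTP errors, so the Job fails
            - "-sS"
            - "-H"
            - "Title: ntfy smoke test"
            - "-d"
            - "Published from inside the cluster"
            - "http://ntfy-svc.observability.svc.cluster.local/smoke-test"   # hypothetical topic
```

After `kubectl apply -f`, run `kubectl logs job/ntfy-smoke-test`. It should show ntfy's JSON response for the published message.
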
### 2. Standardized Topics

| Topic | Source | Description |
|-------|--------|-------------|
| `gitea-ci` | Gitea Actions | CI/CD build notifications |
| `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
| `gatus` | Gatus | Service health status changes |
| `flux` | Flux | GitOps reconciliation events |
| `deployments` | Flux/Argo | Service deployment completions |

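A workflow step along these lines could publish a build result to the `gitea-ci` topic. It is a sketch, not the actual workflow, and it assumes:

- The act_runner executes inside the cluster, so the internal service URL resolves.
- The runner image provides `curl`.
- The runner exposes the GitHub-compatible `job.status` context.

The workflow name, runner label, and build command are illustrative.

```yaml
# Hypothetical .gitea/workflows/build.yaml
name: build
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest        # runner label is an assumption
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make build           # placeholder for the real build command

      # always() keeps this step running after a failed build, so failures are reported too
      - name: Notify ntfy
        if: always()
        run: |
          curl -sS \
            -H "Title: ${{ github.repository }} build ${{ job.status }}" \
            -d "Commit ${{ github.sha }} on ${{ github.ref_name }} finished with status ${{ job.status }}" \
            http://ntfy-svc.observability.svc.cluster.local/gitea-ci
```
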
### 3. Alertmanager Integration

Alertmanager forwards alerts to ntfy using ntfy's built-in `alertmanager` template (`tpl=alertmanager`). The template turns the Alertmanager webhook payload into a readable notification. The `priority` and `tags` query parameters distinguish critical from warning notifications:

```yaml
receivers:
  - name: ntfy-critical
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=urgent&tags=rotating_light"
        sendResolved: true
  - name: ntfy-warning
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=high&tags=warning"
        sendResolved: true
```

Routes direct alerts based on severity:

- `severity=critical` → `ntfy-critical` receiver
- `severity=warning` → `ntfy-warning` receiver

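The routing itself is not shown above. The camelCase `webhookConfigs`/`sendResolved` fields suggest the receivers live in a Prometheus Operator `AlertmanagerConfig` resource. If so, the matching `route` could look like this sketch. The `"null"` fallback receiver and the `groupBy` labels are assumptions, not part of the current configuration.

```yaml
# Sketch of an AlertmanagerConfig route pairing with the ntfy receivers above
route:
  receiver: "null"                 # hypothetical fallback: drops alerts matching neither route
  groupBy: ["alertname", "namespace"]
  routes:
    - receiver: ntfy-critical
      matchers:
        - name: severity
          matchType: "="
          value: critical
    - receiver: ntfy-warning
      matchers:
        - name: severity
          matchType: "="
          value: warning
receivers:
  # ...ntfy-critical and ntfy-warning as shown above...
  - name: "null"                   # no integrations configured, so nothing is delivered
```
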
### 4. Service Readiness Notifications

To provide visibility when services are fully operational after deployment:

**Option A: Flux Notification Controller**

Configure Flux's notification-controller to send alerts when Kustomizations/HelmReleases succeed:

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: ntfy-deployments
spec:
  type: generic  # generic-hmac also works, but requires spec.secretRef pointing at a Secret with the HMAC key
  address: http://ntfy-svc.observability.svc.cluster.local/deployments
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: deployment-success
spec:
  providerRef:
    name: ntfy-deployments
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
  inclusionList:
    - ".*succeeded.*"
```

A Kustomization success event only implies that workloads are ready when `spec.wait` or `spec.healthChecks` is set. Otherwise it only means the manifests were applied.

**Option B: Argo Workflows Post-Deploy Hook**

For Argo-managed deployments, add a notification step at workflow completion.

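A minimal sketch of Option B uses an `onExit` handler, which Argo runs whether the workflow succeeded or failed. At runtime Argo substitutes `{{workflow.name}}` and `{{workflow.status}}`. The `generateName` prefix, the template names, the placeholder deploy step, and the image tags are illustrative assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: deploy-              # hypothetical workflow name prefix
spec:
  entrypoint: deploy
  onExit: notify-ntfy                # exit handler runs after success or failure
  templates:
    - name: deploy
      container:
        image: alpine:3.20
        command: [sh, -c]
        args: ["echo 'real deploy steps go here'"]   # placeholder
    - name: notify-ntfy
      container:
        image: curlimages/curl:8.8.0
        command: [curl]
        args:
          - "-sS"
          - "-H"
          - "Title: {{workflow.name}} {{workflow.status}}"
          - "-d"
          - "Argo workflow {{workflow.name}} finished with status {{workflow.status}}"
          - "http://ntfy-svc.observability.svc.cluster.local/deployments"
```
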
**Recommendation**: Use the Flux notification-controller (Option A), because it is already part of the GitOps stack and integrates natively.

## Consequences

### Positive

- **Single source of truth**: All notifications flow through ntfy
- **Auth protection maintained**: External ntfy access requires Authentik auth
- **Deployment visibility**: Know when services are ready without watching logs
- **Consistent topic naming**: All sources follow documented conventions

### Negative

- **Configuration overhead**: Each notification source requires explicit configuration

### Neutral

- Topic naming must be documented and followed consistently
- Future Discord integration is addressed in ADR-0022

## Implementation Checklist

- [x] Standardize CI notifications to `gitea-ci` topic
- [x] Configure Alertmanager → ntfy for critical/warning alerts
- [ ] Configure Flux notification-controller for deployment notifications
- [ ] Add `deployments` topic subscription to ntfy app

## Related

- ADR-0015: CI Notifications and Semantic Versioning
- ADR-0022: ntfy-Discord Bridge Service