chore: Consolidate ADRs into decisions/ directory

- Added ADR-0016: Affine email verification strategy
- Moved ADRs 0019-0024 from docs/adr/ to decisions/
- Renamed to consistent format (removed ADR- prefix)
2026-02-04 08:28:12 -05:00
parent 85b1a9019b
commit 8f4df84657
7 changed files with 150 additions and 0 deletions


@@ -0,0 +1,150 @@
# Affine Email Verification Strategy for Authentik OIDC
* Status: proposed
* Date: 2026-02-04
* Deciders: Billy
* Technical Story: Affine requires email verification for users, but Authentik is not configured with SMTP for email delivery
## Context and Problem Statement
Affine (self-hosted note-taking/collaboration tool) requires users to have verified email addresses. When users authenticate via Authentik OIDC, Affine checks the `email_verified` claim. Currently, Authentik has no SMTP configuration, so it cannot send verification emails, causing new users to be blocked or have limited functionality in Affine.
How can we satisfy Affine's email verification requirement without adding significant infrastructure complexity to the homelab?
## Decision Drivers
* Minimize external dependencies and ongoing costs
* Keep the solution self-contained within the homelab
* Avoid breaking changes on Affine upgrades
* Maintain security - don't completely bypass verification for untrusted users
* Simple to implement and maintain
## Considered Options
1. **Override `email_verified` claim in Authentik** - Configure Authentik to always return `email_verified: true` for trusted users
2. **Deploy local SMTP server (Mailpit)** - Run a lightweight mail capture server in-cluster
3. **Configure Affine to skip verification for OIDC users** - Use Affine's configuration to trust OIDC-provided emails
## Decision Outcome
Chosen option: **Option 1 (Override `email_verified` claim)** as the primary solution, with Option 3 as a fallback if Affine supports it.
This approach requires zero additional infrastructure, works immediately, and is appropriate for a homelab where all users are trusted (family/personal use). Option 2 (Mailpit) is documented for future reference if actual email delivery becomes needed for other applications.
### Positive Consequences
* No additional services to deploy or maintain
* Works immediately with existing Authentik setup
* No external dependencies or costs
* Can be easily reverted if requirements change
### Negative Consequences
* Bypasses "real" email verification - relies on trust
* If Affine is ever exposed to untrusted users, this would need revisiting
* Other applications expecting real email verification would need similar workarounds
## Pros and Cons of the Options
### Option 1: Override `email_verified` Claim in Authentik
Configure an Authentik property mapping to always return `email_verified: true` in the OIDC token for the Affine application.
**Implementation:**
1. In Authentik Admin → Customization → Property Mappings
2. Create a new "Scope Mapping" for email_verified
3. Set expression: `return True`
4. Assign to Affine OIDC provider
* Good, because zero infrastructure required
* Good, because immediate solution
* Good, because appropriate for trusted homelab users
* Bad, because not "real" verification
* Bad, because per-application configuration needed
### Option 2: Deploy Local SMTP Server (Mailpit)
Deploy Mailpit (or MailHog) as a lightweight SMTP server in the cluster that captures all emails for viewing via web UI.
**Implementation:**
```yaml
# Example Mailpit deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mailpit
  namespace: productivity
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mailpit
  template:
    metadata:
      labels:
        app: mailpit  # must match the selector above
    spec:
      containers:
        - name: mailpit
          image: axllent/mailpit:latest
          ports:
            - containerPort: 1025 # SMTP
            - containerPort: 8025 # Web UI
```
Then configure Authentik SMTP settings:
- Host: `mailpit.productivity.svc.cluster.local`
- Port: `1025`
- TLS: disabled (internal traffic)
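If Mailpit is deployed, the SMTP path can be smoke-tested from any in-cluster pod before pointing Authentik at it. A minimal sketch using Python's standard-library `smtplib`, assuming the service name above (sender and recipient addresses are placeholders):

```python
import smtplib
from email.message import EmailMessage

# Hypothetical smoke test: send one message through Mailpit's SMTP port,
# then confirm it appears in the web UI on port 8025.
msg = EmailMessage()
msg["From"] = "authentik@daviestechlabs.io"   # placeholder sender
msg["To"] = "test@daviestechlabs.io"          # placeholder recipient
msg["Subject"] = "Mailpit smoke test"
msg.set_content("If you can read this in the Mailpit UI, SMTP works.")

with smtplib.SMTP("mailpit.productivity.svc.cluster.local", 1025) as smtp:
    smtp.send_message(msg)  # no TLS/auth: internal traffic only
```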
* Good, because provides actual email flow for testing
* Good, because useful for other apps needing email (password reset, notifications)
* Good, because emails viewable via web UI
* Bad, because emails don't actually leave the cluster
* Bad, because another service to maintain
* Bad, because requires Authentik reconfiguration
### Option 3: Configure Affine to Skip Verification for OIDC Users
If Affine supports it, configure the application to trust email addresses from OIDC providers without requiring separate verification.
**Potential Configuration (needs verification):**
```yaml
# In affine-config ConfigMap
AFFINE_AUTH_OIDC_EMAIL_VERIFIED: "true"
# or similar environment variable
```
* Good, because no Authentik changes needed
* Good, because scoped to Affine only
* Bad, because may not be supported by Affine
* Bad, because could break on Affine upgrades
* Bad, because requires research into Affine's configuration options
## Implementation Notes
### For Option 1 (Recommended)
1. Access Authentik admin at `https://auth.daviestechlabs.io/if/admin/`
2. Navigate to Customization → Property Mappings
3. Create new Scope Mapping:
- Name: `Affine Email Verified Override`
- Scope name: `email`
- Expression:
```python
return {
    "email": request.user.email,
    "email_verified": True,
}
```
4. Edit the Affine OIDC Provider → Advanced Settings → Scope Mappings
5. Replace default email mapping with the new override
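After saving, the override can be verified without logging into Affine: fetch the claims from Authentik's standard userinfo endpoint with an access token from a test login. A minimal sketch (the token value is a placeholder):

```python
import httpx

# Hypothetical check: call Authentik's userinfo endpoint with a fresh access
# token from an Affine OIDC login and confirm the overridden claim is present.
token = "..."  # placeholder: access token from a test login
resp = httpx.get(
    "https://auth.daviestechlabs.io/application/o/userinfo/",
    headers={"Authorization": f"Bearer {token}"},
)
print(resp.json().get("email_verified"))  # expect: True
```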
### Future Considerations
If the homelab expands to include external users or applications requiring real email delivery:
- Revisit Option 2 (Mailpit) for development/testing
- Consider external SMTP service (SendGrid free tier, AWS SES) for production email
## References
* [Authentik Property Mappings Documentation](https://docs.goauthentik.io/docs/property-mappings)
* [Affine Self-Hosting Documentation](https://docs.affine.pro/docs/self-host-affine)
* [Mailpit GitHub](https://github.com/axllent/mailpit)


@@ -0,0 +1,365 @@
# ADR-0019: Python Module Deployment Strategy
## Status
Accepted
## Date
2026-02-02
## Context
We have Python modules for AI/ML workflows that need to run on our unified GPU cluster:
| Repo | Purpose | Needs GPU? |
|------|---------|------------|
| `handler-base` | Shared library (NATS, clients, telemetry) | No |
| `chat-handler` | Text chat → RAG → LLM pipeline | No (calls GPU endpoints) |
| `voice-assistant` | Audio → STT → RAG → LLM → TTS pipeline | No (calls GPU endpoints) |
| `pipeline-bridge` | Kubeflow ↔ NATS integration | No |
| `kuberay-images/ray-serve/` | Inference deployments (Whisper, TTS, LLM, etc.) | **Yes** |
### Current Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ PLATFORM LAYERS │
├─────────────────────────────────────────────────────────────────────┤
│ Kubeflow Pipelines │ KServe (visibility) │ MLflow (registry) │
│ [Orchestration] │ [InferenceServices] │ [Models/Metrics] │
├─────────────────────────────────────────────────────────────────────┤
│ RAY CLUSTER │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Ray Serve Applications (GPU inference) │ │
│ │ ├─ /llm → VLLMDeployment (khelben, 0.95 GPU) │ │
│ │ ├─ /whisper → WhisperDeployment (elminster, 0.5 GPU) │ │
│ │ ├─ /tts → TTSDeployment (elminster, 0.5 GPU) │ │
│ │ ├─ /embeddings → EmbeddingsDeployment (drizzt, 0.8 GPU) │ │
│ │ └─ /reranker → RerankerDeployment (danilo, 0.8 GPU) │ │
│ ├────────────────────────────────────────────────────────────────┤ │
│ │ Ray Serve Applications (CPU orchestration) ← WHERE HANDLERS GO │ │
│ │ ├─ /chat → ChatHandler (head node, 0 GPU) │ │
│ │ └─ /voice → VoiceHandler (head node, 0 GPU) │ │
│ └────────────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────┤
│ RayJob (batch/training) │ NATS (events) │ Milvus (vectors) │
└─────────────────────────────────────────────────────────────────────┘
```
The key insight is that **handlers ARE Ray Serve applications** - they just don't need GPUs.
They should run inside the Ray cluster to:
1. Use Ray's internal calling (faster than HTTP)
2. Share observability (Ray Dashboard)
3. Leverage Ray's scheduling for resource management
## Decision
**Deploy handlers as Ray Serve applications inside the Ray cluster**, using `runtime_env`
to install Python packages from Gitea's package registry at deployment time.
### Why Ray Serve (not standalone containers)?
1. **Unified Platform**: Everything runs in Ray - inference AND orchestration
2. **Internal Calls**: Handlers can call inference deployments via Ray handles (no HTTP)
3. **Resource Sharing**: Ray head node has spare CPU/memory for orchestration
4. **Single Observability**: Ray Dashboard shows all applications
5. **Simpler Ops**: One RayService to manage, not multiple Deployments
### Why runtime_env with pip (not baked into images)?
1. **Faster Iteration**: Change handler code → push to PyPI → redeploy RayService
2. **Decoupled Releases**: Handlers update independently of worker images
3. **Smaller Images**: Worker images only need inference dependencies
4. **MLflow Integration**: Can version handlers as MLflow models if needed
## Implementation Plan
### Phase 1: Publish Packages to Gitea PyPI
Each handler repo publishes to Gitea's built-in package registry on release:
```yaml
# .gitea/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]
jobs:
  lint:
    # ... existing lint job
  test:
    # ... existing test job
  publish:
    runs-on: ubuntu-latest
    needs: [lint, test]
    if: startsWith(github.ref, 'refs/tags/v')
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install uv
        uses: astral-sh/setup-uv@v5
      - name: Build package
        run: uv build
      - name: Publish to Gitea PyPI
        env:
          UV_PUBLISH_URL: https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi
          UV_PUBLISH_TOKEN: ${{ secrets.GITEA_TOKEN }}
        run: uv publish
```
### Phase 2: Update RayService with Handler Applications
Add handler applications to the existing RayService:
```yaml
# rayservice.yaml additions
spec:
  serveConfigV2: |
    applications:
      # ... existing GPU inference applications ...

      # ============================================
      # HANDLERS (CPU - runs on head node)
      # ============================================

      # Chat Handler - RAG + LLM pipeline
      - name: chat-handler
        route_prefix: /chat
        import_path: chat_handler:app
        runtime_env:
          pip:
            - handler-base>=0.1.0
            - chat-handler>=0.1.0
          pip_find_links:
            - https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple/
          env_vars:
            NATS_URL: "nats://nats.ai-ml.svc.cluster.local:4222"
            MILVUS_HOST: "milvus.ai-ml.svc.cluster.local"
            OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring.svc.cluster.local:4317"
        deployments:
          - name: ChatDeployment
            num_replicas: 2
            ray_actor_options:
              num_cpus: 0.5
              num_gpus: 0  # No GPU needed
            max_ongoing_requests: 50

      # Voice Assistant - STT → RAG → LLM → TTS pipeline
      - name: voice-assistant
        route_prefix: /voice
        import_path: voice_assistant:app
        runtime_env:
          pip:
            - handler-base>=0.1.0
            - voice-assistant>=0.1.0
          pip_find_links:
            - https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple/
          env_vars:
            NATS_URL: "nats://nats.ai-ml.svc.cluster.local:4222"
            MILVUS_HOST: "milvus.ai-ml.svc.cluster.local"
        deployments:
          - name: VoiceDeployment
            num_replicas: 2
            ray_actor_options:
              num_cpus: 1
              num_gpus: 0
            max_ongoing_requests: 20
```
### Phase 3: Refactor Handlers for Ray Serve
Convert handlers from standalone NATS subscribers to Ray Serve deployments that can optionally still subscribe to NATS:
```python
# chat_handler.py (refactored)
from ray import serve

from handler_base import Settings
from handler_base.clients import EmbeddingsClient, LLMClient, RerankerClient, MilvusClient


@serve.deployment(
    name="ChatDeployment",
    num_replicas=2,
    ray_actor_options={"num_cpus": 0.5, "num_gpus": 0},
)
class ChatHandler:
    def __init__(self):
        self.settings = Settings()
        # Initialize clients - these can use Ray handles for internal calls
        self.embeddings = EmbeddingsClient()
        self.llm = LLMClient()
        self.reranker = RerankerClient()
        self.milvus = MilvusClient()

    async def __call__(self, request) -> dict:
        """Handle HTTP requests (from Gradio, etc.)"""
        data = await request.json()
        return await self.process_chat(data)

    async def process_chat(self, data: dict) -> dict:
        """Core chat logic - called by HTTP or NATS"""
        query = data["query"]
        # 1. Generate embeddings
        embedding = await self.embeddings.embed(query)
        # 2. Vector search
        results = await self.milvus.search(embedding, top_k=10)
        # 3. Rerank
        reranked = await self.reranker.rerank(query, results)
        # 4. Generate response
        response = await self.llm.generate(query, context=reranked[:5])
        return {
            "response": response,
            "sources": reranked[:5],
        }


# Ray Serve app binding
app = ChatHandler.bind()
```
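Once the RayService applies, the handler can be exercised end to end over the Serve HTTP proxy. A minimal smoke test, assuming the `/chat` route above; the service hostname is illustrative and depends on the RayService name:

```python
import asyncio

import httpx


async def main() -> None:
    # Hypothetical in-cluster Serve endpoint; adjust the host to your RayService.
    url = "http://rayservice-serve-svc.ai-ml.svc.cluster.local:8000/chat"
    async with httpx.AsyncClient(timeout=60) as client:
        resp = await client.post(url, json={"query": "What is in my notes about Ray?"})
        resp.raise_for_status()
        body = resp.json()
        print(body["response"])
        print("sources:", len(body["sources"]))


asyncio.run(main())
```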
### Phase 4: Use Ray Handles for Internal Calls (Optional Optimization)
Update handler-base clients to use Ray handles when running inside Ray:
```python
# handler_base/clients/embeddings.py
import httpx
import ray
from ray import serve


class EmbeddingsClient:
    def __init__(self, url: str | None = None):
        self.url = url
        self._handle = None
        # If running inside Ray, get handle to embeddings deployment
        if ray.is_initialized():
            try:
                self._handle = serve.get_deployment_handle(
                    "EmbeddingsDeployment",
                    app_name="embeddings",
                )
            except Exception:
                pass  # Fall back to HTTP

    async def embed(self, text: str) -> list[float]:
        if self._handle:
            # Fast internal Ray call
            return await self._handle.embed.remote(text)
        else:
            # HTTP fallback for external callers
            async with httpx.AsyncClient() as client:
                resp = await client.post(f"{self.url}/v1/embeddings", json={"input": text})
                return resp.json()["data"][0]["embedding"]
```
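Callers don't need to know which path is active; the same coroutine works in and out of Ray. A short usage sketch assuming the client above (the fallback URL is illustrative):

```python
import asyncio


async def demo() -> None:
    # Inside a Ray Serve deployment this resolves via the Ray handle; from a
    # plain pod it falls back to HTTP against the Serve route.
    client = EmbeddingsClient(
        url="http://rayservice-serve-svc.ai-ml.svc.cluster.local:8000/embeddings"
    )
    vector = await client.embed("hello world")
    print(len(vector))


asyncio.run(demo())
```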
### Phase 5: NATS Bridge (Optional)
If you still want NATS integration, add a separate NATS bridge that forwards to Ray Serve:
```python
# pipeline_bridge.py - runs as Ray actor, subscribes to NATS
import json

import nats
import ray
from ray import serve


@ray.remote
class NATSBridge:
    def __init__(self):
        self.nc = None
        self.chat_handle = serve.get_deployment_handle("ChatDeployment", "chat-handler")
        self.voice_handle = serve.get_deployment_handle("VoiceDeployment", "voice-assistant")

    async def start(self):
        self.nc = await nats.connect("nats://nats.ai-ml.svc.cluster.local:4222")
        await self.nc.subscribe("ai.chat.request", cb=self.handle_chat)
        await self.nc.subscribe("voice.request", cb=self.handle_voice)

    async def handle_chat(self, msg):
        # NATS payloads are bytes; decode before handing to the deployment
        result = await self.chat_handle.process_chat.remote(json.loads(msg.data))
        if msg.reply:
            await self.nc.publish(msg.reply, json.dumps(result).encode())

    async def handle_voice(self, msg):
        # method name assumed to mirror process_chat on the voice deployment
        result = await self.voice_handle.process_voice.remote(json.loads(msg.data))
        if msg.reply:
            await self.nc.publish(msg.reply, json.dumps(result).encode())
```
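The bridge only needs to be started once per cluster; a detached actor survives the launching driver. A sketch, assuming the class above:

```python
import ray

ray.init(address="auto")  # connect to the running cluster

# One detached bridge actor; lifetime="detached" keeps it alive after this
# driver exits, and the actor's event loop keeps servicing NATS callbacks.
bridge = NATSBridge.options(name="nats-bridge", lifetime="detached").remote()
ray.get(bridge.start.remote())  # subscribes and returns; the actor keeps consuming
```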
## CI/CD Flow
```
┌────────────────────────────────────────────────────────────────────┐
│ Developer pushes to handler repo │
├────────────────────────────────────────────────────────────────────┤
│ 1. Gitea Actions: lint → test │
│ 2. On tag: build wheel → publish to Gitea PyPI │
├────────────────────────────────────────────────────────────────────┤
│ 3. Update RayService version in homelab-k8s2 │
│ (bump handler-base>=0.2.0 in runtime_env) │
├────────────────────────────────────────────────────────────────────┤
│ 4. Flux detects change → applies RayService │
│ 5. Ray downloads new packages → restarts deployments │
└────────────────────────────────────────────────────────────────────┘
```
## Alternatives Considered
### Standalone Container Deployments
Run handlers as separate Kubernetes Deployments outside Ray.
**Rejected because:**
- Duplicates infrastructure (separate scaling, health checks, etc.)
- HTTP overhead for every inference call
- Separate observability stack
- Against the "Ray as unified compute" philosophy
### Bake Handlers into Worker Images
Pre-install handler code in ray-worker images.
**Rejected because:**
- Couples handler releases to image rebuilds
- Slower iteration cycle
- Larger images
## Consequences
### Positive
- Single platform: Everything runs in Ray
- Fast internal calls via Ray handles
- Unified observability in Ray Dashboard
- Clean abstraction layers: Kubeflow → KServe → Ray → GPU
- Handlers scale with Ray's autoscaler
### Negative
- Handlers share Ray head node resources
- Need to manage Gitea PyPI authentication for runtime_env
- Slightly more complex RayService configuration
### Neutral
- MLflow can track handler "models" if we want versioned deployments
- Kubeflow can trigger handler updates via pipelines
## References
- [ray-kserve-integration.md](../../homelab-k8s2/docs/ray-kserve-integration.md)
- [Ray Serve runtime_env docs](https://docs.ray.io/en/latest/serve/production-guide/config.html)
- [Gitea Package Registry](https://docs.gitea.io/en-us/packages/pypi/)
- [ADR-0012: Ray Cluster Architecture](ADR-0012-ray-cluster-unified.md)


@@ -0,0 +1,133 @@
# ADR-0020: Internal Registry URLs for CI/CD
## Status
Accepted
## Date
2026-02-02
## Context
| Factor | Details |
|--------|---------|
| Problem | Cloudflare proxying limits uploads to 100MB per request |
| Impact | Docker images (20GB+) and large packages fail to push |
| Current Setup | Gitea at `git.daviestechlabs.io` behind Cloudflare |
| Internal Access | `registry.lab.daviestechlabs.io` bypasses Cloudflare |
Our Gitea instance is accessible via two URLs:
- **External**: `git.daviestechlabs.io` - proxied through Cloudflare (DDoS protection, caching)
- **Internal**: `registry.lab.daviestechlabs.io` - direct access from cluster network
Cloudflare's free tier enforces a 100MB upload limit per request. This blocks:
- Docker image pushes (multi-GB layers)
- Large Python package uploads
- Any artifact exceeding 100MB
## Decision
**Use internal registry URLs for all CI/CD artifact uploads.**
CI/CD workflows running on Gitea Actions runners (which are inside the cluster) should use `registry.lab.daviestechlabs.io` for:
- Docker image pushes
- PyPI package uploads
- Any large artifact uploads
External URLs remain for:
- Git operations (clone, push, pull)
- Package downloads (pip install, docker pull)
- Human access via browser
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ INTERNET │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────┐
│ Cloudflare │
│ (100MB limit) │
└────────┬────────┘
┌──────────────────────────────┐
│ git.daviestechlabs.io │
│ (external access) │
└──────────────────────────────┘
│ same Gitea instance
┌──────────────────────────────┐
│ registry.lab.daviestechlabs │
│ (internal, no limits) │
└──────────────────────────────┘
│ direct upload
┌──────────────────────────────┐
│ Gitea Actions Runner │
│ (in-cluster) │
└──────────────────────────────┘
```
## Consequences
### Positive
- **No upload size limits** for CI/CD artifacts
- **Faster uploads** (no Cloudflare proxy overhead)
- **Lower latency** for in-cluster operations
- **Cost savings** (reduced Cloudflare bandwidth)
### Negative
- **Two URLs to maintain** in workflow configurations
- **Runners must be in-cluster** (cannot use external runners for uploads)
- **DNS split-horizon** required if accessing from outside
### Neutral
- External users can still pull packages/images via Cloudflare URL
- Git operations continue through external URL (small payloads)
## Implementation
### Docker Registry Login
```yaml
- name: Login to Internal Registry
  uses: docker/login-action@v3
  with:
    registry: registry.lab.daviestechlabs.io
    username: ${{ secrets.REGISTRY_USER }}
    password: ${{ secrets.REGISTRY_TOKEN }}
```
### PyPI Upload
```yaml
- name: Publish to Gitea PyPI
  run: |
    twine upload \
      --repository-url https://registry.lab.daviestechlabs.io/api/packages/daviestechlabs/pypi \
      dist/*
```
### Environment Variable Pattern
For consistency across workflows:
```yaml
env:
  REGISTRY_EXTERNAL: git.daviestechlabs.io
  REGISTRY_INTERNAL: registry.lab.daviestechlabs.io
```
## Related
- [ADR-0019: Handler Deployment Strategy](ADR-0019-handler-deployment-strategy.md) - Uses PyPI publishing
- Cloudflare upload limits: https://developers.cloudflare.com/workers/platform/limits/


@@ -0,0 +1,131 @@
# ADR-0021: Notification Architecture
## Status
Accepted
## Context
The homelab infrastructure generates notifications from multiple sources:
1. **CI/CD pipelines** (Gitea Actions) - build success/failure
2. **Alertmanager** - Prometheus alerts for critical/warning conditions
3. **Gatus** - Service health monitoring
4. **Flux** - GitOps reconciliation events
5. **Service readiness** - Notifications when deployments complete successfully
Currently, ntfy serves as the primary notification hub, but there are several issues:
- **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
- **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
- **No service readiness notifications**: No visibility when services come online after deployment
## Decision
### 1. ntfy as the Notification Hub
ntfy will serve as the central notification aggregation point. All internal services publish to ntfy topics via the internal Kubernetes service URL:
```
http://ntfy-svc.observability.svc.cluster.local/<topic>
```
This keeps ntfy auth-protected externally while allowing internal services to publish freely.
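Publishing is a plain HTTP POST, which is why every source can integrate without a client library. A minimal sketch against the internal URL above (the topic and message text are illustrative):

```python
import httpx

# Hypothetical publisher: any in-cluster job can POST directly to its topic.
httpx.post(
    "http://ntfy-svc.observability.svc.cluster.local/gitea-ci",
    content="handler-base v0.2.0 published to PyPI",
    headers={
        "X-Title": "Build succeeded",  # ntfy title header
        "X-Priority": "3",             # default priority
        "X-Tags": "package,white_check_mark",
    },
)
```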
### 2. Standardized Topics
| Topic | Source | Description |
|-------|--------|-------------|
| `gitea-ci` | Gitea Actions | CI/CD build notifications |
| `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
| `gatus` | Gatus | Service health status changes |
| `flux` | Flux | GitOps reconciliation events |
| `deployments` | Flux/Argo | Service deployment completions |
### 3. Alertmanager Integration
Alertmanager is configured to forward alerts to ntfy using the built-in `tpl=alertmanager` template:
```yaml
receivers:
  - name: ntfy-critical
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=urgent&tags=rotating_light"
        sendResolved: true
  - name: ntfy-warning
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=high&tags=warning"
        sendResolved: true
```
Routes direct alerts based on severity:
- `severity=critical` → `ntfy-critical` receiver
- `severity=warning` → `ntfy-warning` receiver
### 4. Service Readiness Notifications
To provide visibility when services are fully operational after deployment:
**Option A: Flux Notification Controller**
Configure Flux's notification-controller to send alerts when Kustomizations/HelmReleases succeed:
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: ntfy-deployments
spec:
  type: generic-hmac  # or generic
  address: http://ntfy-svc.observability.svc.cluster.local/deployments
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: deployment-success
spec:
  providerRef:
    name: ntfy-deployments
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
  inclusionList:
    - ".*succeeded.*"
```
**Option B: Argo Workflows Post-Deploy Hook**
For Argo-managed deployments, add a notification step at workflow completion.
**Recommendation**: Use Flux Notification Controller (Option A) as it's already part of the GitOps stack and provides native integration.
## Consequences
### Positive
- **Single source of truth**: All notifications flow through ntfy
- **Auth protection maintained**: External ntfy access requires Authentik auth
- **Deployment visibility**: Know when services are ready without watching logs
- **Consistent topic naming**: All sources follow documented conventions
### Negative
- **Configuration overhead**: Each notification source requires explicit configuration
### Neutral
- Topic naming must be documented and followed consistently
- Future Discord integration addressed in ADR-0022
## Implementation Checklist
- [x] Standardize CI notifications to `gitea-ci` topic
- [x] Configure Alertmanager → ntfy for critical/warning alerts
- [ ] Configure Flux notification-controller for deployment notifications
- [ ] Add `deployments` topic subscription to ntfy app
## Related
- ADR-0015: CI Notifications and Semantic Versioning
- ADR-0022: ntfy-Discord Bridge Service


@@ -0,0 +1,302 @@
# ADR-0022: ntfy-Discord Bridge Service
## Status
Accepted
## Context
Per ADR-0021, ntfy serves as the central notification hub for the homelab. However, Discord is used for team collaboration and visibility, requiring notifications to be forwarded there as well.
ntfy does not natively support Discord webhook format. Discord expects a specific JSON structure with embeds, while ntfy uses its own message format. A bridge service is needed to:
1. Subscribe to ntfy topics
2. Transform messages to Discord embed format
3. Forward to Discord webhooks
## Decision
### Architecture
A dedicated Go microservice (`ntfy-discord`) will bridge ntfy to Discord:
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐
│ CI/Alertmanager │────▶│ ntfy │────▶│ ntfy App │
│ Gatus/Flux │ │ (notification │ │ (mobile) │
└─────────────────┘ │ hub) │ └─────────────┘
└────────┬─────────┘
│ SSE/JSON stream
┌──────────────────┐ ┌─────────────┐
│ ntfy-discord │────▶│ Discord │
│ (Go) │ │ Webhook │
└──────────────────┘ └─────────────┘
```
### Service Design
**Repository**: `ntfy-discord`
**Technology Stack**:
- Go 1.22+
- `fsnotify` for hot reload of secrets/config
- Standard library `net/http` for SSE subscription
- `slog` for structured logging
- Scratch/distroless base image (~10MB final image)
**Why Go over Python**:
- **Smaller images**: ~10MB vs ~150MB+ for Python
- **Cloud native**: Single static binary, no runtime dependencies
- **Memory efficient**: Lower RSS, ideal for always-on bridge
- **Concurrency**: Goroutines for SSE handling and webhook delivery
- **Compile-time safety**: Catch errors before deployment
**Core Features**:
1. **SSE Subscription**: Connect to ntfy's JSON stream endpoint for real-time messages (see the sketch after this list)
2. **Automatic Reconnection**: Exponential backoff on connection failures
3. **Message Transformation**: Convert ntfy format to Discord embed format
4. **Priority Mapping**: Map ntfy priorities to Discord embed colors
5. **Topic Routing**: Configure which topics go to which Discord channels/webhooks
6. **Hot Reload**: Watch mounted secrets/configmaps with fsnotify, reload without restart
7. **Health Endpoint**: `/health` and `/ready` for Kubernetes probes
8. **Metrics**: Prometheus metrics at `/metrics`
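The subscription loop (feature 1 above) is straightforward: ntfy's `/<topic>/json` endpoint streams one JSON object per line. The service itself is Go; a Python sketch for brevity:

```python
import json

import httpx

# Sketch of the bridge's subscription loop: stream line-delimited JSON from
# ntfy and act on "message" events (the topic name is illustrative).
url = "http://ntfy.observability.svc.cluster.local/gitea-ci/json"
with httpx.stream("GET", url, timeout=None) as r:
    for line in r.iter_lines():
        if line:
            msg = json.loads(line)
            if msg.get("event") == "message":
                print(msg.get("title"), msg.get("message"))
```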
### Hot Reload Implementation
Kubernetes mounts secrets as symlinked files that update atomically. The bridge uses `fsnotify` to watch for changes:
```go
// Watch for secret changes and reload config
func (b *Bridge) watchSecrets(ctx context.Context, secretPath string) {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		slog.Error("failed to create watcher", "err", err)
		return
	}
	defer watcher.Close()
	watcher.Add(secretPath)

	for {
		select {
		case event := <-watcher.Events:
			if event.Has(fsnotify.Write) || event.Has(fsnotify.Create) {
				slog.Info("secret changed, reloading config")
				b.reloadConfig(secretPath)
			}
		case <-ctx.Done():
			return
		}
	}
}
```
This allows ExternalSecrets to rotate the Discord webhook URL without pod restarts.
### Configuration
Configuration via environment variables and mounted secrets:
```yaml
# Environment variables (ConfigMap)
NTFY_URL: "http://ntfy.observability.svc.cluster.local"
NTFY_TOPICS: "gitea-ci,alertmanager-alerts,flux-deployments,gatus"
LOG_LEVEL: "info"
METRICS_ENABLED: "true"
# Mounted secret (hot-reloadable)
/secrets/discord-webhook-url # Single webhook for all topics
# OR for topic routing:
/secrets/topic-webhooks.yaml # YAML mapping topics to webhooks
```
Topic routing file (optional):
```yaml
gitea-ci: "https://discord.com/api/webhooks/xxx/ci"
alertmanager-alerts: "https://discord.com/api/webhooks/xxx/alerts"
flux-deployments: "https://discord.com/api/webhooks/xxx/deploys"
default: "https://discord.com/api/webhooks/xxx/general"
```
### Message Transformation
ntfy message:
```json
{
  "id": "abc123",
  "topic": "gitea-ci",
  "title": "Build succeeded",
  "message": "ray-serve-apps published to PyPI",
  "priority": 3,
  "tags": ["package", "white_check_mark"],
  "time": 1770050091
}
```
Discord embed:
```json
{
  "embeds": [{
    "title": "✅ Build succeeded",
    "description": "ray-serve-apps published to PyPI",
    "color": 3066993,
    "fields": [
      {"name": "Topic", "value": "gitea-ci", "inline": true}
    ],
    "timestamp": "2026-02-02T11:34:51Z",
    "footer": {"text": "ntfy"}
  }]
}
```
**Priority → Color Mapping**:
| Priority | Name | Discord Color |
|----------|------|---------------|
| 5 | Max/Urgent | 🔴 Red (15158332) |
| 4 | High | 🟠 Orange (15105570) |
| 3 | Default | 🟢 Green (3066993) |
| 2 | Low | ⚪ Gray (9807270) |
| 1 | Min | ⚪ Light Gray (12370112) |
**Tag → Emoji Mapping**:
Common ntfy tags are converted to Discord-friendly emojis in the title:
- `white_check_mark` / `heavy_check_mark` → ✅
- `x` / `skull` → ❌
- `warning` → ⚠️
- `rotating_light` → 🚨
- `rocket` → 🚀
- `package` → 📦
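The mapping is mechanical; the sketch below illustrates it in Python for brevity (the service itself is Go), using the color and emoji tables above:

```python
from datetime import datetime, timezone

# Priority → Discord embed color, per the table above
PRIORITY_COLORS = {5: 15158332, 4: 15105570, 3: 3066993, 2: 9807270, 1: 12370112}
TAG_EMOJI = {"white_check_mark": "✅", "x": "❌", "warning": "⚠️",
             "rotating_light": "🚨", "rocket": "🚀", "package": "📦"}


def to_discord(ntfy_msg: dict) -> dict:
    """Transform an ntfy JSON message into a Discord webhook payload."""
    emojis = "".join(TAG_EMOJI[t] for t in ntfy_msg.get("tags", []) if t in TAG_EMOJI)
    title = f"{emojis} {ntfy_msg.get('title', '')}".strip()
    ts = datetime.fromtimestamp(ntfy_msg["time"], tz=timezone.utc)
    return {
        "embeds": [{
            "title": title,
            "description": ntfy_msg.get("message", ""),
            "color": PRIORITY_COLORS.get(ntfy_msg.get("priority", 3), 3066993),
            "fields": [{"name": "Topic", "value": ntfy_msg["topic"], "inline": True}],
            "timestamp": ts.isoformat().replace("+00:00", "Z"),
            "footer": {"text": "ntfy"},
        }]
    }
```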
### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ntfy-discord
  namespace: observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ntfy-discord
  template:
    metadata:
      labels:
        app: ntfy-discord
    spec:
      containers:
        - name: bridge
          image: gitea-http.gitea.svc.cluster.local:3000/daviestechlabs/ntfy-discord:latest
          env:
            - name: NTFY_URL
              value: "http://ntfy.observability.svc.cluster.local"
            - name: NTFY_TOPICS
              value: "gitea-ci,alertmanager-alerts,flux-deployments"
            - name: SECRETS_PATH
              value: "/secrets"
          ports:
            - containerPort: 8080
              name: http
          volumeMounts:
            - name: discord-secrets
              mountPath: /secrets
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            periodSeconds: 10
          resources:
            limits:
              cpu: 50m
              memory: 32Mi
            requests:
              cpu: 5m
              memory: 16Mi
      volumes:
        - name: discord-secrets
          secret:
            secretName: discord-webhook-secret
```
### Secret Management
Discord webhook URL stored in Vault at `kv/data/discord`:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: discord-webhook-secret
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: discord-webhook-secret
  data:
    - secretKey: webhook-url
      remoteRef:
        key: kv/data/discord
        property: webhook_url
```
When ExternalSecrets refreshes and updates the secret, the bridge detects the file change and reloads without restart.
### Error Handling
1. **Connection Loss**: Exponential backoff (1s, 2s, 4s, ... max 60s)
2. **Discord Rate Limits**: Respect `Retry-After` header, queue messages
3. **Invalid Messages**: Log and skip, don't crash
4. **Webhook Errors**: Log error, continue processing other messages
5. **Config Reload Errors**: Log error, keep using previous config
## Consequences
### Positive
- **Tiny footprint**: ~10MB image, 16MB memory
- **Hot reload**: Secrets update without pod restart
- **Robust**: Proper reconnection and error handling
- **Observable**: Structured logging, Prometheus metrics, health endpoints
- **Fast startup**: <100ms cold start
- **Cloud native**: Static binary, distroless image
### Negative
- **Go learning curve**: Different patterns than Python services
- **Operational Overhead**: Another service to maintain
- **Latency**: Adds ~50-100ms to notification delivery
### Neutral
- Webhook URL must be maintained in Vault
- Service logs should be monitored for errors
## Implementation Checklist
- [x] Create `ntfy-discord` repository
- [ ] Implement core bridge logic
- [ ] Add SSE client with reconnection
- [ ] Implement message transformation
- [ ] Add fsnotify hot reload for secrets
- [ ] Add health/ready/metrics endpoints
- [ ] Write unit tests
- [ ] Create multi-stage Dockerfile (scratch base)
- [ ] Set up CI/CD pipeline (Gitea Actions)
- [ ] Add ExternalSecret for Discord webhook
- [ ] Create Kubernetes manifests
- [ ] Deploy to observability namespace
- [ ] Verify notifications flowing to Discord
## Related
- ADR-0021: Notification Architecture
- ADR-0015: CI Notifications and Semantic Versioning


@@ -0,0 +1,108 @@
# ADR-0023: Valkey for ML Inference Caching
## Status
Accepted
## Context
The AI/ML platform requires caching infrastructure for multiple use cases:
1. **KV-Cache Offloading**: vLLM can offload key-value cache tensors to external storage, reducing GPU memory pressure and enabling longer context windows
2. **Embedding Cache**: Frequently requested embeddings can be cached to avoid redundant GPU computation
3. **Session State**: Conversation history and intermediate results for multi-turn interactions
4. **Ray Object Store Spillover**: Large Ray objects can spill to external storage when memory is constrained
Previously, two separate Valkey instances existed:
- `valkey` - General-purpose with 10Gi persistent storage
- `mlcache` - ML-optimized ephemeral cache with 4GB memory limit and LRU eviction
Analysis revealed that `mlcache` had **zero consumers** in the codebase - no services were actually connecting to it.
## Decision
### Consolidate to Single Valkey Instance
Remove `mlcache` and use the existing `valkey` instance for all caching needs. When vLLM KV-cache offloading is implemented in the RayService deployment, configure it to use the existing Valkey instance.
### Valkey Configuration
The current `valkey` instance at `valkey.ai-ml.svc.cluster.local:6379`:
| Setting | Value | Rationale |
|---------|-------|-----------|
| Persistence | 10Gi Longhorn PVC | Survive restarts, cache warm-up |
| Memory | 512Mi request, 2Gi limit | Sufficient for current workloads |
| Auth | Disabled | Internal cluster-only access |
| Metrics | Prometheus ServiceMonitor | Observability |
### Future: vLLM KV-Cache Integration
When implementing LMCache or similar KV-cache offloading for vLLM:
```python
# In ray_serve/serve_llm.py (illustrative - exact kwargs depend on the offloading library)
from vllm import AsyncLLMEngine

engine = AsyncLLMEngine.from_engine_args(
    engine_args,
    kv_cache_config={
        "type": "redis",
        "url": "redis://valkey.ai-ml.svc.cluster.local:6379",
        "prefix": "vllm:kv:",
        "ttl": 3600,  # 1 hour cache lifetime
    },
)
```
If memory pressure becomes an issue, scale Valkey resources:
```yaml
resources:
  limits:
    memory: "8Gi"  # Increase for larger KV-cache
extraArgs:
  - --maxmemory
  - 6gb
  - --maxmemory-policy
  - allkeys-lru
```
### Key Prefixes Convention
To avoid collisions when multiple services share Valkey:
| Service | Prefix | Example Key |
|---------|--------|-------------|
| vLLM KV-Cache | `vllm:kv:` | `vllm:kv:layer0:tok123` |
| Embeddings Cache | `emb:` | `emb:sha256:abc123` |
| Ray State | `ray:` | `ray:actor:xyz` |
| Session State | `session:` | `session:user:123` |
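As an illustration of the convention, an embeddings cache keyed by content hash might look like the sketch below, assuming the async `redis` client (which speaks to Valkey unchanged); the TTL is an assumption:

```python
import hashlib
import json

import redis.asyncio as redis

r = redis.Redis(host="valkey.ai-ml.svc.cluster.local", port=6379)


async def cached_embed(text: str, embed_fn) -> list[float]:
    """Check the shared cache under the emb: prefix before hitting the GPU."""
    key = "emb:sha256:" + hashlib.sha256(text.encode()).hexdigest()
    if (hit := await r.get(key)) is not None:
        return json.loads(hit)
    vector = await embed_fn(text)                   # expensive GPU call
    await r.setex(key, 86400, json.dumps(vector))   # 24h TTL (assumption)
    return vector
```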
## Consequences
### Positive
- **Reduced complexity**: One cache instance instead of two
- **Resource efficiency**: No unused mlcache consuming 4GB memory allocation
- **Operational simplicity**: Single point of monitoring and maintenance
- **Cost savings**: One less PVC, pod, and service to manage
### Negative
- **Shared resource contention**: All workloads share the same cache
- **Single point of failure**: Cache unavailability affects all consumers
### Mitigations
- **Namespace isolation via prefixes**: Prevents key collisions
- **LRU eviction**: Automatic cleanup when memory is constrained
- **Persistent storage**: Cache survives pod restarts
- **Monitoring**: Prometheus metrics for memory usage alerts
## References
- [vLLM Distributed KV-Cache](https://docs.vllm.ai/en/latest/serving/distributed_serving.html)
- [LMCache Project](https://github.com/LMCache/LMCache)
- [Valkey Documentation](https://valkey.io/docs/)
- [Ray External Storage](https://docs.ray.io/en/latest/ray-core/objects/object-spilling.html)


@@ -0,0 +1,184 @@
# ADR-0024: Ray Repository Structure
## Status
Accepted
## Date
2026-02-03
## Context
| Factor | Details |
|--------|---------|
| Problem | Need to document the Ray-specific repository structure |
| Impact | Clarity on where Ray components live post-migration |
| Current State | kuberay-images standalone, ray-serve needs extraction |
| Goal | Clean separation with independent release cycles |
### Historical Context
`llm-workflows` was the original monolithic repository containing all ML/AI infrastructure code. It has been **archived** after being fully decomposed into focused, independent repositories:
| Repository | Purpose |
|------------|---------|
| `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
| `ai-pipelines` | Kubeflow pipeline definitions |
| `ai-services` | Core ML service implementations |
| `chat-handler` | Chat orchestration and routing |
| `handler-base` | Base handler framework |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `voice-assistant` | Voice assistant integration |
| `gradio-ui` | Shared Gradio UI components |
| `kuberay-images` | GPU-specific Ray worker base images |
| `ntfy-discord` | Notification bridge |
| `spark-analytics-jobs` | Spark batch analytics |
| `flink-analytics-jobs` | Flink streaming analytics |
### Remaining Ray Component
The `ray-serve` code still needs a dedicated repository for Ray Serve model inference services.
| Component | Pre-Migration Location | Purpose |
|-----------|------------------------|---------|
| kuberay-images | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
| ray-serve | `llm-workflows/ray-serve/` | Ray Serve inference services |
| llm-workflows (archived) | `llm-workflows/` | Pipelines, handlers, STT/TTS, embeddings |
### Problems with Current Structure
1. **Tight Coupling**: ray-serve changes require llm-workflows repo access
2. **CI/CD Complexity**: Building ray-serve images triggers unrelated workflow steps
3. **Version Management**: Can't independently version ray-serve deployments
4. **Team Access**: Contributors to ray-serve need access to entire llm-workflows repo
5. **Build Times**: Changes to unrelated code can trigger ray-serve rebuilds
## Decision
**Establish two dedicated Ray repositories with distinct purposes:**
| Repository | Type | Contents | Release Cycle |
|------------|------|----------|---------------|
| `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
| `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |
### Key Design: Dynamic Code Loading
Ray Serve applications are deployed as **PyPI packages**, not baked into Docker images. This enables:
- **Dynamic Decoupling**: Update model serving logic without rebuilding containers
- **Runtime Flexibility**: Ray cluster pulls code via `pip install` at runtime
- **Faster Iteration**: Code changes don't require image rebuilds or pod restarts
- **Version Pinning**: Kubernetes manifests specify package versions independently
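In practice the pin lives in the RayService manifest, but the same mechanism is visible from the Python side. A sketch, assuming a published `ray-serve==1.0.0` wheel on the internal index (version illustrative):

```python
import ray

# Connect to the cluster and request the pinned package; Ray's runtime_env
# installs it into an isolated environment before any Serve code runs.
# Bumping the pin in homelab-k8s2's serveConfigV2 is what rolls a new version.
ray.init(
    address="auto",
    runtime_env={"pip": ["ray-serve==1.0.0"]},  # illustrative version pin
)
```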
### Repository Structure
```
kuberay-images/ # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
└── build-push.yaml # Builds & pushes to container registry
ray-serve/ # PyPI package - application code
├── src/
│ └── ray_serve/
│ ├── __init__.py
│ ├── model_configs.py
│ └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
└── publish-ray-serve.yaml # Publishes to PyPI registry
```
**Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:
- **Infrastructure** (kuberay-images) - How to run Ray workers
- **Application** (ray-serve) - What code to run
- **Orchestration** (homelab-k8s2) - Where and when to deploy
## Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ RAY INFRASTRUCTURE │
└─────────────────────────────────────────────────────────────────────┘
┌───────────────────┴───────────────────┐
│ │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ kuberay-images│ │ ray-serve │
│ │ │ │
│ Base worker │ │ PyPI package │
│ Docker images │ │ Ray Serve │
│ │ │ application │
│ NVIDIA/AMD/ │ │ │
│ Intel GPUs │ │ Model configs │
└───────────────┘ └───────────────┘
│ │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ Container │ │ PyPI │
│ Registry │ │ Registry │
│ registry.lab/ │ │ registry.lab/ │
│ kuberay/* │ │ pypi/ray-serve│
└───────────────┘ └───────────────┘
│ │
└───────────────────┬───────────────────┘
┌───────────────────┐
│ Ray Cluster │
│ │
│ 1. Pull container │
│ 2. pip install │
│ ray-serve │
│ 3. Run serve app │
└───────────────────┘
```
## Consequences
### Positive
- **Dynamic Updates**: Deploy new model serving code without rebuilding images
- **Independent Releases**: Containers and application code versioned separately
- **Faster Iteration**: PyPI publish is seconds vs minutes for Docker builds
- **Clear Separation**: Infrastructure (images) vs Application (code) vs Orchestration (k8s)
- **Runtime Flexibility**: Same container can run different ray-serve versions
### Negative
- **Runtime Dependencies**: Pod startup requires `pip install` (cached in practice)
- **Version Coordination**: Must track compatible versions between kuberay-images and ray-serve
### Migration Steps
1. [x] `kuberay-images` already exists as standalone repo
2. [x] `llm-workflows` archived - all components extracted to dedicated repos
3. [ ] Create `ray-serve` repo on Gitea
4. [ ] Move `.gitea/workflows/publish-ray-serve.yaml` to new repo
5. [ ] Set up pyproject.toml for PyPI publishing
6. [ ] Update RayService manifests to `pip install ray-serve==X.Y.Z`
7. [ ] Verify Ray cluster pulls package correctly at runtime
## Version Compatibility Matrix
| kuberay-images | ray-serve | Notes |
|----------------|-----------|-------|
| 1.0.0 | 1.0.0 | Initial structure |
## References
- [ADR-0020: Internal Registry for CI/CD](./ADR-0020-internal-registry-for-cicd.md)
- [KubeRay Documentation](https://ray-project.github.io/kuberay/)
- [Ray Serve Documentation](https://docs.ray.io/en/latest/serve/index.html)