chore: Consolidate ADRs into decisions/ directory
- Added ADR-0016: Affine email verification strategy
- Moved ADRs 0019-0024 from docs/adr/ to decisions/
- Renamed to consistent format (removed ADR- prefix)
decisions/0016-affine-email-verification-strategy.md (new file, 150 lines)
@@ -0,0 +1,150 @@
# Affine Email Verification Strategy for Authentik OIDC

* Status: proposed
* Date: 2026-02-04
* Deciders: Billy
* Technical Story: Affine requires email verification for users, but Authentik is not configured with SMTP for email delivery

## Context and Problem Statement

Affine (self-hosted note-taking/collaboration tool) requires users to have verified email addresses. When users authenticate via Authentik OIDC, Affine checks the `email_verified` claim. Currently, Authentik has no SMTP configuration, so it cannot send verification emails, causing new users to be blocked or have limited functionality in Affine.

How can we satisfy Affine's email verification requirement without adding significant infrastructure complexity to the homelab?

## Decision Drivers

* Minimize external dependencies and ongoing costs
* Keep the solution self-contained within the homelab
* Avoid breaking changes on Affine upgrades
* Maintain security - don't completely bypass verification for untrusted users
* Simple to implement and maintain

## Considered Options

1. **Override `email_verified` claim in Authentik** - Configure Authentik to always return `email_verified: true` for trusted users
2. **Deploy local SMTP server (Mailpit)** - Run a lightweight mail capture server in-cluster
3. **Configure Affine to skip verification for OIDC users** - Use Affine's configuration to trust OIDC-provided emails

## Decision Outcome

Chosen option: **Option 1 (Override `email_verified` claim)** as the primary solution, with Option 3 as a fallback if Affine supports it.

This approach requires zero additional infrastructure, works immediately, and is appropriate for a homelab where all users are trusted (family/personal use). Option 2 (Mailpit) is documented for future reference if actual email delivery becomes needed for other applications.

### Positive Consequences

* No additional services to deploy or maintain
* Works immediately with the existing Authentik setup
* No external dependencies or costs
* Can be easily reverted if requirements change

### Negative Consequences

* Bypasses "real" email verification - relies on trust
* If Affine is ever exposed to untrusted users, this would need revisiting
* Other applications expecting real email verification would need similar workarounds

## Pros and Cons of the Options

### Option 1: Override `email_verified` Claim in Authentik

Configure an Authentik property mapping to always return `email_verified: true` in the OIDC token for the Affine application.

**Implementation:**

1. In Authentik Admin → Customization → Property Mappings
2. Create a new "Scope Mapping" for `email_verified`
3. Set expression: `return True`
4. Assign to the Affine OIDC provider

* Good, because zero infrastructure is required
* Good, because it is an immediate solution
* Good, because it is appropriate for trusted homelab users
* Bad, because it is not "real" verification
* Bad, because per-application configuration is needed

### Option 2: Deploy Local SMTP Server (Mailpit)

Deploy Mailpit (or MailHog) as a lightweight SMTP server in the cluster that captures all emails for viewing via a web UI.

**Implementation:**

```yaml
# Example Mailpit deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mailpit
  namespace: productivity
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mailpit
  template:
    metadata:
      labels:
        app: mailpit  # must match the selector above
    spec:
      containers:
        - name: mailpit
          image: axllent/mailpit:latest
          ports:
            - containerPort: 1025 # SMTP
            - containerPort: 8025 # Web UI
```

Then configure Authentik SMTP settings:

- Host: `mailpit.productivity.svc.cluster.local`
- Port: `1025`
- TLS: disabled (internal traffic)
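
For that hostname to resolve, Mailpit also needs a ClusterIP Service in front of the Deployment. A minimal sketch, assuming the `app: mailpit` label and `productivity` namespace used above:

```yaml
# Sketch: Service exposing Mailpit's SMTP and web UI ports in-cluster
apiVersion: v1
kind: Service
metadata:
  name: mailpit
  namespace: productivity
spec:
  selector:
    app: mailpit
  ports:
    - name: smtp
      port: 1025
      targetPort: 1025
    - name: http
      port: 8025
      targetPort: 8025
```

With this in place, `mailpit.productivity.svc.cluster.local:1025` is reachable from Authentik's pods.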

* Good, because it provides an actual email flow for testing
* Good, because it is useful for other apps needing email (password reset, notifications)
* Good, because emails are viewable via a web UI
* Bad, because emails don't actually leave the cluster
* Bad, because it is another service to maintain
* Bad, because it requires Authentik reconfiguration

### Option 3: Configure Affine to Skip Verification for OIDC Users

If Affine supports it, configure the application to trust email addresses from OIDC providers without requiring separate verification.

**Potential Configuration (needs verification):**

```yaml
# In affine-config ConfigMap
AFFINE_AUTH_OIDC_EMAIL_VERIFIED: "true"
# or similar environment variable
```

* Good, because no Authentik changes are needed
* Good, because it is scoped to Affine only
* Bad, because it may not be supported by Affine
* Bad, because it could break on Affine upgrades
* Bad, because it requires research into Affine's configuration options

## Implementation Notes

### For Option 1 (Recommended)

1. Access Authentik admin at `https://auth.daviestechlabs.io/if/admin/`
2. Navigate to Customization → Property Mappings
3. Create a new Scope Mapping:
   - Name: `Affine Email Verified Override`
   - Scope name: `email`
   - Expression:

     ```python
     return {
         "email": request.user.email,
         "email_verified": True,
     }
     ```

4. Edit the Affine OIDC Provider → Advanced Settings → Scope Mappings
5. Replace the default email mapping with the new override

### Future Considerations

If the homelab expands to include external users or applications requiring real email delivery:

- Revisit Option 2 (Mailpit) for development/testing
- Consider an external SMTP service (SendGrid free tier, AWS SES) for production email

## References

* [Authentik Property Mappings Documentation](https://docs.goauthentik.io/docs/property-mappings)
* [Affine Self-Hosting Documentation](https://docs.affine.pro/docs/self-host-affine)
* [Mailpit GitHub](https://github.com/axllent/mailpit)
decisions/0019-handler-deployment-strategy.md (new file, 365 lines)
@@ -0,0 +1,365 @@
# ADR-0019: Python Module Deployment Strategy

## Status

Accepted

## Date

2026-02-02

## Context

We have Python modules for AI/ML workflows that need to run on our unified GPU cluster:

| Repo | Purpose | Needs GPU? |
|------|---------|------------|
| `handler-base` | Shared library (NATS, clients, telemetry) | No |
| `chat-handler` | Text chat → RAG → LLM pipeline | No (calls GPU endpoints) |
| `voice-assistant` | Audio → STT → RAG → LLM → TTS pipeline | No (calls GPU endpoints) |
| `pipeline-bridge` | Kubeflow ↔ NATS integration | No |
| `kuberay-images/ray-serve/` | Inference deployments (Whisper, TTS, LLM, etc.) | **Yes** |

### Current Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                           PLATFORM LAYERS                           │
├─────────────────────────────────────────────────────────────────────┤
│  Kubeflow Pipelines  │  KServe (visibility)  │  MLflow (registry)  │
│   [Orchestration]    │  [InferenceServices]  │   [Models/Metrics]  │
├─────────────────────────────────────────────────────────────────────┤
│                             RAY CLUSTER                             │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Ray Serve Applications (GPU inference)                         │ │
│ │  ├─ /llm        → VLLMDeployment (khelben, 0.95 GPU)           │ │
│ │  ├─ /whisper    → WhisperDeployment (elminster, 0.5 GPU)       │ │
│ │  ├─ /tts        → TTSDeployment (elminster, 0.5 GPU)           │ │
│ │  ├─ /embeddings → EmbeddingsDeployment (drizzt, 0.8 GPU)       │ │
│ │  └─ /reranker   → RerankerDeployment (danilo, 0.8 GPU)         │ │
│ ├────────────────────────────────────────────────────────────────┤ │
│ │ Ray Serve Applications (CPU orchestration) ← WHERE HANDLERS GO │ │
│ │  ├─ /chat  → ChatHandler (head node, 0 GPU)                    │ │
│ │  └─ /voice → VoiceHandler (head node, 0 GPU)                   │ │
│ └────────────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────┤
│  RayJob (batch/training)  │  NATS (events)  │  Milvus (vectors)    │
└─────────────────────────────────────────────────────────────────────┘
```

The key insight is that **handlers ARE Ray Serve applications** - they just don't need GPUs.
They should run inside the Ray cluster to:

1. Use Ray's internal calling (faster than HTTP)
2. Share observability (Ray Dashboard)
3. Leverage Ray's scheduling for resource management

## Decision

**Deploy handlers as Ray Serve applications inside the Ray cluster**, using `runtime_env`
to install Python packages from Gitea's package registry at deployment time.

### Why Ray Serve (not standalone containers)?

1. **Unified Platform**: Everything runs in Ray - inference AND orchestration
2. **Internal Calls**: Handlers can call inference deployments via Ray handles (no HTTP)
3. **Resource Sharing**: The Ray head node has spare CPU/memory for orchestration
4. **Single Observability**: The Ray Dashboard shows all applications
5. **Simpler Ops**: One RayService to manage, not multiple Deployments

### Why runtime_env with pip (not baked into images)?

1. **Faster Iteration**: Change handler code → push to PyPI → redeploy RayService
2. **Decoupled Releases**: Handlers update independently of worker images
3. **Smaller Images**: Worker images only need inference dependencies
4. **MLflow Integration**: Can version handlers as MLflow models if needed

## Implementation Plan

### Phase 1: Publish Packages to Gitea PyPI

Each handler repo publishes to Gitea's built-in package registry on release:

```yaml
# .gitea/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]

jobs:
  lint:
    # ... existing lint job

  test:
    # ... existing test job

  publish:
    runs-on: ubuntu-latest
    needs: [lint, test]
    if: startsWith(github.ref, 'refs/tags/v')
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Build package
        run: uv build

      - name: Publish to Gitea PyPI
        env:
          UV_PUBLISH_URL: https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi
          UV_PUBLISH_TOKEN: ${{ secrets.GITEA_TOKEN }}
        run: uv publish
```

### Phase 2: Update RayService with Handler Applications

Add handler applications to the existing RayService:

```yaml
# rayservice.yaml additions
spec:
  serveConfigV2: |
    applications:
      # ... existing GPU inference applications ...

      # ============================================
      # HANDLERS (CPU - runs on head node)
      # ============================================

      # Chat Handler - RAG + LLM pipeline
      - name: chat-handler
        route_prefix: /chat
        import_path: chat_handler:app
        runtime_env:
          pip:
            - handler-base>=0.1.0
            - chat-handler>=0.1.0
          pip_find_links:
            - https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple/
          env_vars:
            NATS_URL: "nats://nats.ai-ml.svc.cluster.local:4222"
            MILVUS_HOST: "milvus.ai-ml.svc.cluster.local"
            OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring.svc.cluster.local:4317"
        deployments:
          - name: ChatDeployment
            num_replicas: 2
            ray_actor_options:
              num_cpus: 0.5
              num_gpus: 0  # No GPU needed
            max_ongoing_requests: 50

      # Voice Assistant - STT → RAG → LLM → TTS pipeline
      - name: voice-assistant
        route_prefix: /voice
        import_path: voice_assistant:app
        runtime_env:
          pip:
            - handler-base>=0.1.0
            - voice-assistant>=0.1.0
          pip_find_links:
            - https://git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple/
          env_vars:
            NATS_URL: "nats://nats.ai-ml.svc.cluster.local:4222"
            MILVUS_HOST: "milvus.ai-ml.svc.cluster.local"
        deployments:
          - name: VoiceDeployment
            num_replicas: 2
            ray_actor_options:
              num_cpus: 1
              num_gpus: 0
            max_ongoing_requests: 20
```

### Phase 3: Refactor Handlers for Ray Serve

Convert handlers from standalone NATS subscribers to Ray Serve deployments that can
also optionally subscribe to NATS:

```python
# chat_handler.py (refactored)
from ray import serve

from handler_base import Settings
from handler_base.clients import EmbeddingsClient, LLMClient, RerankerClient, MilvusClient


@serve.deployment(
    name="ChatDeployment",
    num_replicas=2,
    ray_actor_options={"num_cpus": 0.5, "num_gpus": 0},
)
class ChatHandler:
    def __init__(self):
        self.settings = Settings()

        # Initialize clients - these can use Ray handles for internal calls
        self.embeddings = EmbeddingsClient()
        self.llm = LLMClient()
        self.reranker = RerankerClient()
        self.milvus = MilvusClient()

    async def __call__(self, request) -> dict:
        """Handle HTTP requests (from Gradio, etc.)"""
        data = await request.json()
        return await self.process_chat(data)

    async def process_chat(self, data: dict) -> dict:
        """Core chat logic - called by HTTP or NATS"""
        query = data["query"]

        # 1. Generate embeddings
        embedding = await self.embeddings.embed(query)

        # 2. Vector search
        results = await self.milvus.search(embedding, top_k=10)

        # 3. Rerank
        reranked = await self.reranker.rerank(query, results)

        # 4. Generate response
        response = await self.llm.generate(query, context=reranked[:5])

        return {
            "response": response,
            "sources": reranked[:5],
        }


# Ray Serve app binding
app = ChatHandler.bind()
```

### Phase 4: Use Ray Handles for Internal Calls (Optional Optimization)

Update handler-base clients to use Ray handles when running inside Ray:

```python
# handler_base/clients/embeddings.py
import httpx
import ray
from ray import serve


class EmbeddingsClient:
    def __init__(self, url: str = None):
        self.url = url
        self._handle = None

        # If running inside Ray, get a handle to the embeddings deployment
        if ray.is_initialized():
            try:
                self._handle = serve.get_deployment_handle(
                    "EmbeddingsDeployment",
                    app_name="embeddings",
                )
            except Exception:
                pass  # Fall back to HTTP

    async def embed(self, text: str) -> list[float]:
        if self._handle:
            # Fast internal Ray call
            return await self._handle.embed.remote(text)
        else:
            # HTTP fallback for external callers
            async with httpx.AsyncClient() as client:
                resp = await client.post(f"{self.url}/v1/embeddings", json={"input": text})
                return resp.json()["data"][0]["embedding"]
```

### Phase 5: NATS Bridge (Optional)

If you still want NATS integration, add a separate NATS bridge that forwards to Ray Serve:

```python
# pipeline_bridge.py - runs as a Ray actor, subscribes to NATS
import nats
import ray
from ray import serve


@ray.remote
class NATSBridge:
    def __init__(self):
        self.nc = None
        self.chat_handle = serve.get_deployment_handle("ChatDeployment", "chat-handler")
        self.voice_handle = serve.get_deployment_handle("VoiceDeployment", "voice-assistant")

    async def start(self):
        self.nc = await nats.connect("nats://nats.ai-ml.svc.cluster.local:4222")

        await self.nc.subscribe("ai.chat.request", cb=self.handle_chat)
        await self.nc.subscribe("voice.request", cb=self.handle_voice)

    async def handle_chat(self, msg):
        result = await self.chat_handle.process_chat.remote(msg.data)
        if msg.reply:
            await self.nc.publish(msg.reply, result)
```

## CI/CD Flow

```
┌────────────────────────────────────────────────────────────────────┐
│ Developer pushes to handler repo                                   │
├────────────────────────────────────────────────────────────────────┤
│ 1. Gitea Actions: lint → test                                      │
│ 2. On tag: build wheel → publish to Gitea PyPI                     │
├────────────────────────────────────────────────────────────────────┤
│ 3. Update RayService version in homelab-k8s2                       │
│    (bump handler-base>=0.2.0 in runtime_env)                       │
├────────────────────────────────────────────────────────────────────┤
│ 4. Flux detects change → applies RayService                        │
│ 5. Ray downloads new packages → restarts deployments               │
└────────────────────────────────────────────────────────────────────┘
```

## Alternatives Considered

### Standalone Container Deployments

Run handlers as separate Kubernetes Deployments outside Ray.

**Rejected because:**

- Duplicates infrastructure (separate scaling, health checks, etc.)
- HTTP overhead for every inference call
- Separate observability stack
- Against the "Ray as unified compute" philosophy

### Bake Handlers into Worker Images

Pre-install handler code in ray-worker images.

**Rejected because:**

- Couples handler releases to image rebuilds
- Slower iteration cycle
- Larger images

## Consequences

### Positive

- Single platform: Everything runs in Ray
- Fast internal calls via Ray handles
- Unified observability in the Ray Dashboard
- Clean abstraction layers: Kubeflow → KServe → Ray → GPU
- Handlers scale with Ray's autoscaler

### Negative

- Handlers share Ray head node resources
- Need to manage Gitea PyPI authentication for runtime_env
- Slightly more complex RayService configuration
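
On the authentication point: one possible approach, sketched below, is to embed a read-scoped token directly in the package index URL used by `runtime_env`. The `ci-bot` username and token are assumed placeholders, and whether credentials belong in `pip_find_links` or in a `PIP_INDEX_URL` environment variable would need verification against how Ray resolves packages in this setup:

```yaml
# Sketch only - credential names and URL scheme are assumptions, not from this ADR
runtime_env:
  pip:
    - handler-base>=0.1.0
  pip_find_links:
    # basic-auth credentials embedded in the registry URL (read-only token)
    - https://ci-bot:GITEA_READ_TOKEN@git.daviestechlabs.io/api/packages/daviestechlabs/pypi/simple/
```

The trade-off is that the token then appears in the RayService spec, so it should be a narrowly scoped, read-only token.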

### Neutral

- MLflow can track handler "models" if we want versioned deployments
- Kubeflow can trigger handler updates via pipelines

## References

- [ray-kserve-integration.md](../../homelab-k8s2/docs/ray-kserve-integration.md)
- [Ray Serve runtime_env docs](https://docs.ray.io/en/latest/serve/production-guide/config.html)
- [Gitea Package Registry](https://docs.gitea.io/en-us/packages/pypi/)
- [ADR-0012: Ray Cluster Architecture](ADR-0012-ray-cluster-unified.md)
decisions/0020-internal-registry-for-cicd.md (new file, 133 lines)
@@ -0,0 +1,133 @@
# ADR-0020: Internal Registry URLs for CI/CD

## Status

Accepted

## Date

2026-02-02

## Context

| Factor | Details |
|--------|---------|
| Problem | Cloudflare proxying limits uploads to 100MB per request |
| Impact | Docker images (20GB+) and large packages fail to push |
| Current Setup | Gitea at `git.daviestechlabs.io` behind Cloudflare |
| Internal Access | `registry.lab.daviestechlabs.io` bypasses Cloudflare |

Our Gitea instance is accessible via two URLs:

- **External**: `git.daviestechlabs.io` - proxied through Cloudflare (DDoS protection, caching)
- **Internal**: `registry.lab.daviestechlabs.io` - direct access from the cluster network

Cloudflare's free tier enforces a 100MB upload limit per request. This blocks:

- Docker image pushes (multi-GB layers)
- Large Python package uploads
- Any artifact exceeding 100MB

## Decision

**Use internal registry URLs for all CI/CD artifact uploads.**

CI/CD workflows running on Gitea Actions runners (which are inside the cluster) should use `registry.lab.daviestechlabs.io` for:

- Docker image pushes
- PyPI package uploads
- Any large artifact uploads

External URLs remain for:

- Git operations (clone, push, pull)
- Package downloads (pip install, docker pull)
- Human access via browser

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                            INTERNET                             │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │   Cloudflare    │
                        │  (100MB limit)  │
                        └────────┬────────┘
                                 │
                                 ▼
                 ┌──────────────────────────────┐
                 │    git.daviestechlabs.io     │
                 │      (external access)       │
                 └──────────────────────────────┘
                                 │
                                 │ same Gitea instance
                                 │
                 ┌──────────────────────────────┐
                 │ registry.lab.daviestechlabs  │
                 │    (internal, no limits)     │
                 └──────────────────────────────┘
                                 ▲
                                 │ direct upload
                                 │
                 ┌──────────────────────────────┐
                 │     Gitea Actions Runner     │
                 │         (in-cluster)         │
                 └──────────────────────────────┘
```

## Consequences

### Positive

- **No upload size limits** for CI/CD artifacts
- **Faster uploads** (no Cloudflare proxy overhead)
- **Lower latency** for in-cluster operations
- **Cost savings** (reduced Cloudflare bandwidth)

### Negative

- **Two URLs to maintain** in workflow configurations
- **Runners must be in-cluster** (external runners cannot be used for uploads)
- **Split-horizon DNS** required if accessing from outside

### Neutral

- External users can still pull packages/images via the Cloudflare URL
- Git operations continue through the external URL (small payloads)

## Implementation

### Docker Registry Login

```yaml
- name: Login to Internal Registry
  uses: docker/login-action@v3
  with:
    registry: registry.lab.daviestechlabs.io
    username: ${{ secrets.REGISTRY_USER }}
    password: ${{ secrets.REGISTRY_TOKEN }}
```
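
After logging in, a push step might look like the following. This is a sketch: the image path and action version are assumptions, not from this ADR:

```yaml
# Sketch: build and push through the internal registry (image name illustrative)
- name: Build and Push Image
  uses: docker/build-push-action@v6
  with:
    push: true
    tags: registry.lab.daviestechlabs.io/daviestechlabs/example-image:${{ github.sha }}
```

Because the runner is in-cluster, multi-GB layers upload directly without hitting the Cloudflare limit.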

### PyPI Upload

```yaml
- name: Publish to Gitea PyPI
  run: |
    twine upload \
      --repository-url https://registry.lab.daviestechlabs.io/api/packages/daviestechlabs/pypi \
      dist/*
```

### Environment Variable Pattern

For consistency across workflows:

```yaml
env:
  REGISTRY_EXTERNAL: git.daviestechlabs.io
  REGISTRY_INTERNAL: registry.lab.daviestechlabs.io
```

## Related

- [ADR-0019: Handler Deployment Strategy](0019-handler-deployment-strategy.md) - uses PyPI publishing
- [Cloudflare upload limits](https://developers.cloudflare.com/workers/platform/limits/)
decisions/0021-notification-architecture.md (new file, 131 lines)
@@ -0,0 +1,131 @@
# ADR-0021: Notification Architecture

## Status

Accepted

## Context

The homelab infrastructure generates notifications from multiple sources:

1. **CI/CD pipelines** (Gitea Actions) - build success/failure
2. **Alertmanager** - Prometheus alerts for critical/warning conditions
3. **Gatus** - service health monitoring
4. **Flux** - GitOps reconciliation events
5. **Service readiness** - notifications when deployments complete successfully

Currently, ntfy serves as the primary notification hub, but there are several issues:

- **Topic inconsistency**: CI workflows were posting to `builds` while documentation (ADR-0015) specified `gitea-ci`
- **No Alertmanager integration**: Critical Prometheus alerts had no delivery mechanism
- **No service readiness notifications**: No visibility when services come online after deployment

## Decision

### 1. ntfy as the Notification Hub

ntfy will serve as the central notification aggregation point. All internal services publish to ntfy topics via the internal Kubernetes service URL:

```
http://ntfy-svc.observability.svc.cluster.local/<topic>
```

This keeps ntfy auth-protected externally while allowing internal services to publish freely.

### 2. Standardized Topics

| Topic | Source | Description |
|-------|--------|-------------|
| `gitea-ci` | Gitea Actions | CI/CD build notifications |
| `alertmanager-alerts` | Alertmanager | Prometheus critical/warning alerts |
| `gatus` | Gatus | Service health status changes |
| `flux` | Flux | GitOps reconciliation events |
| `deployments` | Flux/Argo | Service deployment completions |

### 3. Alertmanager Integration

Alertmanager is configured to forward alerts to ntfy using the built-in `tpl=alertmanager` template:

```yaml
receivers:
  - name: ntfy-critical
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=urgent&tags=rotating_light"
        sendResolved: true
  - name: ntfy-warning
    webhookConfigs:
      - url: "http://ntfy-svc.observability.svc.cluster.local/alertmanager-alerts?tpl=alertmanager&priority=high&tags=warning"
        sendResolved: true
```

Routes direct alerts based on severity:

- `severity=critical` → `ntfy-critical` receiver
- `severity=warning` → `ntfy-warning` receiver
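
That routing could be expressed roughly as follows. This sketch uses the camelCase AlertmanagerConfig-CRD style implied by the `webhookConfigs`/`sendResolved` fields above; the exact route shape is an assumption to verify against the deployed Alertmanager:

```yaml
# Sketch: severity-based routing to the receivers defined above
route:
  receiver: ntfy-warning  # default receiver
  routes:
    - matchers:
        - name: severity
          value: critical
      receiver: ntfy-critical
    - matchers:
        - name: severity
          value: warning
      receiver: ntfy-warning
```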

### 4. Service Readiness Notifications

To provide visibility when services are fully operational after deployment:

**Option A: Flux Notification Controller**
Configure Flux's notification-controller to send alerts when Kustomizations/HelmReleases succeed:

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: ntfy-deployments
spec:
  type: generic-hmac # or generic
  address: http://ntfy-svc.observability.svc.cluster.local/deployments
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: deployment-success
spec:
  providerRef:
    name: ntfy-deployments
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
  inclusionList:
    - ".*succeeded.*"
```

**Option B: Argo Workflows Post-Deploy Hook**
For Argo-managed deployments, add a notification step at workflow completion.

**Recommendation**: Use the Flux Notification Controller (Option A), as it is already part of the GitOps stack and provides native integration.

## Consequences

### Positive

- **Single source of truth**: All notifications flow through ntfy
- **Auth protection maintained**: External ntfy access requires Authentik auth
- **Deployment visibility**: Know when services are ready without watching logs
- **Consistent topic naming**: All sources follow documented conventions

### Negative

- **Configuration overhead**: Each notification source requires explicit configuration

### Neutral

- Topic naming must be documented and followed consistently
- Future Discord integration is addressed in ADR-0022

## Implementation Checklist

- [x] Standardize CI notifications to the `gitea-ci` topic
- [x] Configure Alertmanager → ntfy for critical/warning alerts
- [ ] Configure Flux notification-controller for deployment notifications
- [ ] Add `deployments` topic subscription to the ntfy app

## Related

- ADR-0015: CI Notifications and Semantic Versioning
- ADR-0022: ntfy-Discord Bridge Service
decisions/0022-ntfy-discord-bridge.md (new file, 302 lines)
@@ -0,0 +1,302 @@
# ADR-0022: ntfy-Discord Bridge Service

## Status

Accepted

## Context

Per ADR-0021, ntfy serves as the central notification hub for the homelab. However, Discord is used for team collaboration and visibility, requiring notifications to be forwarded there as well.

ntfy does not natively support the Discord webhook format. Discord expects a specific JSON structure with embeds, while ntfy uses its own message format. A bridge service is needed to:

1. Subscribe to ntfy topics
2. Transform messages to Discord embed format
3. Forward to Discord webhooks

## Decision

### Architecture

A dedicated Go microservice (`ntfy-discord`) will bridge ntfy to Discord:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ CI/Alertmanager │────▶│       ntfy       │────▶│  ntfy App   │
│ Gatus/Flux      │     │  (notification   │     │  (mobile)   │
└─────────────────┘     │       hub)       │     └─────────────┘
                        └────────┬─────────┘
                                 │ SSE/JSON stream
                                 ▼
                        ┌──────────────────┐     ┌─────────────┐
                        │   ntfy-discord   │────▶│   Discord   │
                        │       (Go)       │     │   Webhook   │
                        └──────────────────┘     └─────────────┘
```

### Service Design

**Repository**: `ntfy-discord`

**Technology Stack**:

- Go 1.22+
- `fsnotify` for hot reload of secrets/config
- Standard library `net/http` for SSE subscription
- `slog` for structured logging
- Scratch/distroless base image (~10MB final image)

**Why Go over Python**:

- **Smaller images**: ~10MB vs ~150MB+ for Python
- **Cloud native**: Single static binary, no runtime dependencies
- **Memory efficient**: Lower RSS, ideal for an always-on bridge
- **Concurrency**: Goroutines for SSE handling and webhook delivery
- **Compile-time safety**: Catch errors before deployment

**Core Features**:

1. **SSE Subscription**: Connect to ntfy's JSON stream endpoint for real-time messages
2. **Automatic Reconnection**: Exponential backoff on connection failures
3. **Message Transformation**: Convert ntfy format to Discord embed format
4. **Priority Mapping**: Map ntfy priorities to Discord embed colors
5. **Topic Routing**: Configure which topics go to which Discord channels/webhooks
6. **Hot Reload**: Watch mounted secrets/configmaps with fsnotify, reload without restart
7. **Health Endpoint**: `/health` and `/ready` for Kubernetes probes
8. **Metrics**: Prometheus metrics at `/metrics`

### Hot Reload Implementation

Kubernetes mounts secrets as symlinked files that update atomically. The bridge uses `fsnotify` to watch for changes:

```go
// watchSecrets watches the mounted secret directory and reloads
// configuration whenever Kubernetes rotates the files.
func (b *Bridge) watchSecrets(ctx context.Context, secretPath string) {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		slog.Error("failed to create watcher", "error", err)
		return
	}
	defer watcher.Close()

	if err := watcher.Add(secretPath); err != nil {
		slog.Error("failed to watch secret path", "error", err)
		return
	}

	for {
		select {
		case event := <-watcher.Events:
			if event.Has(fsnotify.Write) || event.Has(fsnotify.Create) {
				slog.Info("secret changed, reloading config")
				b.reloadConfig(secretPath)
			}
		case <-ctx.Done():
			return
		}
	}
}
```

This allows ExternalSecrets to rotate the Discord webhook URL without pod restarts.

### Configuration

Configuration via environment variables and mounted secrets:

```yaml
# Environment variables (ConfigMap)
NTFY_URL: "http://ntfy.observability.svc.cluster.local"
NTFY_TOPICS: "gitea-ci,alertmanager-alerts,flux-deployments,gatus"
LOG_LEVEL: "info"
METRICS_ENABLED: "true"

# Mounted secret (hot-reloadable)
/secrets/discord-webhook-url   # Single webhook for all topics
# OR for topic routing:
/secrets/topic-webhooks.yaml   # YAML mapping topics to webhooks
```

Topic routing file (optional):
```yaml
gitea-ci: "https://discord.com/api/webhooks/xxx/ci"
alertmanager-alerts: "https://discord.com/api/webhooks/xxx/alerts"
flux-deployments: "https://discord.com/api/webhooks/xxx/deploys"
default: "https://discord.com/api/webhooks/xxx/general"
```
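
Topic routing then reduces to a small lookup with a `default` fallback. A minimal sketch (the `resolveWebhook` helper is illustrative, not from the actual repo):

```go
package main

import "fmt"

// resolveWebhook picks the Discord webhook for a topic, falling back to
// the "default" entry when the topic has no explicit mapping.
// The routes map mirrors the parsed topic-webhooks.yaml file.
func resolveWebhook(routes map[string]string, topic string) (string, bool) {
	if url, ok := routes[topic]; ok {
		return url, true
	}
	url, ok := routes["default"]
	return url, ok
}

func main() {
	routes := map[string]string{
		"gitea-ci": "https://discord.com/api/webhooks/xxx/ci",
		"default":  "https://discord.com/api/webhooks/xxx/general",
	}
	url, _ := resolveWebhook(routes, "gatus") // unmapped topic
	fmt.Println(url)                          // falls back to the default webhook
}
```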

### Message Transformation

ntfy message:
```json
{
  "id": "abc123",
  "topic": "gitea-ci",
  "title": "Build succeeded",
  "message": "ray-serve-apps published to PyPI",
  "priority": 3,
  "tags": ["package", "white_check_mark"],
  "time": 1770050091
}
```

Discord embed:
```json
{
  "embeds": [{
    "title": "✅ Build succeeded",
    "description": "ray-serve-apps published to PyPI",
    "color": 3066993,
    "fields": [
      {"name": "Topic", "value": "gitea-ci", "inline": true}
    ],
    "timestamp": "2026-02-02T11:34:51Z",
    "footer": {"text": "ntfy"}
  }]
}
```

**Priority → Color Mapping**:

| Priority | Name | Discord Color |
|----------|------|---------------|
| 5 | Max/Urgent | 🔴 Red (15158332) |
| 4 | High | 🟠 Orange (15105570) |
| 3 | Default | 🟢 Green (3066993) |
| 2 | Low | ⚪ Gray (9807270) |
| 1 | Min | ⚪ Light Gray (12370112) |

**Tag → Emoji Mapping**:

Common ntfy tags are converted to Discord-friendly emojis in the title:
- `white_check_mark` / `heavy_check_mark` → ✅
- `x` / `skull` → ❌
- `warning` → ⚠️
- `rotating_light` → 🚨
- `rocket` → 🚀
- `package` → 📦
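
A minimal sketch of how these mapping tables might look in Go. The names (`NtfyMessage`, `embedTitle`) are illustrative, and this sketch simply uses the first matching tag in the message, whereas the embed example above shows the check-mark emoji winning:

```go
package main

import (
	"fmt"
	"time"
)

// NtfyMessage is the subset of ntfy's JSON stream format the bridge uses.
type NtfyMessage struct {
	Title    string
	Message  string
	Topic    string
	Priority int
	Tags     []string
	Time     int64 // Unix seconds
}

// priorityColor maps ntfy priorities (1-5) to Discord embed colors.
var priorityColor = map[int]int{
	5: 15158332, // red
	4: 15105570, // orange
	3: 3066993,  // green (default)
	2: 9807270,  // gray
	1: 12370112, // light gray
}

// tagEmoji maps common ntfy tags to emoji prepended to the title.
var tagEmoji = map[string]string{
	"white_check_mark": "✅", "heavy_check_mark": "✅",
	"x": "❌", "skull": "❌",
	"warning": "⚠️", "rotating_light": "🚨",
	"rocket": "🚀", "package": "📦",
}

// embedTitle prefixes the title with the emoji of the first matching tag.
func embedTitle(m NtfyMessage) string {
	for _, t := range m.Tags {
		if e, ok := tagEmoji[t]; ok {
			return e + " " + m.Title
		}
	}
	return m.Title
}

func main() {
	m := NtfyMessage{Title: "Build succeeded", Topic: "gitea-ci",
		Priority: 3, Tags: []string{"package", "white_check_mark"},
		Time: 1770050091}
	fmt.Println(embedTitle(m))             // 📦 Build succeeded
	fmt.Println(priorityColor[m.Priority]) // 3066993
	// ntfy's Unix seconds become the embed's RFC 3339 timestamp:
	fmt.Println(time.Unix(m.Time, 0).UTC().Format(time.RFC3339))
}
```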

### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ntfy-discord
  namespace: observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ntfy-discord
  template:
    metadata:
      labels:
        app: ntfy-discord
    spec:
      containers:
        - name: bridge
          image: gitea-http.gitea.svc.cluster.local:3000/daviestechlabs/ntfy-discord:latest
          env:
            - name: NTFY_URL
              value: "http://ntfy.observability.svc.cluster.local"
            - name: NTFY_TOPICS
              value: "gitea-ci,alertmanager-alerts,flux-deployments"
            - name: SECRETS_PATH
              value: "/secrets"
          ports:
            - containerPort: 8080
              name: http
          volumeMounts:
            - name: discord-secrets
              mountPath: /secrets
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            periodSeconds: 10
          resources:
            limits:
              cpu: 50m
              memory: 32Mi
            requests:
              cpu: 5m
              memory: 16Mi
      volumes:
        - name: discord-secrets
          secret:
            secretName: discord-webhook-secret
```

### Secret Management

The Discord webhook URL is stored in Vault at `kv/data/discord`:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: discord-webhook-secret
  namespace: observability
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: discord-webhook-secret
  data:
    - secretKey: webhook-url
      remoteRef:
        key: kv/data/discord
        property: webhook_url
```

When ExternalSecrets refreshes and updates the secret, the bridge detects the file change and reloads without a restart.

### Error Handling

1. **Connection Loss**: Exponential backoff (1s, 2s, 4s, ... max 60s)
2. **Discord Rate Limits**: Respect the `Retry-After` header; queue messages
3. **Invalid Messages**: Log and skip; don't crash
4. **Webhook Errors**: Log the error, continue processing other messages
5. **Config Reload Errors**: Log the error, keep using the previous config

## Consequences

### Positive

- **Tiny footprint**: ~10MB image, 16Mi memory request
- **Hot reload**: Secrets update without pod restarts
- **Robust**: Proper reconnection and error handling
- **Observable**: Structured logging, Prometheus metrics, health endpoints
- **Fast startup**: <100ms cold start
- **Cloud native**: Static binary, distroless image

### Negative

- **Go learning curve**: Different patterns than Python services
- **Operational overhead**: Another service to maintain
- **Latency**: Adds ~50-100ms to notification delivery

### Neutral

- The webhook URL must be maintained in Vault
- Service logs should be monitored for errors

## Implementation Checklist

- [x] Create `ntfy-discord` repository
- [ ] Implement core bridge logic
- [ ] Add SSE client with reconnection
- [ ] Implement message transformation
- [ ] Add fsnotify hot reload for secrets
- [ ] Add health/ready/metrics endpoints
- [ ] Write unit tests
- [ ] Create multi-stage Dockerfile (scratch base)
- [ ] Set up CI/CD pipeline (Gitea Actions)
- [ ] Add ExternalSecret for Discord webhook
- [ ] Create Kubernetes manifests
- [ ] Deploy to observability namespace
- [ ] Verify notifications flowing to Discord

## Related

- ADR-0021: Notification Architecture
- ADR-0015: CI Notifications and Semantic Versioning

decisions/0023-valkey-ml-caching.md

# ADR-0023: Valkey for ML Inference Caching

## Status

Accepted

## Context

The AI/ML platform requires caching infrastructure for multiple use cases:

1. **KV-Cache Offloading**: vLLM can offload key-value cache tensors to external storage, reducing GPU memory pressure and enabling longer context windows
2. **Embedding Cache**: Frequently requested embeddings can be cached to avoid redundant GPU computation
3. **Session State**: Conversation history and intermediate results for multi-turn interactions
4. **Ray Object Store Spillover**: Large Ray objects can spill to external storage when memory is constrained

Previously, two separate Valkey instances existed:
- `valkey` - General-purpose with 10Gi persistent storage
- `mlcache` - ML-optimized ephemeral cache with a 4GB memory limit and LRU eviction

Analysis revealed that `mlcache` had **zero consumers** in the codebase - no services were actually connecting to it.

## Decision

### Consolidate to a Single Valkey Instance

Remove `mlcache` and use the existing `valkey` instance for all caching needs. When vLLM KV-cache offloading is implemented in the RayService deployment, configure it to use the existing Valkey instance.

### Valkey Configuration

The current `valkey` instance at `valkey.ai-ml.svc.cluster.local:6379`:

| Setting | Value | Rationale |
|---------|-------|-----------|
| Persistence | 10Gi Longhorn PVC | Survive restarts, cache warm-up |
| Memory | 512Mi request, 2Gi limit | Sufficient for current workloads |
| Auth | Disabled | Internal cluster-only access |
| Metrics | Prometheus ServiceMonitor | Observability |

### Future: vLLM KV-Cache Integration

When implementing LMCache or a similar KV-cache offloading backend for vLLM, the configuration would look roughly like this (an illustrative sketch; the exact API depends on the chosen integration):

```python
# In ray_serve/serve_llm.py
# NOTE: illustrative sketch - the kv_cache_config shape is a placeholder
# for whatever the chosen offloading integration exposes.
from vllm import AsyncLLMEngine

engine = AsyncLLMEngine.from_engine_args(
    engine_args,
    kv_cache_config={
        "type": "redis",
        "url": "redis://valkey.ai-ml.svc.cluster.local:6379",
        "prefix": "vllm:kv:",
        "ttl": 3600,  # 1 hour cache lifetime
    },
)
```

If memory pressure becomes an issue, scale Valkey resources:

```yaml
resources:
  limits:
    memory: "8Gi"  # Increase for larger KV-cache
extraArgs:
  - --maxmemory
  - 6gb
  - --maxmemory-policy
  - allkeys-lru
```

### Key Prefix Convention

To avoid collisions when multiple services share Valkey:

| Service | Prefix | Example Key |
|---------|--------|-------------|
| vLLM KV-Cache | `vllm:kv:` | `vllm:kv:layer0:tok123` |
| Embeddings Cache | `emb:` | `emb:sha256:abc123` |
| Ray State | `ray:` | `ray:actor:xyz` |
| Session State | `session:` | `session:user:123` |
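
The convention is language-agnostic; a tiny helper can enforce it in each consumer. A sketch in Go, with an illustrative `cacheKey` name:

```go
package main

import (
	"fmt"
	"strings"
)

// cacheKey builds a namespaced Valkey key from a service prefix and parts,
// following the prefix convention above.
func cacheKey(prefix string, parts ...string) string {
	return prefix + strings.Join(parts, ":")
}

func main() {
	fmt.Println(cacheKey("vllm:kv:", "layer0", "tok123")) // vllm:kv:layer0:tok123
	fmt.Println(cacheKey("emb:", "sha256", "abc123"))     // emb:sha256:abc123
}
```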

## Consequences

### Positive

- **Reduced complexity**: One cache instance instead of two
- **Resource efficiency**: No unused mlcache consuming a 4GB memory allocation
- **Operational simplicity**: A single point of monitoring and maintenance
- **Cost savings**: One less PVC, pod, and service to manage

### Negative

- **Shared resource contention**: All workloads share the same cache
- **Single point of failure**: Cache unavailability affects all consumers

### Mitigations

- **Namespace isolation via prefixes**: Prevents key collisions
- **LRU eviction**: Automatic cleanup when memory is constrained
- **Persistent storage**: Cache survives pod restarts
- **Monitoring**: Prometheus metrics for memory usage alerts

## References

- [vLLM Distributed KV-Cache](https://docs.vllm.ai/en/latest/serving/distributed_serving.html)
- [LMCache Project](https://github.com/LMCache/LMCache)
- [Valkey Documentation](https://valkey.io/docs/)
- [Ray External Storage](https://docs.ray.io/en/latest/ray-core/objects/object-spilling.html)

decisions/0024-ray-repository-structure.md

# ADR-0024: Ray Repository Structure

## Status

Accepted

## Date

2026-02-03

## Context

| Factor | Details |
|--------|---------|
| Problem | Need to document the Ray-specific repository structure |
| Impact | Clarity on where Ray components live post-migration |
| Current State | kuberay-images standalone; ray-serve needs extraction |
| Goal | Clean separation with independent release cycles |

### Historical Context

`llm-workflows` was the original monolithic repository containing all ML/AI infrastructure code. It has been **archived** after being fully decomposed into focused, independent repositories:

| Repository | Purpose |
|------------|---------|
| `ai-apps` | Gradio applications (STT, TTS, embeddings UIs) |
| `ai-pipelines` | Kubeflow pipeline definitions |
| `ai-services` | Core ML service implementations |
| `chat-handler` | Chat orchestration and routing |
| `handler-base` | Base handler framework |
| `pipeline-bridge` | Bridge between pipelines and services |
| `stt-module` | Speech-to-text service |
| `tts-module` | Text-to-speech service |
| `voice-assistant` | Voice assistant integration |
| `gradio-ui` | Shared Gradio UI components |
| `kuberay-images` | GPU-specific Ray worker base images |
| `ntfy-discord` | Notification bridge |
| `spark-analytics-jobs` | Spark batch analytics |
| `flink-analytics-jobs` | Flink streaming analytics |

### Remaining Ray Component

The `ray-serve` code still needs a dedicated repository for Ray Serve model inference services.

| Component | Current Location | Purpose |
|-----------|------------------|---------|
| kuberay-images | `kuberay-images/` (standalone) | Docker images for Ray workers (NVIDIA, AMD, Intel) |
| ray-serve | `llm-workflows/ray-serve/` (archived repo) | Ray Serve inference services |
| llm-workflows | `llm-workflows/` (archived) | Pipelines, handlers, STT/TTS, embeddings |

### Problems with the Previous Structure

1. **Tight coupling**: ray-serve changes required llm-workflows repo access
2. **CI/CD complexity**: Building ray-serve images triggered unrelated workflow steps
3. **Version management**: ray-serve deployments couldn't be versioned independently
4. **Team access**: Contributors to ray-serve needed access to the entire llm-workflows repo
5. **Build times**: Changes to unrelated code could trigger ray-serve rebuilds

## Decision

**Establish two dedicated Ray repositories with distinct purposes:**

| Repository | Type | Contents | Release Cycle |
|------------|------|----------|---------------|
| `kuberay-images` | Docker images | Ray worker base images (GPU-specific) | On dependency updates |
| `ray-serve` | PyPI package | Ray Serve application code | Per model/feature update |

### Key Design: Dynamic Code Loading

Ray Serve applications are deployed as **PyPI packages**, not baked into Docker images. This enables:

- **Decoupling**: Update model-serving logic without rebuilding containers
- **Runtime flexibility**: The Ray cluster pulls code via `pip install` at runtime
- **Faster iteration**: Code changes don't require image rebuilds or pod restarts
- **Version pinning**: Kubernetes manifests specify package versions independently
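
As a sketch, a RayService manifest in `homelab-k8s2` might pin the package through Ray's `runtime_env` pip support (the application name, import path, and extra index URL below are illustrative assumptions, not the actual manifest):

```yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: llm-serve        # illustrative
spec:
  serveConfigV2: |
    applications:
      - name: llm
        import_path: ray_serve.serve_apps:app   # hypothetical entry point
        runtime_env:
          pip:
            - --extra-index-url=https://registry.lab/pypi/simple  # internal registry (assumed URL)
            - ray-serve==1.0.0   # version pinned here, not in the image
```

Bumping the pinned version is then a one-line Git change reconciled by Flux, with no image rebuild.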

### Repository Structure

```
kuberay-images/                    # Docker images - GPU runtime environments
├── Dockerfile.ray-worker-nvidia
├── Dockerfile.ray-worker-rdna2
├── Dockerfile.ray-worker-strixhalo
├── Dockerfile.ray-worker-intel
├── Makefile
└── .gitea/workflows/
    └── build-push.yaml            # Builds & pushes to container registry

ray-serve/                         # PyPI package - application code
├── src/
│   └── ray_serve/
│       ├── __init__.py
│       ├── model_configs.py
│       └── serve_apps.py
├── pyproject.toml
├── README.md
└── .gitea/workflows/
    └── publish-ray-serve.yaml     # Publishes to PyPI registry
```

**Note**: Kubernetes deployment manifests live in `homelab-k8s2`, not in either Ray repo. This maintains separation between:
- **Infrastructure** (kuberay-images) - How to run Ray workers
- **Application** (ray-serve) - What code to run
- **Orchestration** (homelab-k8s2) - Where and when to deploy

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                         RAY INFRASTRUCTURE                          │
└─────────────────────────────────────────────────────────────────────┘
                                   │
               ┌───────────────────┴───────────────────┐
               │                                       │
               ▼                                       ▼
       ┌───────────────┐                       ┌───────────────┐
       │ kuberay-images│                       │ ray-serve     │
       │               │                       │               │
       │ Base worker   │                       │ PyPI package  │
       │ Docker images │                       │ Ray Serve     │
       │               │                       │ application   │
       │ NVIDIA/AMD/   │                       │               │
       │ Intel GPUs    │                       │ Model configs │
       └───────────────┘                       └───────────────┘
               │                                       │
               ▼                                       ▼
       ┌───────────────┐                       ┌───────────────┐
       │ Container     │                       │ PyPI          │
       │ Registry      │                       │ Registry      │
       │ registry.lab/ │                       │ registry.lab/ │
       │ kuberay/*     │                       │ pypi/ray-serve│
       └───────────────┘                       └───────────────┘
               │                                       │
               └───────────────────┬───────────────────┘
                                   │
                                   ▼
                         ┌───────────────────┐
                         │   Ray Cluster     │
                         │                   │
                         │ 1. Pull container │
                         │ 2. pip install    │
                         │    ray-serve      │
                         │ 3. Run serve app  │
                         └───────────────────┘
```

## Consequences

### Positive

- **Dynamic updates**: Deploy new model-serving code without rebuilding images
- **Independent releases**: Containers and application code versioned separately
- **Faster iteration**: A PyPI publish takes seconds vs minutes for Docker builds
- **Clear separation**: Infrastructure (images) vs application (code) vs orchestration (k8s)
- **Runtime flexibility**: The same container can run different ray-serve versions

### Negative

- **Runtime dependencies**: Pod startup requires `pip install` (cached in practice)
- **Version coordination**: Must track compatible versions between kuberay-images and ray-serve

### Migration Steps

1. ✅ `kuberay-images` already exists as a standalone repo
2. ✅ `llm-workflows` archived - all components extracted to dedicated repos
3. [ ] Create the `ray-serve` repo on Gitea
4. [ ] Move `.gitea/workflows/publish-ray-serve.yaml` to the new repo
5. [ ] Set up pyproject.toml for PyPI publishing
6. [ ] Update RayService manifests to `pip install ray-serve==X.Y.Z`
7. [ ] Verify the Ray cluster pulls the package correctly at runtime

## Version Compatibility Matrix

| kuberay-images | ray-serve | Notes |
|----------------|-----------|-------|
| 1.0.0 | 1.0.0 | Initial structure |

## References

- [ADR-0020: Internal Registry for CI/CD](./0020-internal-registry-for-cicd.md)
- [KubeRay Documentation](https://ray-project.github.io/kuberay/)
- [Ray Serve Documentation](https://docs.ray.io/en/latest/serve/index.html)