diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index d554afa..727a9eb 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -60,15 +60,24 @@ The homelab is a production-grade Kubernetes cluster running on bare-metal hardw │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ -│ AI SERVICES LAYER │ +│ GPU INFERENCE LAYER (KubeRay) │ ├─────────────────────────────────────────────────────────────────────────────┤ -│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ -│ │ Whisper │ │ XTTS │ │ vLLM │ │ Milvus │ │ BGE │ │Reranker │ │ -│ │ (STT) │ │ (TTS) │ │ (LLM) │ │ (RAG) │ │(Embed) │ │ (BGE) │ │ -│ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ │ -│ │ KServe │ │ KServe │ │ vLLM │ │ Helm │ │ KServe │ │ KServe │ │ -│ │ nvidia │ │ nvidia │ │ ROCm │ │ Minio │ │ rdna2 │ │ intel │ │ -│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ +│ RayService: ai-inference-serve-svc:8000 │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ Ray Serve (Unified Endpoint) │ │ +│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ +│ │ │ /whisper │ │ /tts │ │ /llm │ │/embeddings│ │/reranker │ │ │ +│ │ │ Whisper │ │ XTTS │ │ vLLM │ │ BGE-L │ │ BGE-Rnk │ │ │ +│ │ │ (0.5 GPU)│ │(0.5 GPU) │ │(0.95 GPU)│ │ (0.8 GPU) │ │(0.8 GPU) │ │ │ +│ │ ├──────────┤ ├──────────┤ ├──────────┤ ├──────────┤ ├──────────┤ │ │ +│ │ │elminster │ │elminster │ │ khelben │ │ drizzt │ │ danilo │ │ │ +│ │ │RTX 2070 │ │RTX 2070 │ │Strix Halo│ │Radeon 680│ │Intel Arc │ │ │ +│ │ │ CUDA │ │ CUDA │ │ ROCm │ │ ROCm │ │ Intel │ │ │ +│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ KServe Aliases: {whisper,tts,llm,embeddings,reranker}-predictor.ai-ml │ +│ Milvus: Vector database for RAG (Helm, MinIO backend) │ └─────────────────────────────────────────────────────────────────────────────┘ │ ▼ @@ -279,6 +288,8 @@ Applications ──► OpenTelemetry SDK ──► Jaeger/Tempo ──► Grafan | MessagePack over JSON | Binary efficiency for audio | [ADR-0004](decisions/0004-use-messagepack-for-nats.md) | | Multi-GPU heterogeneous | Cost optimization, workload matching | [ADR-0005](decisions/0005-multi-gpu-strategy.md) | | GitOps with Flux | Declarative, auditable, secure | [ADR-0006](decisions/0006-gitops-with-flux.md) | +| KServe for inference | Standardized API, autoscaling | [ADR-0007](decisions/0007-use-kserve-for-inference.md) | +| KubeRay unified backend | Fractional GPU, single endpoint | [ADR-0011](decisions/0011-kuberay-unified-gpu-backend.md) | ## Related Documents diff --git a/CODING-CONVENTIONS.md b/CODING-CONVENTIONS.md index 30aac87..cc6ebf9 100644 --- a/CODING-CONVENTIONS.md +++ b/CODING-CONVENTIONS.md @@ -36,13 +36,19 @@ handler-base/ # Shared library for all handlers │ ├── health.py # K8s probes │ ├── telemetry.py # OpenTelemetry │ └── clients/ # Service clients +├── tests/ └── pyproject.toml chat-handler/ # Text chat service voice-assistant/ # Voice pipeline service -├── {name}.py # Standalone version -├── {name}_v2.py # Handler-base version (preferred) -└── Dockerfile.v2 +pipeline-bridge/ # Workflow engine bridge +├── {name}.py # Handler implementation (uses handler-base) +├── pyproject.toml # PEP 621 project metadata (see ADR-0012) +├── uv.lock # Deterministic lock file +├── tests/ +│ ├── conftest.py +│ └── test_{name}.py +└── Dockerfile argo/ # Argo WorkflowTemplates ├── {workflow-name}.yaml @@ 
-59,6 +65,29 @@ kuberay-images/ # GPU worker images ## Python Conventions +### Package Management (ADR-0012) + +Use **uv** for local development and **pip** in Docker for reproducibility: + +```bash +# Install uv (one-time) +curl -LsSf https://astral.sh/uv/install.sh | sh + +# Create virtual environment and install +uv venv +source .venv/bin/activate +uv pip install -e ".[dev]" + +# Or use uv sync with lock file +uv sync + +# Update lock file after changing pyproject.toml +uv lock + +# Run tests +uv run pytest +``` + ### Project Structure ```python diff --git a/TECH-STACK.md b/TECH-STACK.md index 03e5fce..72ee526 100644 --- a/TECH-STACK.md +++ b/TECH-STACK.md @@ -31,22 +31,27 @@ ## AI/ML Layer -### Inference Engines +### GPU Inference (KubeRay RayService) -| Service | Framework | GPU | Model Type | -|---------|-----------|-----|------------| -| [vLLM](https://vllm.ai) | ROCm | AMD Strix Halo | Large Language Models | -| [faster-whisper](https://github.com/guillaumekln/faster-whisper) | CUDA | NVIDIA RTX 2070 | Speech-to-Text | -| [XTTS](https://github.com/coqui-ai/TTS) | CUDA | NVIDIA RTX 2070 | Text-to-Speech | -| [BGE Embeddings](https://huggingface.co/BAAI/bge-large-en-v1.5) | ROCm | AMD Radeon 680M | Text Embeddings | -| [BGE Reranker](https://huggingface.co/BAAI/bge-reranker-large) | Intel | Intel Arc | Document Reranking | +All AI inference runs on a unified Ray Serve endpoint with fractional GPU allocation: -### ML Serving +| Service | Model | GPU Node | GPU Type | Allocation | +|---------|-------|----------|----------|------------| +| `/llm` | [vLLM](https://vllm.ai) (Llama 3.1 70B) | khelben | AMD Strix Halo 64GB | 0.95 GPU | +| `/whisper` | [faster-whisper](https://github.com/guillaumekln/faster-whisper) v3 | elminster | NVIDIA RTX 2070 8GB | 0.5 GPU | +| `/tts` | [XTTS](https://github.com/coqui-ai/TTS) | elminster | NVIDIA RTX 2070 8GB | 0.5 GPU | +| `/embeddings` | [BGE-Large](https://huggingface.co/BAAI/bge-large-en-v1.5) | drizzt | AMD Radeon 680M 12GB | 0.8 GPU | +| `/reranker` | [BGE-Reranker](https://huggingface.co/BAAI/bge-reranker-large) | danilo | Intel Arc 16GB | 0.8 GPU | + +**Endpoint**: `ai-inference-serve-svc.ai-ml.svc.cluster.local:8000/{service}` + +### ML Serving Stack | Component | Version | Purpose | |-----------|---------|---------| -| [KServe](https://kserve.github.io) | v0.12+ | Model serving framework | +| [KubeRay](https://ray-project.github.io/kuberay/) | 1.4+ | Ray cluster operator | | [Ray Serve](https://ray.io/serve) | 2.53.0 | Unified inference endpoints | +| [KServe](https://kserve.github.io) | v0.12+ | Abstraction layer (ExternalName aliases) | ### ML Workflows diff --git a/decisions/0007-use-kserve-for-inference.md b/decisions/0007-use-kserve-for-inference.md index da9598f..d3e5695 100644 --- a/decisions/0007-use-kserve-for-inference.md +++ b/decisions/0007-use-kserve-for-inference.md @@ -1,7 +1,7 @@ # Use KServe for ML Model Serving -* Status: accepted -* Date: 2025-12-15 +* Status: superseded by [ADR-0011](0011-kuberay-unified-gpu-backend.md) +* Date: 2025-12-15 (Updated: 2026-02-02) * Deciders: Billy Davies * Technical Story: Selecting model serving platform for inference services @@ -30,6 +30,15 @@ We need to deploy multiple ML models (Whisper, XTTS, BGE, vLLM) as inference end Chosen option: "KServe InferenceService", because it provides a standardized, Kubernetes-native approach to model serving with built-in autoscaling and traffic management. 
+**UPDATE (2026-02-02)**: While KServe remains installed, all GPU inference now runs on **KubeRay RayService with Ray Serve** (see [ADR-0011](0011-kuberay-unified-gpu-backend.md)). KServe now serves as an **abstraction layer** via ExternalName services that provide KServe-compatible naming (`{model}-predictor.ai-ml`) while routing to the unified Ray Serve endpoint. + +### Current Role of KServe + +KServe is retained for: +- **Service naming convention**: `{model}-predictor.ai-ml.svc.cluster.local` +- **Future flexibility**: Can be used for non-GPU models or canary deployments +- **Kubeflow integration**: KServe InferenceServices appear in Kubeflow UI + ### Positive Consequences * Standardized V2 inference protocol @@ -90,26 +99,34 @@ Chosen option: "KServe InferenceService", because it provides a standardized, Ku ## Current Configuration +KServe-compatible ExternalName services route to the unified Ray Serve endpoint: + ```yaml -apiVersion: serving.kserve.io/v1beta1 -kind: InferenceService +# KServe-compatible service alias (services-ray-aliases.yaml) +apiVersion: v1 +kind: Service metadata: - name: whisper + name: whisper-predictor namespace: ai-ml + labels: + serving.kserve.io/inferenceservice: whisper spec: - predictor: - minReplicas: 1 - maxReplicas: 3 - containers: - - name: whisper - image: ghcr.io/org/whisper:latest - resources: - limits: - nvidia.com/gpu: 1 + type: ExternalName + externalName: ai-inference-serve-svc.ai-ml.svc.cluster.local + ports: + - port: 8000 + targetPort: 8000 +--- +# Usage: http://whisper-predictor.ai-ml.svc.cluster.local:8000/whisper/... +# All traffic routes to Ray Serve, which handles GPU allocation ``` +For the actual Ray Serve configuration, see [ADR-0011](0011-kuberay-unified-gpu-backend.md). + ## Links * [KServe](https://kserve.github.io) * [V2 Inference Protocol](https://kserve.github.io/website/latest/modelserving/data_plane/v2_protocol/) +* [KubeRay](https://ray-project.github.io/kuberay/) * Related: [ADR-0005](0005-multi-gpu-strategy.md) - GPU allocation +* Superseded by: [ADR-0011](0011-kuberay-unified-gpu-backend.md) - KubeRay unified backend diff --git a/decisions/0011-kuberay-unified-gpu-backend.md b/decisions/0011-kuberay-unified-gpu-backend.md new file mode 100644 index 0000000..3bc6535 --- /dev/null +++ b/decisions/0011-kuberay-unified-gpu-backend.md @@ -0,0 +1,146 @@ +# Use KubeRay as Unified GPU Backend + +* Status: accepted +* Date: 2026-02-02 +* Deciders: Billy Davies +* Technical Story: Consolidating GPU inference workloads onto a single Ray cluster + +## Context and Problem Statement + +We have multiple AI inference services (LLM, STT, TTS, Embeddings, Reranker) running on a heterogeneous GPU fleet (AMD Strix Halo, NVIDIA RTX 2070, AMD 680M iGPU, Intel Arc). Initially, each service was deployed as a standalone KServe InferenceService, including a llama.cpp proof-of-concept for LLM inference. This resulted in: + +1. Complex scheduling across GPU types +2. No GPU sharing (each pod claimed entire GPU) +3. Multiple containers competing for GPU memory +4. Inconsistent service discovery patterns + +How do we efficiently utilize our GPU fleet while providing unified inference endpoints? 
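+
+For orientation, the mechanism weighed below is Ray Serve's fractional GPU accounting (`num_gpus` accepts fractions) combined with per-application route prefixes behind a single HTTP service. A minimal Python sketch of that pattern (class names and bodies are placeholders; the production configuration is declared in `kuberay/app/rayservice.yaml`):
+
+```python
+from ray import serve
+
+
+@serve.deployment(ray_actor_options={"num_gpus": 0.5})
+class Whisper:
+    async def __call__(self, request):
+        audio = await request.body()  # raw audio bytes from the HTTP request
+        return {"text": "..."}        # placeholder; the real deployment runs faster-whisper
+
+
+@serve.deployment(ray_actor_options={"num_gpus": 0.5})
+class TTS:
+    async def __call__(self, request):
+        text = (await request.json())["text"]
+        return {"audio": "..."}       # placeholder; the real deployment runs XTTS
+
+
+# Two applications behind one endpoint: Ray can schedule both replicas onto the
+# same physical GPU (0.5 + 0.5), and Serve routes requests by path prefix.
+serve.run(Whisper.bind(), name="whisper", route_prefix="/whisper")
+serve.run(TTS.bind(), name="tts", route_prefix="/tts")
+```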
+ +## Decision Drivers + +* Fractional GPU allocation (multiple models per GPU) +* Unified endpoint for all AI services +* Heterogeneous GPU support (CUDA, ROCm, Intel) +* Simplified service discovery +* GPU memory optimization +* Single point of observability + +## Considered Options + +* Standalone KServe InferenceServices per model +* NVIDIA MPS for GPU sharing +* KubeRay RayService with Ray Serve +* vLLM standalone deployment + +## Decision Outcome + +Chosen option: "KubeRay RayService with Ray Serve", because it provides native fractional GPU allocation, supports all GPU types, and unifies all inference services behind a single endpoint with path-based routing. + +The llama.cpp proof-of-concept has been deprecated and removed. vLLM now runs as a Ray Serve deployment within the RayService. + +### Positive Consequences + +* Fractional GPU: Whisper (0.5) + TTS (0.5) share RTX 2070 +* Single service endpoint: `ai-inference-serve-svc:8000/{model}` +* Path-based routing: `/whisper`, `/tts`, `/llm`, `/embeddings`, `/reranker` +* GPU-aware scheduling via Ray's resource system +* Unified metrics and logging through Ray Dashboard +* Hot-reloading of models without restarting pods + +### Negative Consequences + +* Ray cluster overhead (head node, dashboard) +* Learning curve for Ray Serve configuration +* Custom container images per GPU architecture +* Less granular scaling (RayService vs per-model replicas) + +## Pros and Cons of the Options + +### Standalone KServe InferenceServices + +* Good, because simple per-model configuration +* Good, because independent scaling per model +* Good, because standard Kubernetes resources +* Bad, because no GPU sharing (1 GPU per pod) +* Bad, because multiple service endpoints +* Bad, because scheduling complexity across GPU types + +### NVIDIA MPS for GPU sharing + +* Good, because transparent GPU sharing +* Good, because works with existing containers +* Bad, because NVIDIA-only (no ROCm, no Intel) +* Bad, because limited memory isolation +* Bad, because complex setup per node + +### KubeRay RayService with Ray Serve + +* Good, because fractional GPU allocation +* Good, because unified endpoint +* Good, because multi-GPU-vendor support +* Good, because built-in autoscaling +* Good, because hot model reloading +* Bad, because Ray cluster overhead +* Bad, because custom Ray Serve deployment code + +### vLLM standalone deployment + +* Good, because optimized for LLM inference +* Good, because OpenAI-compatible API +* Bad, because LLM-only (not STT/TTS/Embeddings) +* Bad, because requires dedicated GPU + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ KubeRay RayService │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ Service: ai-inference-serve-svc:8000 │ +│ │ +│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ /llm │ │ /whisper │ │ /tts │ │ +│ │ vLLM 70B │ │ Whisper v3 │ │ XTTS │ │ +│ │ ─────────── │ │ ─────────── │ │ ─────────── │ │ +│ │ khelben │ │ elminster │ │ elminster │ │ +│ │ Strix Halo │ │ RTX 2070 │ │ RTX 2070 │ │ +│ │ (0.95 GPU) │ │ (0.5 GPU) │ │ (0.5 GPU) │ │ +│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ +│ │ +│ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ /embeddings │ │ /reranker │ │ +│ │ BGE-Large │ │ BGE-Reranker │ │ +│ │ ─────────── │ │ ─────────── │ │ +│ │ drizzt │ │ danilo │ │ +│ │ Radeon 680M │ │ Intel Arc │ │ +│ │ (0.8 GPU) │ │ (0.8 GPU) │ │ +│ └─────────────────┘ └─────────────────┘ │ 
+└─────────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ KServe Compatibility Layer │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ ExternalName Services (KServe-style naming): │ +│ • whisper-predictor.ai-ml → ai-inference-serve-svc:8000 │ +│ • tts-predictor.ai-ml → ai-inference-serve-svc:8000 │ +│ • embeddings-predictor.ai-ml → ai-inference-serve-svc:8000 │ +│ • reranker-predictor.ai-ml → ai-inference-serve-svc:8000 │ +│ • llm-predictor.ai-ml → ai-inference-serve-svc:8000 │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Migration Notes + +1. **Removed**: `kubernetes/apps/ai-ml/llm-inference/` - llama.cpp proof-of-concept +2. **Added**: Ray Serve deployments in `kuberay/app/rayservice.yaml` +3. **Added**: KServe-compatible ExternalName services in `kuberay/app/services-ray-aliases.yaml` +4. **Updated**: All clients now use `ai-inference-serve-svc:8000/{model}` + +## Links + +* [Ray Serve](https://docs.ray.io/en/latest/serve/) +* [KubeRay](https://ray-project.github.io/kuberay/) +* [vLLM on Ray Serve](https://docs.vllm.ai/en/latest/serving/distributed_serving.html) +* Related: [ADR-0005](0005-multi-gpu-strategy.md) - Multi-GPU strategy +* Related: [ADR-0007](0007-use-kserve-for-inference.md) - KServe for inference (now abstraction layer) diff --git a/decisions/0012-use-uv-for-python-development.md b/decisions/0012-use-uv-for-python-development.md new file mode 100644 index 0000000..7a991b2 --- /dev/null +++ b/decisions/0012-use-uv-for-python-development.md @@ -0,0 +1,195 @@ +# Use uv for Python Development, pip for Docker Builds + +* Status: accepted +* Date: 2026-02-02 +* Deciders: Billy Davies +* Technical Story: Standardizing Python package management across development and production + +## Context and Problem Statement + +Our Python projects use a mix of `requirements.txt` and `pyproject.toml` for dependency management. Local development with `pip` is slow, and we need a consistent approach across all repositories while maintaining reproducible Docker builds. + +## Decision Drivers + +* Fast local development iteration +* Reproducible production builds +* Modern Python packaging standards (PEP 517/518/621) +* Lock file support for deterministic installs +* Compatibility with existing CI/CD pipelines + +## Considered Options + +* pip only (traditional) +* Poetry +* PDM +* uv (by Astral) +* uv for development, pip for Docker + +## Decision Outcome + +Chosen option: "uv for development, pip for Docker", because uv provides extremely fast package resolution and installation for local development (10-100x faster than pip), while pip in Docker ensures maximum compatibility and reproducibility without requiring uv to be installed in production images. 
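+
+In CI, this split means the committed `uv.lock` drives a deterministic install before tests run. A minimal sketch of the check (how it is wired into the existing pipeline is left open; this is an assumption, not a current config):
+
+```bash
+# Fail fast if uv.lock no longer matches pyproject.toml, then install and test
+uv sync --locked
+uv run pytest
+```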
+ +### Positive Consequences + +* 10-100x faster package installs during development +* `uv.lock` provides deterministic dependency resolution +* `pyproject.toml` is the modern Python standard (PEP 621) +* Docker builds remain simple with standard pip +* `uv pip compile` can generate `requirements.txt` from `pyproject.toml` +* No uv runtime dependency in production containers + +### Negative Consequences + +* Two tools to maintain (uv locally, pip in Docker) +* Team must install uv for local development +* Lock file must be kept in sync with pyproject.toml + +## Pros and Cons of the Options + +### pip only (traditional) + +* Good, because universal compatibility +* Good, because no additional tools +* Bad, because slow resolution and installation +* Bad, because no built-in lock file +* Bad, because `requirements.txt` lacks metadata + +### Poetry + +* Good, because mature ecosystem +* Good, because lock file support +* Good, because virtual environment management +* Bad, because slower than uv +* Bad, because non-standard `pyproject.toml` sections +* Bad, because complex dependency resolver + +### PDM + +* Good, because PEP 621 compliant +* Good, because lock file support +* Good, because fast resolver +* Bad, because less adoption than Poetry +* Bad, because still slower than uv + +### uv (by Astral) + +* Good, because 10-100x faster than pip +* Good, because drop-in pip replacement +* Good, because supports PEP 621 pyproject.toml +* Good, because uv.lock for deterministic builds +* Good, because from the creators of Ruff +* Bad, because newer tool (less mature) +* Bad, because requires installation + +### uv for development, pip for Docker (Chosen) + +* Good, because fast local development +* Good, because simple Docker builds +* Good, because no uv in production images +* Good, because pip compatibility maintained +* Bad, because two tools in workflow +* Bad, because must sync lock file + +## Implementation + +### Local Development Setup + +```bash +# Install uv (one-time) +curl -LsSf https://astral.sh/uv/install.sh | sh + +# Create virtual environment and install dependencies +uv venv +source .venv/bin/activate +uv pip install -e ".[dev]" + +# Or use uv sync with lock file +uv sync +``` + +### Project Structure + +``` +my-handler/ +├── pyproject.toml # PEP 621 project metadata and dependencies +├── uv.lock # Deterministic lock file (committed) +├── requirements.txt # Generated from uv.lock for Docker (optional) +├── src/ +│ └── my_handler/ +└── tests/ +``` + +### pyproject.toml Example + +```toml +[project] +name = "my-handler" +version = "1.0.0" +requires-python = ">=3.11" +dependencies = [ + "handler-base @ git+https://git.daviestechlabs.io/daviestechlabs/handler-base.git", + "httpx>=0.27.0", +] + +[project.optional-dependencies] +dev = [ + "pytest>=8.0.0", + "pytest-asyncio>=0.23.0", + "ruff>=0.1.0", +] + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" +``` + +### Dockerfile Pattern + +The Dockerfile uses uv for speed but installs via pip-compatible interface: + +```dockerfile +FROM python:3.13-slim + +# Copy uv for fast installs (optional - can use pip directly) +COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv + +# Install from pyproject.toml +COPY pyproject.toml ./ +RUN uv pip install --system --no-cache . 
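+# NOTE: installing "." builds the handler package itself, so the source tree
+# must also be copied in first (e.g. COPY src/ ./src/) for the step above to
+# succeed; to install only the declared dependencies, an alternative is
+# uv pip install --system --no-cache -r pyproject.toml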
+ +# OR for maximum reproducibility, use requirements.txt +COPY requirements.txt ./ +RUN pip install --no-cache-dir -r requirements.txt +``` + +### Generating requirements.txt from uv.lock + +```bash +# Generate pinned requirements from lock file +uv pip compile pyproject.toml -o requirements.txt + +# Or export from lock +uv export --format requirements-txt > requirements.txt +``` + +## Workflow + +1. **Add dependency**: Edit `pyproject.toml` +2. **Update lock**: Run `uv lock` +3. **Install locally**: Run `uv sync` +4. **For Docker**: Optionally generate `requirements.txt` or use `uv pip install` in Dockerfile +5. **Commit**: Both `pyproject.toml` and `uv.lock` + +## Migration Path + +1. Create `pyproject.toml` from existing `requirements.txt` +2. Run `uv lock` to generate `uv.lock` +3. Update Dockerfile to use pyproject.toml +4. Delete `requirements.txt` (or keep as generated artifact) + +## Links + +* [uv Documentation](https://docs.astral.sh/uv/) +* [PEP 621 - Project Metadata](https://peps.python.org/pep-0621/) +* [Astral (uv creators)](https://astral.sh/) +* Related: handler-base already uses uv in Dockerfile