# waterdeep (Mac Mini M4 Pro) as Dedicated Coding Agent with Fine-Tuned Model

* Status: proposed
* Date: 2026-02-26
* Deciders: Billy
* Technical Story: Repurpose waterdeep as a dedicated local coding agent serving a fine-tuned code-completion model for OpenCode, Copilot Chat, and other AI coding tools, with a pipeline for continually tuning the model on the homelab codebase

## Context and Problem Statement

**waterdeep** is a Mac Mini M4 Pro with 48 GB of unified memory ([ADR-0059](0059-mac-mini-ray-worker.md)). Its current role as a 3D avatar creation workstation is being superseded by the automated ComfyUI pipeline ([ADR-0063](0063-comfyui-3d-avatar-pipeline.md)), which handles avatar generation on a personal desktop acting as an on-demand Ray worker. This frees waterdeep for a higher-value use case.

GitHub Copilot and cloud-hosted coding assistants work well for general code, but they have no knowledge of DaviesTechLabs-specific patterns: the handler-base module API, NATS protobuf message conventions, Kubeflow pipeline structure, Ray Serve deployment patterns, Flux/Kustomize layout, or the Go handler lifecycle used across chat-handler, voice-assistant, pipeline-bridge, stt-module, and tts-module. A model fine-tuned on the homelab codebase would produce completions that follow project conventions out of the box.

With 48 GB of unified memory and no other workloads, waterdeep can serve **Qwen 2.5 Coder 32B Instruct** at Q8_0 quantisation (~34 GB) via MLX with ample headroom for KV cache, leaving the machine responsive for the inference server and macOS overhead. This is the largest purpose-built coding model that fits at high quantisation on this hardware, and it consistently outperforms general-purpose 70B models at Q4 on coding benchmarks.

How should we configure waterdeep as a dedicated coding agent and build a pipeline for fine-tuning the model on our codebase?

## Decision Drivers

* waterdeep's 48 GB unified memory is fully available — no competing workloads after ComfyUI pipeline takeover
* Qwen 2.5 Coder 32B Instruct is the highest-quality open-source coding model that fits at Q8_0 (~34 GB weights + ~10 GB KV cache headroom)
* MLX on Apple Silicon provides native Metal-accelerated inference with no framework overhead — purpose-built for M-series chips
* OpenCode and VS Code Copilot Chat both support OpenAI-compatible API endpoints — a local server is a drop-in replacement
* The homelab codebase has strong conventions (handler-base, protobuf messages, Kubeflow pipelines, Ray Serve apps, Flux GitOps) that a general model doesn't know
* Existing training infrastructure ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)) provides Kubeflow Pipelines + MLflow + S3 data flow for fine-tuning orchestration
* LoRA adapters are small (~50–200 MB) and can be merged into the base model or hot-swapped in mlx-lm-server
* The cluster's CPU training capacity (126 cores, 378 GB RAM across 14 nodes) can prepare training datasets; waterdeep itself can run the LoRA fine-tune on its Metal GPU

## Considered Options

1. **Qwen 2.5 Coder 32B Instruct (Q8_0) via mlx-lm-server on waterdeep** — fine-tuned with LoRA on the homelab codebase using MLX
2. **Llama 3.1 70B Instruct (Q4_K_M) via llama.cpp on waterdeep** — larger general-purpose model at aggressive quantisation
3. **DeepSeek Coder V2 Lite 16B via MLX on waterdeep** — smaller coding model, lower resource usage
4. **Keep using cloud Copilot only** — no local model, no fine-tuning

## Decision Outcome

Chosen option: **Option 1 — Qwen 2.5 Coder 32B Instruct (Q8_0) via mlx-lm-server**, because it is the best-in-class open-source coding model at a quantisation level that preserves near-full quality, fits comfortably within the 48 GB memory budget with room for KV cache, and MLX provides the optimal inference stack for Apple Silicon. Fine-tuning with LoRA on the homelab codebase will specialise the model to project conventions.

### Positive Consequences

* Purpose-built coding model — Qwen 2.5 Coder 32B tops open-source coding benchmarks (HumanEval, MBPP, BigCodeBench)
* Q8_0 quantisation preserves >99% of full-precision quality — minimal degradation vs Q4
* ~34 GB model weights + ~10 GB KV cache headroom = comfortable fit in 48 GB unified memory
* MLX inference leverages the Metal GPU for token generation — fast enough for interactive coding assistance
* OpenAI-compatible API via mlx-lm-server — works with OpenCode, VS Code Copilot Chat (custom endpoint), Continue.dev, and any OpenAI SDK client
* Fine-tuned LoRA adapter teaches project-specific patterns: handler-base API, NATS message conventions, Kubeflow pipeline structure, Flux layout
* LoRA fine-tuning runs directly on waterdeep using mlx-lm — no cluster resources needed for training
* Adapter files are small (~50–200 MB) — easy to version in Gitea and track in MLflow
* Fully offline — no cloud dependency, no data leaves the network
* Frees Copilot quota for non-coding tasks — the local model handles bulk code completion

### Negative Consequences

* waterdeep is dedicated to this role — it cannot simultaneously serve other workloads (Blender, etc.)
* Model updates require manual download and conversion to MLX format
* LoRA fine-tuning quality depends on training data curation — garbage in, garbage out
* The 32B model is slower than cloud Copilot for very long completions — acceptable for interactive use
* Single point of failure — if waterdeep is down, fall back to cloud Copilot

## Pros and Cons of the Options

### Option 1: Qwen 2.5 Coder 32B Instruct (Q8_0) via MLX

* Good, because purpose-built for code — trained on 5.5T tokens of code data
* Good, because 32B at Q8_0 (~34 GB) fits in 48 GB with KV cache headroom
* Good, because Q8_0 preserves near-full quality (vs Q4, which drops noticeably on coding tasks)
* Good, because MLX is Apple's native framework — zero-copy unified memory, Metal GPU kernels
* Good, because mlx-lm supports LoRA fine-tuning natively — train and serve on the same machine
* Good, because OpenAI-compatible API (mlx-lm-server) — drop-in for any coding tool
* Bad, because 32B generates ~15–25 tokens/sec on M4 Pro — adequate but not instant for long outputs
* Bad, because the MLX model format requires conversion from HuggingFace (one-time, scripted)

### Option 2: Llama 3.1 70B Instruct (Q4_K_M) via llama.cpp

* Good, because 70B is a larger, more capable general model
* Good, because llama.cpp is mature and well-supported on macOS
* Bad, because Q4_K_M quantisation loses meaningful quality — especially on code tasks, where precision matters
* Bad, because ~42 GB of weights leaves only ~6 GB for KV cache — tight, risks OOM on long contexts
* Bad, because it is a general-purpose model — not trained specifically for code, and underperforms Qwen 2.5 Coder 32B on coding benchmarks despite being 2× larger
* Bad, because slower token generation (~8–12 tok/s) due to the larger model size
* Bad, because llama.cpp doesn't natively support LoRA fine-tuning — a separate training framework would be needed

### Option 3: DeepSeek Coder V2 Lite 16B via MLX

* Good, because it is a smaller model — faster inference (~30–40 tok/s), lighter memory footprint
* Good, because still a capable coding model
* Bad, because significantly less capable than Qwen 2.5 Coder 32B on benchmarks
* Bad, because it leaves 30+ GB of unified memory unused — not maximising the hardware
* Bad, because fewer parameters mean less capacity to absorb fine-tuning knowledge

### Option 4: Cloud Copilot only

* Good, because zero local infrastructure to maintain
* Good, because always up-to-date with the latest model improvements
* Bad, because no knowledge of homelab-specific conventions — completions require heavy editing
* Bad, because cloud latency for every completion
* Bad, because data (code context) leaves the network
* Bad, because it wastes waterdeep's 48 GB of unified memory sitting idle

## Architecture

### Inference Server

```
┌──────────────────────────────────────────────────────────────────────────┐
│ waterdeep (Mac Mini M4 Pro · 48 GB unified · Metal GPU · dedicated)      │
│                                                                          │
│ ┌────────────────────────────────────────────────────────────────────┐   │
│ │ mlx-lm-server (launchd-managed)                                    │   │
│ │                                                                    │   │
│ │ Model: Qwen2.5-Coder-32B-Instruct (Q8_0, MLX format)               │   │
│ │ LoRA:  ~/.mlx-models/adapters/homelab-coder/latest/                │   │
│ │                                                                    │   │
│ │ Endpoint: http://waterdeep.lab.daviestechlabs.io:8080/v1           │   │
│ │   ├── /v1/completions      (code completion, FIM)                  │   │
│ │   ├── /v1/chat/completions (chat / instruct)                       │   │
│ │   └── /v1/models           (model listing)                         │   │
│ │                                                                    │   │
│ │ Memory: ~34 GB model + ~10 GB KV cache = ~44 GB                    │   │
│ └────────────────────────────────────────────────────────────────────┘   │
│                                                                          │
│ ┌─────────────────────────┐  ┌──────────────────────────────────────┐    │
│ │ macOS overhead ~3 GB    │  │ Training (on-demand, same GPU)       │    │
│ │ (kernel, WindowServer,  │  │ mlx-lm LoRA fine-tune                │    │
│ │  mDNSResponder, etc.)   │  │ (server stopped during training)     │    │
│ └─────────────────────────┘  └──────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────────────────┘
                  │
                  │ HTTP :8080 (OpenAI-compatible API)
                  │
     ┌────────────┴──────────────────────┐
     │                                   │
     ▼                                   ▼
┌─────────────────────────────┐  ┌─────────────────────────────────────┐
│ VS Code (any machine)       │  │ OpenCode (terminal, any machine)    │
│                             │  │                                     │
│ Copilot Chat / Continue.dev │  │ OPENCODE_MODEL_PROVIDER=openai      │
│ Custom endpoint →           │  │ OPENAI_API_BASE=                    │
│   waterdeep:8080/v1         │  │   http://waterdeep:8080/v1          │
└─────────────────────────────┘  └─────────────────────────────────────┘
```

### Fine-Tuning Pipeline

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Fine-Tuning Pipeline (Kubeflow)                                             │
│                                                                             │
│ Trigger: weekly cron or manual (after significant codebase changes)         │
│                                                                             │
│ ┌───────────────┐    ┌───────────────────┐    ┌────────────────────────┐    │
│ │ 1. Clone repos│    │ 2. Build training │    │ 3. Upload dataset to   │    │
│ │    from Gitea │───▶│    dataset        │───▶│    S3                  │    │
│ │    (all repos)│    │    (instruction   │    │    training-data/      │    │
│ │               │    │    pairs + FIM)   │    │    code-finetune/      │    │
│ └───────────────┘    └───────────────────┘    └──────────┬─────────────┘    │
│                                                          │                  │
│ ┌──────────────────────────────────────────────────────┐ │                  │
│ │ 4. Trigger LoRA fine-tune on waterdeep               │◀┘                  │
│ │    (SSH or webhook → mlx-lm lora on Metal GPU)       │                    │
│ │                                                      │                    │
│ │    Base:   Qwen2.5-Coder-32B-Instruct (MLX Q8_0)     │                    │
│ │    Method: LoRA (r=16, alpha=32)                     │                    │
│ │    Data:   instruction pairs + fill-in-middle samples│                    │
│ │    Epochs: 3–5                                       │                    │
│ │    Output: adapter weights (~50–200 MB)              │                    │
│ └──────────────────────┬───────────────────────────────┘                    │
│                        │                                                    │
│ ┌──────────────────────▼───────────────────────────────┐                    │
│ │ 5. Evaluate adapter                                  │                    │
│ │    • HumanEval pass@1 (baseline vs fine-tuned)       │                    │
│ │    • Project-specific eval (handler-base patterns,   │                    │
│ │      Kubeflow pipeline templates, Flux manifests)    │                    │
│ └──────────────────────┬───────────────────────────────┘                    │
│                        │                                                    │
│ ┌──────────────────────▼───┐  ┌────────────────────────────────────────┐    │
│ │ 6. Push adapter to Gitea │  │ 7. Log metrics to MLflow               │    │
│ │    code-lora-adapters    │  │    experiment: waterdeep-coder-finetune│    │
│ │    repo (versioned)      │  │    metrics: eval_loss, humaneval,      │    │
│ └──────────────────────────┘  │    project_specific_score              │    │
│                               └────────────────────────────────────────┘    │
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────┐     │
│ │ 8. Deploy adapter on waterdeep                                      │     │
│ │    • Pull latest adapter from Gitea                                 │     │
│ │    • Restart mlx-lm-server with --adapter-path pointing to new ver  │     │
│ │    • Smoke test: send test completion requests                      │     │
│ └─────────────────────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Training Data Preparation

The training dataset is built from all DaviesTechLabs repositories:

| Source | Format | Purpose |
|--------|--------|---------|
| Go handlers (chat-handler, voice-assistant, etc.) | Instruction pairs | Teach handler-base API patterns, NATS message handling, protobuf encoding |
| Kubeflow pipelines (kubeflow/*.py) | Instruction pairs | Teach pipeline structure, KFP component patterns, S3 data flow |
| Ray Serve apps (ray-serve/) | Instruction pairs | Teach Ray Serve deployment, vLLM config, model serving patterns |
| Flux manifests (homelab-k8s2/) | Instruction pairs | Teach HelmRelease, Kustomization, namespace layout |
| Argo workflows (argo/*.yaml) | Instruction pairs | Teach WorkflowTemplate patterns, NATS triggers |
| ADRs (homelab-design/decisions/) | Instruction pairs | Teach architecture rationale and decision format |
| All source files | Fill-in-middle (FIM) | Teach code completion with project-specific context |

**Instruction pair example (Go handler):**

```json
{
  "instruction": "Create a new NATS handler module that bridges to an external gRPC service, following the handler-base pattern used in chat-handler and voice-assistant.",
  "output": "package main\n\nimport (\n\t\"context\"\n\t\"os\"\n\t\"os/signal\"\n\t\"syscall\"\n\n\t\"git.daviestechlabs.io/daviestechlabs/handler-base/config\"\n\t\"git.daviestechlabs.io/daviestechlabs/handler-base/handler\"\n\t\"git.daviestechlabs.io/daviestechlabs/handler-base/health\"\n\t..."
}
```

**Fill-in-middle example:**

```json
{
  "prefix": "func (h *Handler) HandleMessage(ctx context.Context, msg *messages.UserMessage) (*messages.AssistantMessage, error) {\n\t",
  "suffix": "\n\treturn response, nil\n}",
  "middle": "response, err := h.client.Complete(ctx, msg.Content)\n\tif err != nil {\n\t\treturn nil, fmt.Errorf(\"completion failed: %w\", err)\n\t}"
}
```

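Records in this shape can be generated mechanically from any source file. A minimal sketch of the FIM-sample step, assuming a random line-based split (the actual pipeline's split heuristic may differ, and `make_fim_sample` is an illustrative name):

```python
import random


def make_fim_sample(source: str, rng: random.Random) -> dict:
    """Split a source file into a prefix/middle/suffix FIM record.

    Picks a random contiguous span of lines as the 'middle' to be
    predicted, leaving non-empty prefix and suffix context around it.
    """
    lines = source.splitlines(keepends=True)
    if len(lines) < 3:
        raise ValueError("file too short for a FIM split")
    i = rng.randrange(1, len(lines) - 1)   # middle starts at line i
    j = rng.randrange(i + 1, len(lines))   # middle ends before line j
    return {
        "prefix": "".join(lines[:i]),
        "middle": "".join(lines[i:j]),
        "suffix": "".join(lines[j:]),
    }
```

By construction, `prefix + middle + suffix` always reassembles the original file, which makes the generator easy to verify in the pipeline's tests.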
## Implementation Plan

### 1. Model Setup

```bash
# Install MLX and mlx-lm via uv (per ADR-0012)
uv tool install mlx-lm

# Download and convert Qwen 2.5 Coder 32B Instruct to MLX Q8_0 format
mlx_lm.convert \
  --hf-path Qwen/Qwen2.5-Coder-32B-Instruct \
  --mlx-path ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --quantize \
  --q-bits 8

# Verify the model loads and generates
mlx_lm.generate \
  --model ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --prompt "def fibonacci(n: int) -> int:"
```

### 2. Inference Server (launchd)

```bash
# Start the server manually first to verify
mlx_lm.server \
  --model ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --adapter-path ~/.mlx-models/adapters/homelab-coder/latest \
  --host 0.0.0.0 \
  --port 8080

# Verify the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-32b",
    "messages": [{"role": "user", "content": "Write a Go handler using handler-base that processes NATS messages"}],
    "max_tokens": 512
  }'
```

**launchd plist** (`~/Library/LaunchAgents/io.daviestechlabs.mlx-coder.plist`):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>io.daviestechlabs.mlx-coder</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/billy/.local/bin/mlx_lm.server</string>
    <string>--model</string>
    <string>/Users/billy/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8</string>
    <string>--adapter-path</string>
    <string>/Users/billy/.mlx-models/adapters/homelab-coder/latest</string>
    <string>--host</string>
    <string>0.0.0.0</string>
    <string>--port</string>
    <string>8080</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/Users/billy/.mlx-models/logs/server.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/billy/.mlx-models/logs/server.err</string>
</dict>
</plist>
```

```bash
# Load the service
launchctl load ~/Library/LaunchAgents/io.daviestechlabs.mlx-coder.plist

# Verify it's running
launchctl list | grep mlx-coder
curl http://waterdeep.lab.daviestechlabs.io:8080/v1/models
```

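Since `KeepAlive` restarts the server but loading a 34 GB model takes a while, a startup poll against `/v1/models` is useful in automation. A hypothetical sketch (the `wait_for_server` name and retry parameters are illustrative; the fetch function is injectable so the retry loop can be exercised without a live server):

```python
import json
import time
import urllib.request

MODELS_URL = "http://waterdeep.lab.daviestechlabs.io:8080/v1/models"


def fetch_models(url: str) -> dict:
    """GET the OpenAI-compatible model listing as a dict."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for_server(fetch=fetch_models, attempts=10, delay=3.0) -> list:
    """Retry until the server answers, returning the served model IDs."""
    last_err = None
    for _ in range(attempts):
        try:
            body = fetch(MODELS_URL)
            return [m["id"] for m in body.get("data", [])]
        except OSError as err:
            last_err = err
            time.sleep(delay)
    raise RuntimeError(f"mlx-lm-server did not come up: {last_err}")
```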
### 3. Client Configuration

**OpenCode** (`~/.config/opencode/config.json` on any dev machine):

```json
{
  "provider": "openai",
  "model": "qwen2.5-coder-32b",
  "baseURL": "http://waterdeep.lab.daviestechlabs.io:8080/v1"
}
```

**VS Code** (settings.json — Continue.dev extension):

```json
{
  "continue.models": [
    {
      "title": "waterdeep-coder",
      "provider": "openai",
      "model": "qwen2.5-coder-32b",
      "apiBase": "http://waterdeep.lab.daviestechlabs.io:8080/v1",
      "apiKey": "not-needed"
    }
  ]
}
```

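Any OpenAI SDK client can also hit the endpoint directly from scripts. A stdlib-only sketch against the same base URL and model name (the helper names are illustrative, not part of any existing tool):

```python
import json
import urllib.request

BASE_URL = "http://waterdeep.lab.daviestechlabs.io:8080/v1"


def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": "qwen2.5-coder-32b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def complete(prompt: str) -> str:
    """POST to /v1/chat/completions and return the first choice's text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```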
### 4. Fine-Tuning on waterdeep (MLX LoRA)

```bash
# Prepare training data (run on the cluster via Kubeflow, or locally)
# Output: train.jsonl and valid.jsonl in chat/instruction format

# Fine-tune with LoRA using mlx-lm
mlx_lm.lora \
  --model ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --train \
  --data ~/.mlx-models/training-data/homelab-coder/ \
  --adapter-path ~/.mlx-models/adapters/homelab-coder/$(date +%Y%m%d)/ \
  --lora-layers 16 \
  --batch-size 1 \
  --iters 1000 \
  --learning-rate 1e-5 \
  --val-batches 25 \
  --save-every 100

# Evaluate the adapter
mlx_lm.generate \
  --model ~/.mlx-models/Qwen2.5-Coder-32B-Instruct-Q8 \
  --adapter-path ~/.mlx-models/adapters/homelab-coder/$(date +%Y%m%d)/ \
  --prompt "Create a new Go NATS handler using handler-base that..."

# Update the 'latest' symlink
ln -sfn ~/.mlx-models/adapters/homelab-coder/$(date +%Y%m%d) \
  ~/.mlx-models/adapters/homelab-coder/latest

# Restart the server to pick up the new adapter
launchctl kickstart -k gui/$(id -u)/io.daviestechlabs.mlx-coder
```

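Beyond eyeballing `mlx_lm.generate` output, the project-specific eval can be scored mechanically by checking completions for required project patterns. A hedged sketch — the pattern set and function name are illustrative placeholders, not the actual eval suite:

```python
# Hypothetical project-specific eval: score a completion by the fraction
# of required project patterns (imports, lifecycle calls) it contains.
REQUIRED_PATTERNS = {
    "handler-base": [
        "git.daviestechlabs.io/daviestechlabs/handler-base/handler",
        "git.daviestechlabs.io/daviestechlabs/handler-base/config",
    ],
}


def score_completion(completion: str, task: str) -> float:
    """Return the fraction of the task's required patterns present."""
    patterns = REQUIRED_PATTERNS[task]
    hits = sum(1 for p in patterns if p in completion)
    return hits / len(patterns)
```

Averaged over a task set, this gives the `project_specific_score` metric logged to MLflow in step 7 of the pipeline.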
### 5. Training Data Pipeline (Kubeflow)

A new `code_finetune_pipeline.py` orchestrates dataset preparation on the cluster:

```
code_finetune_pipeline.yaml
│
├── 1. clone_repos        Clone all DaviesTechLabs repos from Gitea
├── 2. extract_patterns   Parse Go, Python, YAML files into instruction pairs
├── 3. generate_fim       Create fill-in-middle samples from source files
├── 4. deduplicate        Remove near-duplicate samples (MinHash)
├── 5. format_dataset     Convert to mlx-lm JSONL format (train + validation split)
├── 6. upload_to_s3       Push dataset to s3://training-data/code-finetune/{run_id}/
└── 7. log_to_mlflow      Log dataset stats (num_samples, token_count, repo_coverage)
```

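The dedup step (4) filters near-duplicate samples by Jaccard similarity over character shingles; MinHash approximates the same score at scale. An exact, small-scale sketch of the idea (function names and the 0.9 threshold are illustrative):

```python
def shingles(text: str, k: int = 5) -> set:
    """Character k-gram set used as the similarity signature."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|."""
    return len(a & b) / len(a | b) if a | b else 1.0


def dedupe(samples: list, threshold: float = 0.9) -> list:
    """Keep each sample only if it is not near-duplicate of a kept one."""
    kept, kept_shingles = [], []
    for s in samples:
        sh = shingles(s)
        if all(jaccard(sh, prev) < threshold for prev in kept_shingles):
            kept.append(s)
            kept_shingles.append(sh)
    return kept
```

The exact version is O(n²) in the number of samples; MinHash plus locality-sensitive hashing brings this to near-linear, which is why the pipeline uses it.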
The actual LoRA fine-tune runs on waterdeep (not the cluster) because:

- mlx-lm LoRA leverages the M4 Pro's Metal GPU — significantly faster than CPU training
- The model is already loaded on waterdeep — no need to transfer 34 GB to/from the cluster
- Training a 32B model with LoRA requires ~40 GB — only waterdeep and khelben have enough memory

### 6. Memory Budget

| Component | Memory |
|-----------|--------|
| macOS + system services | ~3 GB |
| Qwen 2.5 Coder 32B (Q8_0 weights) | ~34 GB |
| KV cache (8192 context) | ~6 GB |
| mlx-lm-server overhead | ~1 GB |
| **Total (inference)** | **~44 GB** |
| **Headroom** | **~4 GB** |

During LoRA fine-tuning (server stopped):

| Component | Memory |
|-----------|--------|
| macOS + system services | ~3 GB |
| Model weights (frozen, Q8_0) | ~34 GB |
| LoRA adapter gradients + optimizer | ~4 GB |
| Training batch + activations | ~5 GB |
| **Total (training)** | **~46 GB** |
| **Headroom** | **~2 GB** |

Both workloads fit within the 48 GB budget. Inference and training are mutually exclusive — the server is stopped during fine-tuning runs to reclaim KV cache memory for training.

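The two budgets can be sanity-checked with simple arithmetic (figures in GB, taken from the tables above; the dict keys are just labels):

```python
# Memory budget sums from the tables above, in GB.
TOTAL_GB = 48

inference = {"macos": 3, "weights_q8": 34, "kv_cache_8192": 6, "server": 1}
training = {"macos": 3, "weights_q8": 34, "lora_grads_optimizer": 4,
            "batch_activations": 5}


def headroom(budget: dict) -> int:
    """Unified memory left over after the workload's components."""
    return TOTAL_GB - sum(budget.values())


# Inference: 48 - 44 = 4 GB headroom; training: 48 - 46 = 2 GB.
assert headroom(inference) == 4
assert headroom(training) == 2
```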
## Security Considerations

* mlx-lm-server has no authentication — bind to LAN only; waterdeep's firewall blocks external access
* No code leaves the network — all inference and training is local
* Training data is sourced exclusively from Gitea (internal repos) — no external data contamination
* Adapter weights are versioned in Gitea — auditable lineage from training data to deployed model
* Consider adding a simple API key check via a reverse proxy (Caddy/nginx) if the LAN is not fully trusted

## Future Considerations

* **DGX Spark** ([ADR-0058](0058-training-strategy-cpu-dgx-spark.md)): If acquired, DGX Spark could fine-tune larger coding models (70B+) or run full fine-tunes instead of LoRA. waterdeep would remain the serving endpoint unless the DGX Spark also serves inference.
* **Adapter hot-swap**: mlx-lm supports loading adapters at request time — could serve multiple fine-tuned adapters (e.g., Go-specific, Python-specific, YAML-specific) from a single base model
* **RAG augmentation**: Combine the fine-tuned model with a RAG pipeline that retrieves relevant code snippets from Milvus ([ADR-0008](0008-use-milvus-for-vectors.md)) for even better context-aware completions
* **Continuous fine-tuning**: Trigger the pipeline automatically on Gitea push events via NATS — the model stays current with codebase changes
* **Evaluation suite**: Build a project-specific eval set (handler-base patterns, pipeline templates, Flux manifests) to measure fine-tuning quality beyond generic benchmarks
* **Newer models**: As new coding models are released (Qwen 3 Coder, DeepSeek Coder V3, etc.), re-evaluate which model maximises quality within the 48 GB budget

## Links

* Updates: [ADR-0059](0059-mac-mini-ray-worker.md) — waterdeep repurposed from 3D avatar workstation to dedicated coding agent
* Related: [ADR-0058](0058-training-strategy-cpu-dgx-spark.md) — Training strategy (distributed CPU + DGX Spark path)
* Related: [ADR-0047](0047-mlflow-experiment-tracking.md) — MLflow experiment tracking
* Related: [ADR-0054](0054-kubeflow-pipeline-cicd.md) — Kubeflow Pipeline CI/CD
* Related: [ADR-0012](0012-use-uv-for-python-development.md) — uv for Python development
* Related: [ADR-0037](0037-node-naming-conventions.md) — Node naming conventions (waterdeep)
* Related: [ADR-0060](0060-internal-pki-vault.md) — Internal PKI (TLS for waterdeep endpoint)
* [Qwen 2.5 Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) — Model card
* [MLX LM](https://github.com/ml-explore/mlx-examples/tree/main/llms/mlx_lm) — Apple MLX language model framework
* [OpenCode](https://opencode.ai) — Terminal-based AI coding assistant
* [Continue.dev](https://continue.dev) — VS Code AI coding extension with custom model support