# Kubeflow Pipeline CI/CD

* Status: accepted
* Date: 2026-02-13
* Deciders: Billy
* Technical Story: Automate compilation and upload of Kubeflow Pipelines on git push

## Context and Problem Statement

Kubeflow Pipelines are defined as Python scripts (`*_pipeline.py`) that compile to YAML IR documents. These must be compiled with `kfp` and then uploaded to the Kubeflow Pipelines API. Doing this manually is error-prone and easy to forget; a push to `main` should automatically make pipelines available in the Kubeflow UI.

How do we automate the compile-and-upload lifecycle for Kubeflow Pipelines using the existing Gitea Actions CI infrastructure?

## Decision Drivers

* Pipeline definitions change frequently as new ML workflows are added
* Manual `kfp pipeline upload` is tedious and easy to forget
* Kubeflow Pipelines API is accessible within the cluster
* Gitea Actions runners already exist (ADR-0031)
* Notifications via ntfy are established (ADR-0015)

## Considered Options

1. **Gitea Actions workflow with in-cluster KFP API access**
2. **Argo Events watching git repo, triggering Argo Workflow to upload**
3. **CronJob polling for changes**
4. **Manual upload via CLI**

## Decision Outcome

Chosen option: **Option 1: Gitea Actions workflow**, because the runners are already in-cluster, the pattern is consistent with other CI workflows (ADR-0031), and it provides immediate feedback via ntfy.

### Positive Consequences

* Zero-touch pipeline deployment: push to `main` and pipelines appear in Kubeflow
* Consistent CI pattern across all repositories
* Version tracking with timestamped tags (`v20260213-143022`)
* Existing pipelines get new versions; new pipelines are auto-created
* ntfy notifications on success/failure

### Negative Consequences

* Requires NetworkPolicy to allow cross-namespace traffic (gitea → kubeflow)
* Pipeline compilation happens in CI, not locally, so compilation errors only surface in CI
* KFP SDK version must be pinned in CI to match the cluster

## Implementation

### Workflow Structure

The workflow (`.gitea/workflows/compile-upload.yaml`) has two jobs:

| Job | Purpose |
|-----|---------|
| `compile-and-upload` | Find `*_pipeline.py`, compile each with KFP, upload YAML to Kubeflow |
| `notify` | Send ntfy notification with compile/upload summary |

### Pipeline Discovery

```yaml
on:
  push:
    branches: [main]
    paths:
      - "**/*_pipeline.py"
      - "**/*pipeline*.py"
  workflow_dispatch:
```

Pipelines are discovered at runtime with `find . -maxdepth 1 -name '*_pipeline.py'`, avoiding shell issues with glob expansion in CI variables. Because of `-maxdepth 1`, only files directly in the working directory are picked up, even though the `paths` filter also matches files in nested directories. The `workflow_dispatch` trigger allows manual re-runs.

### Upload Strategy

The upload step uses an inline Python script with the KFP client:

1. Connect to `ml-pipeline.kubeflow.svc.cluster.local:8888`
2. For each compiled YAML:
   - Check if a pipeline with that name already exists
   - **Exists** → upload as a new version with timestamp tag
   - **New** → create the pipeline
3. Report uploaded/failed counts as job outputs

### NetworkPolicy Requirement

Gitea Actions runners run in the `gitea` namespace. Kubeflow's NetworkPolicies default-deny cross-namespace ingress.
A dedicated policy was added:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gitea-ingress
  namespace: kubeflow
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: gitea
```

This joins existing policies for envoy (external access) and ai-ml namespace (pipeline-bridge, kfp-sync-job).

### Notification

The `notify` job sends a summary to `ntfy.observability.svc.cluster.local:80/gitea-ci` including:

- Compile count and upload count
- Version tag
- Failed pipeline names (on failure)
- Clickable link to the CI run in Gitea

## Current Pipelines

| Pipeline | Purpose |
|----------|---------|
| `document_ingestion_pipeline` | RAG document processing with MLflow |
| `evaluation_pipeline` | Model evaluation |
| `dvd_transcription_pipeline` | DVD audio → transcript via Whisper |
| `qlora_pdf_pipeline` | QLoRA fine-tune on PDFs from S3 |
| `voice_cloning_pipeline` | Speaker extraction + VITS voice training |
| `vllm_tuning_pipeline` | vLLM inference parameter tuning |

## Links

* Related to [ADR-0009](0009-dual-workflow-engines.md) (Kubeflow Pipelines)
* Related to [ADR-0013](0013-gitea-actions-for-ci.md) (Gitea Actions)
* Related to [ADR-0015](0015-ci-notifications-and-semantic-versioning.md) (ntfy notifications)
* Related to [ADR-0031](0031-gitea-cicd-strategy.md) (Gitea CI/CD patterns)
* Related to [ADR-0043](0043-cilium-cni-network-fabric.md) (NetworkPolicy)
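
For illustration, the create-or-version decision described under Upload Strategy could look roughly like the sketch below. It is not the actual inline script from the workflow. The helper names `version_tag` and `upload_all` are hypothetical, deriving the pipeline name from the YAML file stem is an assumption, and `client` is assumed to behave like `kfp.Client` (using its `get_pipeline_id`, `upload_pipeline`, and `upload_pipeline_version` methods).

```python
from datetime import datetime, timezone
from pathlib import Path


def version_tag(now: datetime) -> str:
    """Format a timestamp tag in the style of v20260213-143022."""
    return now.strftime("v%Y%m%d-%H%M%S")


def upload_all(client, yaml_paths, now=None):
    """Upload each compiled YAML and return (uploaded, failed) name lists.

    `client` is assumed to expose the kfp.Client methods used below.
    """
    tag = version_tag(now or datetime.now(timezone.utc))
    uploaded, failed = [], []
    for path in yaml_paths:
        # Assumption: the pipeline name equals the compiled file's stem.
        name = Path(path).stem
        try:
            pipeline_id = client.get_pipeline_id(name)
            if pipeline_id is None:
                # New: create the pipeline.
                client.upload_pipeline(str(path), pipeline_name=name)
            else:
                # Exists: add a new version carrying the timestamp tag.
                client.upload_pipeline_version(
                    str(path),
                    pipeline_version_name=tag,
                    pipeline_id=pipeline_id,
                )
            uploaded.append(name)
        except Exception:
            # Keep going so one bad pipeline does not block the others.
            failed.append(name)
    return uploaded, failed
```

In CI this would be called with something like `kfp.Client(host="http://ml-pipeline.kubeflow.svc.cluster.local:8888")`, where the `http://` scheme is an assumption. Catching exceptions per pipeline lets the job report the uploaded and failed names separately, which is what the `notify` job needs for its summary.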