Kubeflow Pipeline CI/CD
- Status: accepted
- Date: 2026-02-13
- Deciders: Billy
- Technical Story: Automate compilation and upload of Kubeflow Pipelines on git push
Context and Problem Statement
Kubeflow Pipelines are defined as Python scripts (*_pipeline.py) that compile to YAML IR documents. These must be compiled with kfp and then uploaded to the Kubeflow Pipelines API. Doing this manually is error-prone and easy to forget — a push to main should automatically make pipelines available in the Kubeflow UI.
How do we automate the compile-and-upload lifecycle for Kubeflow Pipelines using the existing Gitea Actions CI infrastructure?
Decision Drivers
- Pipeline definitions change frequently as new ML workflows are added
- Manual `kfp pipeline upload` is tedious and easy to forget
- Kubeflow Pipelines API is accessible within the cluster
- Gitea Actions runners already exist (ADR-0031)
- Notifications via ntfy are established (ADR-0015)
Considered Options
- Gitea Actions workflow with in-cluster KFP API access
- Argo Events watching git repo, triggering Argo Workflow to upload
- CronJob polling for changes
- Manual upload via CLI
Decision Outcome
Chosen option: Option 1 — Gitea Actions workflow, because the runners are already in-cluster, the pattern is consistent with other CI workflows (ADR-0031), and it provides immediate feedback via ntfy.
Positive Consequences
- Zero-touch pipeline deployment — push to main and pipelines appear in Kubeflow
- Consistent CI pattern across all repositories
- Version tracking with timestamped tags (`v20260213-143022`)
- Existing pipelines get new versions; new pipelines are auto-created
- ntfy notifications on success/failure
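The timestamped tag format mentioned above can be produced with a single `strftime` call. This is a minimal sketch: the function name `version_tag` is hypothetical, and the use of UTC is an assumption (the workflow may use the runner's local time).

```python
from datetime import datetime, timezone


def version_tag(now=None):
    """Return a tag such as 'v20260213-143022' for the given (or current) time."""
    # Assumption: UTC is used when no explicit time is passed in.
    moment = now if now is not None else datetime.now(timezone.utc)
    return moment.strftime("v%Y%m%d-%H%M%S")
```

Because the tag is fixed-width and zero-padded, sorting version names lexicographically also sorts them chronologically.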
Negative Consequences
- Requires NetworkPolicy to allow cross-namespace traffic (gitea → kubeflow)
- Pipeline compilation happens in CI, not locally — compilation errors only surface in CI
- KFP SDK version must be pinned in CI to match the cluster
Implementation
Workflow Structure
The workflow (.gitea/workflows/compile-upload.yaml) has two jobs:
| Job | Purpose |
|---|---|
| `compile-and-upload` | Find `*_pipeline.py`, compile each with KFP, upload YAML to Kubeflow |
| `notify` | Send ntfy notification with compile/upload summary |
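The compile half of the first job could look roughly like the sketch below. It assumes each `*_pipeline.py` script compiles itself to YAML when executed (for example by calling the KFP compiler in its main block); the ADR does not state this, and the helper name `compile_pipelines` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def compile_pipelines(scripts, workdir="."):
    """Run each pipeline script; return (compiled, failed) lists of script names."""
    compiled, failed = [], []
    for script in scripts:
        script_path = Path(script).resolve()
        # Assumption: running the script writes its compiled YAML into workdir.
        result = subprocess.run(
            [sys.executable, str(script_path)],
            cwd=workdir,
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            compiled.append(script_path.name)
        else:
            # Record the failure and continue, so one broken pipeline
            # does not prevent the others from compiling.
            failed.append(script_path.name)
            print(f"compile failed: {script_path.name}\n{result.stderr}", file=sys.stderr)
    return compiled, failed
```

Collecting failures instead of aborting on the first one lets the `notify` job report every failed pipeline name in a single message.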
Pipeline Discovery
on:
  push:
    branches: [main]
    paths:
      - "**/*_pipeline.py"
      - "**/*pipeline*.py"
  workflow_dispatch:
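Pipelines are discovered at runtime with `find . -maxdepth 1 -name '*_pipeline.py'`, avoiding shell issues with glob expansion in CI variables. Because of `-maxdepth 1`, only scripts in the repository root are compiled, even though the `paths` filter also triggers on matching files in subdirectories. The `workflow_dispatch` trigger allows manual re-runs.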
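For readers who prefer to see the discovery rule in Python, the following sketch mirrors the `find` invocation using `pathlib`. The function name `discover_pipelines` is hypothetical, and the workflow itself uses `find`, not this code.

```python
from pathlib import Path


def discover_pipelines(root="."):
    """Return *_pipeline.py files directly under root, sorted by name."""
    # A non-recursive glob pattern only inspects direct children of root,
    # which corresponds to find's -maxdepth 1.
    return sorted(p for p in Path(root).glob("*_pipeline.py") if p.is_file())
```

A file such as `pipeline_utils.py` would satisfy the broader `**/*pipeline*.py` trigger pattern but is not picked up here, since its name does not end in `_pipeline.py`.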
Upload Strategy
The upload step uses an inline Python script with the KFP client:
- Connect to `ml-pipeline.kubeflow.svc.cluster.local:8888`
- For each compiled YAML:
  - Check if a pipeline with that name already exists
  - Exists → upload as a new version with timestamp tag
  - New → create the pipeline
- Report uploaded/failed counts as job outputs
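The steps above can be sketched as follows. This is not the workflow's actual inline script: the helper name `upload_all` and the shape of the `packages` mapping are assumptions. The three client calls (`get_pipeline_id`, `upload_pipeline`, `upload_pipeline_version`) are methods of `kfp.Client`; the client is passed in as a parameter so the logic can be exercised without a cluster.

```python
def upload_all(client, packages, version):
    """Upload each compiled YAML; return (uploaded, failed) lists of pipeline names.

    `client` is expected to behave like kfp.Client. `packages` maps a pipeline
    name to the path of its compiled YAML (a hypothetical structure).
    """
    uploaded, failed = [], []
    for name, path in packages.items():
        try:
            # get_pipeline_id returns None when no pipeline has this name.
            pipeline_id = client.get_pipeline_id(name)
            if pipeline_id:
                # Existing pipeline: add a new version under the timestamp tag.
                client.upload_pipeline_version(
                    pipeline_package_path=path,
                    pipeline_version_name=version,
                    pipeline_id=pipeline_id,
                )
            else:
                # New pipeline: create it from the compiled package.
                client.upload_pipeline(
                    pipeline_package_path=path,
                    pipeline_name=name,
                )
            uploaded.append(name)
        except Exception as exc:
            # Keep going so one bad pipeline does not block the rest.
            failed.append(name)
            print(f"upload failed for {name}: {exc}")
    return uploaded, failed


# Usage inside the cluster (needs the KFP SDK installed; http scheme assumed):
#   import kfp
#   client = kfp.Client(host="http://ml-pipeline.kubeflow.svc.cluster.local:8888")
#   upload_all(client, {"evaluation_pipeline": "evaluation_pipeline.yaml"},
#              "v20260213-143022")
```

The lengths of the two returned lists correspond to the uploaded/failed counts that the job reports as outputs.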
NetworkPolicy Requirement
Gitea Actions runners run in the gitea namespace. Kubeflow's NetworkPolicies default-deny cross-namespace ingress. A dedicated policy was added:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gitea-ingress
  namespace: kubeflow
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: gitea
This joins existing policies for envoy (external access) and ai-ml namespace (pipeline-bridge, kfp-sync-job).
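One way to confirm that the policy actually permits traffic is a plain TCP probe from a pod in the `gitea` namespace. The sketch below is an illustration only: the function name `can_reach` and the three-second timeout are assumptions, and it checks TCP reachability, not whether the KFP API responds correctly.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


# Example from a runner pod:
#   can_reach("ml-pipeline.kubeflow.svc.cluster.local", 8888)
```

Traffic dropped by a NetworkPolicy usually shows up as a timeout instead of an immediate refusal, so a short timeout keeps the check from stalling the job.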
Notification
The notify job sends a summary to ntfy.observability.svc.cluster.local:80/gitea-ci including:
- Compile count and upload count
- Version tag
- Failed pipeline names (on failure)
- Clickable link to the CI run in Gitea
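A notification with these fields could be assembled as shown below. This is a sketch using only the standard library. `Title`, `Priority`, `Tags`, and `Click` are publish headers that ntfy supports; the function name, message wording, the tag choices, and the `http://` scheme are assumptions not taken from the workflow.

```python
import urllib.request

# Assumption: plain HTTP, inferred from the port 80 endpoint.
NTFY_URL = "http://ntfy.observability.svc.cluster.local:80/gitea-ci"


def build_notification(compiled, uploaded, failed, version, run_url):
    """Build (but do not send) a POST request describing the CI result."""
    ok = not failed
    lines = [
        f"Compiled: {compiled}, uploaded: {uploaded}",
        f"Version: {version}",
    ]
    if failed:
        lines.append("Failed: " + ", ".join(failed))
    headers = {
        "Title": "Pipelines uploaded" if ok else "Pipeline upload failed",
        "Priority": "default" if ok else "high",
        "Tags": "white_check_mark" if ok else "x",
        "Click": run_url,  # makes the notification link to the CI run
    }
    return urllib.request.Request(
        NTFY_URL,
        data="\n".join(lines).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# To send from inside the cluster:
#   urllib.request.urlopen(build_notification(...), timeout=10)
```

Separating request construction from sending makes the message format easy to check without network access.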
Current Pipelines
| Pipeline | Purpose |
|---|---|
| `document_ingestion_pipeline` | RAG document processing with MLflow |
| `evaluation_pipeline` | Model evaluation |
| `dvd_transcription_pipeline` | DVD audio → transcript via Whisper |
| `qlora_pdf_pipeline` | QLoRA fine-tune on PDFs from S3 |
| `voice_cloning_pipeline` | Speaker extraction + VITS voice training |
| `vllm_tuning_pipeline` | vLLM inference parameter tuning |