feat: Add ML training and batch inference workflows

- batch-inference: LLM inference with optional RAG
- qlora-training: QLoRA adapter fine-tuning from Milvus
- hybrid-ml-training: Multi-GPU distributed training
- coqui-voice-training: XTTS voice cloning
- document-ingestion: Ingest documents to Milvus
- eventsource-kfp: Argo Events / Kubeflow integration
- kfp-integration: Bridge between Argo and Kubeflow
2026-02-01 20:39:42 -05:00
parent a8fc72dd0b
commit 7104698eee
8 changed files with 3365 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,128 @@
-# argo
+# Argo Workflows

+ML training and batch inference workflows for the DaviesTechLabs AI/ML platform.
+
+## Workflows
+
+| Workflow | Description | Trigger |
+|----------|-------------|---------|
+| `batch-inference` | Run LLM inference on batch inputs | `ai.pipeline.trigger` (pipeline="batch-inference") |
+| `qlora-training` | Train QLoRA adapters from Milvus data | `ai.pipeline.trigger` (pipeline="qlora-training") |
+| `hybrid-ml-training` | Multi-GPU distributed training | `ai.pipeline.trigger` (pipeline="hybrid-ml-training") |
+| `coqui-voice-training` | XTTS voice cloning/training | `ai.pipeline.trigger` (pipeline="coqui-voice-training") |
+| `document-ingestion` | Ingest documents into Milvus | `ai.pipeline.trigger` (pipeline="document-ingestion") |
+
+## Integration
+
+| File | Description |
+|------|-------------|
+| `eventsource-kfp.yaml` | Argo Events source for Kubeflow Pipelines integration |
+| `kfp-integration.yaml` | Bridge workflows between Argo and Kubeflow |
+
+## Architecture
+
+```
+NATS (ai.pipeline.trigger)
+         │
+         ▼
+┌─────────────────┐
+│  Argo Events    │
+│  EventSource    │
+└─────────────────┘
+         │
+         ▼
+┌─────────────────┐
+│  Argo Sensor    │
+└─────────────────┘
+         │
+         ▼
+┌─────────────────┐
+│ WorkflowTemplate│
+│  (batch-inf,    │
+│   qlora, etc)   │
+└─────────────────┘
+         │
+         ├──▶ GPU Pods (AMD ROCm / NVIDIA CUDA)
+         ├──▶ Milvus Vector DB
+         ├──▶ vLLM / Ray Serve
+         └──▶ MLflow Tracking
+```
+
+## Workflow Details
+
+### batch-inference
+
+Batch LLM inference with optional RAG:
+
+```bash
+argo submit -n ai-ml --from workflowtemplate/batch-inference \
+  -p input-url="s3://bucket/inputs.json" \
+  -p output-url="s3://bucket/outputs.json" \
+  -p use-rag="true" \
+  -p max-tokens="500"
+```
+
+### qlora-training
+
+Fine-tune QLoRA adapters from Milvus knowledge:
+
+```bash
+argo submit -n ai-ml --from workflowtemplate/qlora-training \
+  -p reference-model="mistralai/Mistral-7B-Instruct-v0.3" \
+  -p output-name="my-adapter" \
+  -p milvus-collections="docs,wiki" \
+  -p num-epochs="3"
+```
+
+### coqui-voice-training
+
+Train XTTS voice models:
+
+```bash
+argo submit -n ai-ml --from workflowtemplate/coqui-voice-training \
+  -p voice-name="my-voice" \
+  -p audio-source="s3://bucket/samples/"
+```
+
+### document-ingestion
+
+Ingest documents into Milvus:
+
+```bash
+argo submit -n ai-ml --from workflowtemplate/document-ingestion \
+  -p source-url="s3://bucket/docs/" \
+  -p collection="knowledge_base" \
+  -p chunk-size="512"
+```
+
+## NATS Trigger Format
+
+Workflows are triggered by publishing a JSON message to the NATS subject `ai.pipeline.trigger`:
+
+```json
+{
+  "pipeline": "qlora-training",
+  "parameters": {
+    "reference-model": "mistralai/Mistral-7B-Instruct-v0.3",
+    "output-name": "custom-adapter",
+    "num-epochs": "5"
+  }
+}
+```
+
+## GPU Scheduling
+
+Workflows use node affinity for GPU allocation:
+
+| Node | GPU | Best For |
+|------|-----|----------|
+| khelben | AMD Strix Halo 64GB | Large model training, vLLM |
+| elminster | NVIDIA RTX 2070 | Whisper, XTTS |
+| drizzt | AMD Radeon 680M | Embeddings |
+| danilo | Intel Arc | Reranker |
+
+## Related
+
+- [homelab-design](https://git.daviestechlabs.io/daviestechlabs/homelab-design) - Architecture docs
+- [kuberay-images](https://git.daviestechlabs.io/daviestechlabs/kuberay-images) - Ray worker images
+- [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base) - Handler library
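
The NATS trigger format documented in the README above can be exercised with a small helper. This is a minimal sketch, not part of the commit: `build_trigger`, `KNOWN_PIPELINES`, and `SUBJECT` are hypothetical names, the pipeline list is copied from the README's workflow table, and the string coercion assumes that every parameter should arrive as a string, as in the README example. Booleans are lowercased because `batch-inference` compares `use-rag` against the literal `"true"`.

```python
import json

# Pipeline names copied from the README's workflow table.
KNOWN_PIPELINES = {
    "batch-inference",
    "qlora-training",
    "hybrid-ml-training",
    "coqui-voice-training",
    "document-ingestion",
}

# NATS subject named in the README.
SUBJECT = "ai.pipeline.trigger"


def _to_param(value) -> str:
    """Render a Python value as the string form the workflows expect."""
    if isinstance(value, bool):
        # batch-inference checks use-rag == "true", so emit lowercase.
        return "true" if value else "false"
    return str(value)


def build_trigger(pipeline: str, parameters: dict) -> bytes:
    """Serialize a trigger message in the shape shown in the README."""
    if pipeline not in KNOWN_PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    payload = {
        "pipeline": pipeline,
        "parameters": {str(k): _to_param(v) for k, v in parameters.items()},
    }
    return json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    message = build_trigger(
        "qlora-training",
        {"output-name": "custom-adapter", "num-epochs": 5},
    )
    # Publishing is left to whichever NATS client the platform uses.
    print(SUBJECT, message.decode("utf-8"))
```

The returned bytes are what a NATS client would publish to `ai.pipeline.trigger`. Rejecting unknown pipeline names on the client side is a design choice of this sketch, not something the commit's Sensor is shown to do.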
--- a/batch-inference.yaml
+++ b/batch-inference.yaml
@@ -0,0 +1,328 @@
+# Batch Inference Workflow
+# Runs LLM inference on a batch of inputs
+# Triggered via NATS: ai.pipeline.trigger with pipeline="batch-inference"
+---
+apiVersion: argoproj.io/v1alpha1
+kind: WorkflowTemplate
+metadata:
+  name: batch-inference
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: batch-inference
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  entrypoint: batch-inference
+  serviceAccountName: argo-workflow
+  
+  arguments:
+    parameters:
+      - name: input-url
+        description: "URL to JSON file with inference requests"
+      - name: output-url
+        description: "URL to store results (S3 path)"
+        value: ""
+      - name: use-rag
+        value: "true"
+        description: "Whether to use RAG for context"
+      - name: max-tokens
+        value: "500"
+        description: "Maximum tokens per response"
+      - name: temperature
+        value: "0.7"
+        description: "LLM temperature"
+
+  templates:
+    - name: batch-inference
+      dag:
+        tasks:
+          - name: fetch-inputs
+            template: fetch-input-data
+            arguments:
+              parameters:
+                - name: input-url
+                  value: "{{workflow.parameters.input-url}}"
+
+          - name: run-inference
+            template: inference
+            dependencies: [fetch-inputs]
+            arguments:
+              parameters:
+                - name: use-rag
+                  value: "{{workflow.parameters.use-rag}}"
+                - name: max-tokens
+                  value: "{{workflow.parameters.max-tokens}}"
+                - name: temperature
+                  value: "{{workflow.parameters.temperature}}"
+              artifacts:
+                - name: inputs
+                  from: "{{tasks.fetch-inputs.outputs.artifacts.inputs}}"
+
+          - name: upload-results
+            template: upload-output
+            dependencies: [run-inference]
+            when: "'{{workflow.parameters.output-url}}' != ''"
+            arguments:
+              parameters:
+                - name: output-url
+                  value: "{{workflow.parameters.output-url}}"
+              artifacts:
+                - name: results
+                  from: "{{tasks.run-inference.outputs.artifacts.results}}"
+
+    - name: fetch-input-data
+      inputs:
+        parameters:
+          - name: input-url
+      outputs:
+        artifacts:
+          - name: inputs
+            path: /tmp/inputs
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import json
+            import urllib.request
+            from pathlib import Path
+
+            input_url = "{{inputs.parameters.input-url}}"
+            output_dir = Path("/tmp/inputs")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            print(f"Fetching inputs from: {input_url}")
+
+            if input_url.startswith("s3://"):
+                import subprocess
+                subprocess.run(["pip", "install", "boto3", "-q"], check=True)
+                import boto3
+                s3 = boto3.client("s3")
+                bucket, key = input_url[5:].split("/", 1)
+                s3.download_file(bucket, key, str(output_dir / "inputs.json"))
+            elif input_url.startswith("http"):
+                urllib.request.urlretrieve(input_url, output_dir / "inputs.json")
+            else:
+                print(f"Unsupported URL scheme: {input_url}")
+                exit(1)
+
+            # Validate JSON structure
+            with open(output_dir / "inputs.json") as f:
+                data = json.load(f)
+            
+            if "requests" not in data:
+                print("Error: JSON must contain 'requests' array")
+                exit(1)
+            
+            print(f"Loaded {len(data['requests'])} inference requests")
+        resources:
+          requests:
+            memory: 256Mi
+            cpu: 100m
+
+    - name: inference
+      inputs:
+        parameters:
+          - name: use-rag
+          - name: max-tokens
+          - name: temperature
+        artifacts:
+          - name: inputs
+            path: /tmp/inputs
+      outputs:
+        artifacts:
+          - name: results
+            path: /tmp/results
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import subprocess
+            subprocess.run(["pip", "install", "httpx", "pymilvus", "-q"], check=True)
+            
+            import json
+            import httpx
+            from pathlib import Path
+            from typing import List, Dict
+
+            # Configuration
+            VLLM_URL = "http://llm-draft.ai-ml.svc.cluster.local:8000"
+            EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
+            RERANKER_URL = "http://reranker-predictor.ai-ml.svc.cluster.local"
+            MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
+            LLM_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"
+            
+            use_rag = "{{inputs.parameters.use-rag}}" == "true"
+            max_tokens = int("{{inputs.parameters.max-tokens}}")
+            temperature = float("{{inputs.parameters.temperature}}")
+
+            input_dir = Path("/tmp/inputs")
+            output_dir = Path("/tmp/results")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            # Load inputs
+            with open(input_dir / "inputs.json") as f:
+                data = json.load(f)
+            requests = data["requests"]
+            
+            print(f"Processing {len(requests)} requests (RAG: {use_rag})")
+
+            # Initialize Milvus if using RAG
+            collection = None
+            if use_rag:
+                try:
+                    from pymilvus import connections, Collection, utility
+                    connections.connect(host=MILVUS_HOST, port=19530)
+                    if utility.has_collection("knowledge_base"):
+                        collection = Collection("knowledge_base")
+                        collection.load()
+                        print("Milvus connected")
+                except Exception as e:
+                    print(f"Milvus connection failed: {e}")
+                    use_rag = False
+
+            def get_embeddings(texts: List[str], client: httpx.Client) -> List[List[float]]:
+                response = client.post(
+                    f"{EMBEDDINGS_URL}/embeddings",
+                    json={"input": texts, "model": "bge"}
+                )
+                result = response.json()
+                return [d["embedding"] for d in result.get("data", [])]
+
+            def search_milvus(embedding: List[float]) -> List[Dict]:
+                results = collection.search(
+                    data=[embedding],
+                    anns_field="embedding",
+                    param={"metric_type": "COSINE", "params": {"ef": 64}},
+                    limit=5,
+                    output_fields=["text", "source"]
+                )
+                docs = []
+                for hits in results:
+                    for hit in hits:
+                        docs.append({
+                            "text": hit.entity.get("text", ""),
+                            "source": hit.entity.get("source", ""),
+                            "score": hit.score
+                        })
+                return docs
+
+            def rerank(query: str, documents: List[str], client: httpx.Client) -> List[Dict]:
+                response = client.post(
+                    f"{RERANKER_URL}/v1/rerank",
+                    json={"query": query, "documents": documents}
+                )
+                return response.json().get("results", [])
+
+            # Process requests
+            results = []
+            with httpx.Client(timeout=120.0) as client:
+                for i, req in enumerate(requests):
+                    query = req.get("text", req.get("query", ""))
+                    req_id = req.get("id", str(i))
+                    
+                    print(f"Processing {i+1}/{len(requests)}: {query[:50]}...")
+                    
+                    context = ""
+                    rag_sources = []
+                    
+                    if use_rag and collection:
+                        try:
+                            # Get embeddings and search
+                            embeddings = get_embeddings([query], client)
+                            if embeddings:
+                                docs = search_milvus(embeddings[0])
+                                if docs:
+                                    doc_texts = [d["text"] for d in docs]
+                                    reranked = rerank(query, doc_texts, client)
+                                    sorted_docs = sorted(reranked, key=lambda x: x.get("relevance_score", 0), reverse=True)[:3]
+                                    context = "\n\n".join([doc_texts[d["index"]] for d in sorted_docs])
+                                    rag_sources = [docs[d["index"]].get("source", "") for d in sorted_docs]
+                        except Exception as e:
+                            print(f"  RAG failed: {e}")
+                    
+                    # Generate response
+                    try:
+                        messages = [{"role": "system", "content": "You are a helpful AI assistant."}]
+                        if context:
+                            messages.append({"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"})
+                        else:
+                            messages.append({"role": "user", "content": query})
+                        
+                        response = client.post(
+                            f"{VLLM_URL}/v1/chat/completions",
+                            json={
+                                "model": LLM_MODEL,
+                                "messages": messages,
+                                "max_tokens": max_tokens,
+                                "temperature": temperature
+                            }
+                        )
+                        result = response.json()
+                        answer = result["choices"][0]["message"]["content"]
+                    except Exception as e:
+                        answer = f"Error: {e}"
+                    
+                    results.append({
+                        "id": req_id,
+                        "query": query,
+                        "response": answer,
+                        "used_rag": bool(context),
+                        "rag_sources": rag_sources
+                    })
+
+            # Save results
+            with open(output_dir / "results.json", "w") as f:
+                json.dump({"results": results}, f, indent=2)
+            
+            print(f"Completed {len(results)} inferences")
+            
+            if collection:
+                from pymilvus import connections
+                connections.disconnect("default")
+        envFrom:
+          - configMapRef:
+              name: ai-services-config
+        resources:
+          requests:
+            memory: 1Gi
+            cpu: 500m
+
+    - name: upload-output
+      inputs:
+        parameters:
+          - name: output-url
+        artifacts:
+          - name: results
+            path: /tmp/results
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import subprocess
+            subprocess.run(["pip", "install", "boto3", "-q"], check=True)
+            
+            import boto3
+            from pathlib import Path
+
+            output_url = "{{inputs.parameters.output-url}}"
+            results_file = Path("/tmp/results/results.json")
+
+            print(f"Uploading results to: {output_url}")
+
+            if output_url.startswith("s3://"):
+                s3 = boto3.client("s3")
+                bucket, key = output_url[5:].split("/", 1)
+                s3.upload_file(str(results_file), bucket, key)
+                print("Upload complete")
+            else:
+                print(f"Unsupported URL scheme: {output_url}")
+                exit(1)
+        resources:
+          requests:
+            memory: 256Mi
+            cpu: 100m
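
The `fetch-input-data` step above only checks for a top-level `requests` key, and the `inference` step reads each request's `text` (or `query`) and optional `id`. The following is a minimal sketch of producing and pre-checking such a file before uploading it. `make_inputs`, `validate_inputs`, and the `req-N` ID scheme are hypothetical, and the per-request checks are stricter than what the workflow itself enforces.

```python
import json


def make_inputs(queries: list[str]) -> dict:
    """Build a document with the top-level 'requests' array the workflow expects."""
    return {
        "requests": [
            {"id": f"req-{i}", "text": query} for i, query in enumerate(queries)
        ]
    }


def validate_inputs(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    requests = data.get("requests")
    if not isinstance(requests, list):
        return ["top-level 'requests' must be a list"]
    problems = []
    for i, req in enumerate(requests):
        if not isinstance(req, dict):
            problems.append(f"request {i}: not an object")
        elif not (req.get("text") or req.get("query")):
            # The inference step falls back from 'text' to 'query'.
            problems.append(f"request {i}: needs a non-empty 'text' or 'query'")
    return problems


if __name__ == "__main__":
    document = make_inputs(["What is QLoRA?", "Summarize the ingestion flow."])
    issues = validate_inputs(document)
    if issues:
        raise SystemExit("\n".join(issues))
    print(json.dumps(document, indent=2))
```

Running the check locally catches an empty or malformed request before the workflow spends time on it.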
--- a/coqui-voice-training.yaml
+++ b/coqui-voice-training.yaml
@@ -0,0 +1,969 @@
+# Coqui TTS Voice Training Workflow
+# Trains a custom voice model using Coqui TTS from audio samples
+# Triggered via NATS: ai.pipeline.trigger with pipeline="coqui-voice-training"
+---
+apiVersion: argoproj.io/v1alpha1
+kind: WorkflowTemplate
+metadata:
+  name: coqui-voice-training
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: coqui-voice-training
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  entrypoint: train-voice
+  serviceAccountName: argo-workflow
+  
+  arguments:
+    parameters:
+      - name: audio-source
+        description: "URL to audio files (S3 bucket, HTTP, or NFS path with .wav/.mp3 files)"
+      - name: transcripts-source
+        description: "URL to transcripts file (CSV with audio_file,transcript columns) - leave empty to auto-transcribe"
+        value: ""
+      - name: voice-name
+        description: "Name for the trained voice model"
+        value: "custom-voice"
+      - name: base-model
+        description: "Base TTS model to fine-tune from"
+        value: "tts_models/en/ljspeech/vits"
+      - name: language
+        description: "Language code (e.g., en, de, fr, es)"
+        value: "en"
+      - name: num-epochs
+        description: "Number of training epochs"
+        value: "100"
+      - name: batch-size
+        description: "Training batch size"
+        value: "16"
+      - name: learning-rate
+        description: "Learning rate for training"
+        value: "0.0001"
+      - name: sample-rate
+        description: "Target sample rate for audio (Hz)"
+        value: "22050"
+      - name: output-path
+        description: "Path to store the trained model (S3 or NFS)"
+        value: "/models/tts/custom"
+
+  volumeClaimTemplates:
+    - metadata:
+        name: training-workspace
+      spec:
+        accessModes: ["ReadWriteMany"]
+        storageClassName: nfs-slow
+        resources:
+          requests:
+            storage: 50Gi
+
+  templates:
+    - name: train-voice
+      dag:
+        tasks:
+          - name: fetch-audio
+            template: fetch-audio-files
+            arguments:
+              parameters:
+                - name: audio-source
+                  value: "{{workflow.parameters.audio-source}}"
+
+          - name: fetch-transcripts
+            template: fetch-transcript-file
+            arguments:
+              parameters:
+                - name: transcripts-source
+                  value: "{{workflow.parameters.transcripts-source}}"
+
+          - name: preprocess-audio
+            template: preprocess
+            dependencies: [fetch-audio]
+            arguments:
+              parameters:
+                - name: sample-rate
+                  value: "{{workflow.parameters.sample-rate}}"
+              artifacts:
+                - name: raw-audio
+                  from: "{{tasks.fetch-audio.outputs.artifacts.audio-files}}"
+
+          - name: generate-transcripts
+            template: transcribe-audio
+            dependencies: [preprocess-audio, fetch-transcripts]
+            when: "'{{workflow.parameters.transcripts-source}}' == ''"
+            arguments:
+              parameters:
+                - name: language
+                  value: "{{workflow.parameters.language}}"
+              artifacts:
+                - name: audio-files
+                  from: "{{tasks.preprocess-audio.outputs.artifacts.processed-audio}}"
+
+          - name: prepare-dataset
+            template: prepare-coqui-dataset
+            dependencies: [preprocess-audio, generate-transcripts, fetch-transcripts]
+            arguments:
+              parameters:
+                - name: voice-name
+                  value: "{{workflow.parameters.voice-name}}"
+                - name: language
+                  value: "{{workflow.parameters.language}}"
+              artifacts:
+                - name: audio-files
+                  from: "{{tasks.preprocess-audio.outputs.artifacts.processed-audio}}"
+                - name: transcripts
+                  from: "{{=workflow.parameters['transcripts-source'] != '' ? tasks['fetch-transcripts'].outputs.artifacts.transcripts : tasks['generate-transcripts'].outputs.artifacts.transcripts}}"
+                  optional: true
+
+          - name: train-model
+            template: train-tts
+            dependencies: [prepare-dataset]
+            arguments:
+              parameters:
+                - name: voice-name
+                  value: "{{workflow.parameters.voice-name}}"
+                - name: base-model
+                  value: "{{workflow.parameters.base-model}}"
+                - name: language
+                  value: "{{workflow.parameters.language}}"
+                - name: num-epochs
+                  value: "{{workflow.parameters.num-epochs}}"
+                - name: batch-size
+                  value: "{{workflow.parameters.batch-size}}"
+                - name: learning-rate
+                  value: "{{workflow.parameters.learning-rate}}"
+              artifacts:
+                - name: dataset
+                  from: "{{tasks.prepare-dataset.outputs.artifacts.dataset}}"
+
+          - name: export-model
+            template: export-trained-model
+            dependencies: [train-model]
+            arguments:
+              parameters:
+                - name: voice-name
+                  value: "{{workflow.parameters.voice-name}}"
+                - name: output-path
+                  value: "{{workflow.parameters.output-path}}"
+              artifacts:
+                - name: trained-model
+                  from: "{{tasks.train-model.outputs.artifacts.model}}"
+
+    # Template: Fetch audio files from source
+    - name: fetch-audio-files
+      inputs:
+        parameters:
+          - name: audio-source
+      outputs:
+        artifacts:
+          - name: audio-files
+            path: /tmp/audio
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import os
+            import subprocess
+            import urllib.request
+            from pathlib import Path
+            import shutil
+
+            source_url = "{{inputs.parameters.audio-source}}"
+            output_dir = Path("/tmp/audio")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            print(f"Fetching audio from: {source_url}")
+
+            if source_url.startswith("s3://"):
+                subprocess.run(["pip", "install", "boto3", "-q"], check=True)
+                import boto3
+                s3 = boto3.client("s3")
+                bucket, prefix = source_url[5:].split("/", 1)
+                response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
+                
+                audio_extensions = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
+                for obj in response.get("Contents", []):
+                    key = obj["Key"]
+                    if Path(key).suffix.lower() in audio_extensions:
+                        local_path = output_dir / Path(key).name
+                        s3.download_file(bucket, key, str(local_path))
+                        print(f"Downloaded: {key}")
+            
+            elif source_url.startswith("http"):
+                # Handle single file or directory listing
+                filename = source_url.split("/")[-1]
+                if any(ext in filename.lower() for ext in [".wav", ".mp3", ".flac", ".zip"]):
+                    local_path = output_dir / filename
+                    urllib.request.urlretrieve(source_url, local_path)
+                    print(f"Downloaded: {filename}")
+                    
+                    # Extract if zip
+                    if filename.endswith(".zip"):
+                        shutil.unpack_archive(local_path, output_dir)
+                        os.remove(local_path)
+                        print("Extracted zip archive")
+                else:
+                    print(f"URL doesn't appear to be an audio file: {source_url}")
+                    exit(1)
+            
+            elif source_url.startswith("/"):
+                # Local/NFS path
+                src_path = Path(source_url)
+                if src_path.is_dir():
+                    audio_extensions = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
+                    for f in src_path.iterdir():
+                        if f.suffix.lower() in audio_extensions:
+                            shutil.copy(f, output_dir / f.name)
+                            print(f"Copied: {f.name}")
+                elif src_path.is_file():
+                    shutil.copy(src_path, output_dir / src_path.name)
+                else:
+                    print(f"Path not found: {source_url}")
+                    exit(1)
+            else:
+                print(f"Unsupported source: {source_url}")
+                exit(1)
+
+            # Count files
+            audio_files = list(output_dir.glob("*"))
+            print(f"Total audio files: {len(audio_files)}")
+            
+            if len(audio_files) == 0:
+                print("Error: No audio files found!")
+                exit(1)
+        resources:
+          requests:
+            memory: 512Mi
+            cpu: 200m
+
+    # Template: Fetch transcripts file
+    - name: fetch-transcript-file
+      inputs:
+        parameters:
+          - name: transcripts-source
+      outputs:
+        artifacts:
+          - name: transcripts
+            path: /tmp/transcripts
+            optional: true
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import os
+            import subprocess
+            import urllib.request
+            from pathlib import Path
+            import shutil
+
+            source_url = "{{inputs.parameters.transcripts-source}}"
+            output_dir = Path("/tmp/transcripts")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            if not source_url or source_url.strip() == "":
+                print("No transcripts source provided - will auto-transcribe")
+                # Create empty placeholder
+                (output_dir / "placeholder.txt").write_text("auto-transcribe")
+                exit(0)
+
+            print(f"Fetching transcripts from: {source_url}")
+
+            if source_url.startswith("s3://"):
+                subprocess.run(["pip", "install", "boto3", "-q"], check=True)
+                import boto3
+                s3 = boto3.client("s3")
+                bucket, key = source_url[5:].split("/", 1)
+                local_path = output_dir / Path(key).name
+                s3.download_file(bucket, key, str(local_path))
+                print(f"Downloaded: {key}")
+            
+            elif source_url.startswith("http"):
+                filename = source_url.split("/")[-1] or "transcripts.csv"
+                local_path = output_dir / filename
+                urllib.request.urlretrieve(source_url, local_path)
+                print(f"Downloaded: {filename}")
+            
+            elif source_url.startswith("/"):
+                src_path = Path(source_url)
+                if src_path.is_file():
+                    shutil.copy(src_path, output_dir / src_path.name)
+                    print(f"Copied: {src_path.name}")
+                else:
+                    print(f"File not found: {source_url}")
+                    exit(1)
+            else:
+                print(f"Unsupported source: {source_url}")
+                exit(1)
+        resources:
+          requests:
+            memory: 256Mi
+            cpu: 100m
+
+    # Template: Preprocess audio files
+    - name: preprocess
+      inputs:
+        parameters:
+          - name: sample-rate
+        artifacts:
+          - name: raw-audio
+            path: /tmp/raw-audio
+      outputs:
+        artifacts:
+          - name: processed-audio
+            path: /tmp/processed-audio
+      container:
+        image: python:3.13-slim
+        command: [bash]
+        args:
+          - -c
+          - |
+            set -e
+            
+            # Install ffmpeg and dependencies (audioop-lts supplies the audioop module pydub needs on Python 3.13)
+            apt-get update && apt-get install -y ffmpeg > /dev/null 2>&1
+            pip install -q pydub audioop-lts numpy soundfile
+            
+            python3 << 'EOF'
+            import os
+            from pathlib import Path
+            from pydub import AudioSegment
+            import soundfile as sf
+
+            SAMPLE_RATE = int("{{inputs.parameters.sample-rate}}")
+            input_dir = Path("/tmp/raw-audio")
+            output_dir = Path("/tmp/processed-audio")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            audio_extensions = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
+            
+            for audio_file in input_dir.iterdir():
+                if audio_file.suffix.lower() not in audio_extensions:
+                    continue
+                    
+                print(f"Processing: {audio_file.name}")
+                
+                try:
+                    # Load audio
+                    audio = AudioSegment.from_file(str(audio_file))
+                    
+                    # Convert to mono if stereo
+                    if audio.channels > 1:
+                        audio = audio.set_channels(1)
+                    
+                    # Resample to target sample rate
+                    audio = audio.set_frame_rate(SAMPLE_RATE)
+                    
+                    # Normalize audio
+                    audio = audio.normalize()
+                    
+                    # Export as WAV
+                    output_file = output_dir / f"{audio_file.stem}.wav"
+                    audio.export(str(output_file), format="wav")
+                    print(f"  -> Saved: {output_file.name}")
+                    
+                except Exception as e:
+                    print(f"  -> Error processing {audio_file.name}: {e}")
+                    continue
+
+            processed_files = list(output_dir.glob("*.wav"))
+            print(f"\nProcessed {len(processed_files)} audio files")
+            
+            if len(processed_files) == 0:
+                print("Error: No files were successfully processed!")
+                exit(1)
+            EOF
+        resources:
+          requests:
+            memory: 2Gi
+            cpu: "1"
+
+    # Template: Auto-transcribe audio using Coqui STT
+    - name: transcribe-audio
+      inputs:
+        parameters:
+          - name: language
+        artifacts:
+          - name: audio-files
+            path: /tmp/audio
+      outputs:
+        artifacts:
+          - name: transcripts
+            path: /tmp/transcripts
+      container:
+        image: ghcr.io/coqui-ai/stt:latest
+        command: [bash]
+        args:
+          - -c
+          - |
+            set -e
+            
+            # Install additional dependencies
+            pip install -q numpy scipy
+            
+            python3 << 'EOF'
+            import csv
+            import os
+            import wave
+            import numpy as np
+            from pathlib import Path
+            from stt import Model
+
+            LANGUAGE = "{{inputs.parameters.language}}"
+            input_dir = Path("/tmp/audio")
+            output_dir = Path("/tmp/transcripts")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            # Model paths - Coqui STT models are typically pre-installed in the container
+            # or can be downloaded from https://coqui.ai/models
+            MODEL_DIR = Path("/models/stt")
+            
+            # Try to find model files
+            model_file = None
+            scorer_file = None
+            
+            # Check for language-specific models
+            lang_model_dir = MODEL_DIR / LANGUAGE
+            if lang_model_dir.exists():
+                for f in lang_model_dir.glob("*.tflite"):
+                    model_file = f
+                for f in lang_model_dir.glob("*.scorer"):
+                    scorer_file = f
+            
+            # Fallback to default English model location
+            if model_file is None:
+                default_paths = [
+                    MODEL_DIR / "model.tflite",
+                    Path("/usr/share/stt/model.tflite"),
+                    Path("/opt/stt/model.tflite"),
+                ]
+                for p in default_paths:
+                    if p.exists():
+                        model_file = p
+                        break
+            
+            if model_file is None:
+                # Download model if not found
+                print("Downloading Coqui STT model...")
+                import urllib.request
+                import tarfile
+                
+                model_url = "https://github.com/coqui-ai/STT-models/releases/download/english/coqui-stt-1.0.0-lg-vocab.tflite"
+                scorer_url = "https://github.com/coqui-ai/STT-models/releases/download/english/coqui-stt-1.0.0-lg-vocab.scorer"
+                
+                MODEL_DIR.mkdir(parents=True, exist_ok=True)
+                model_file = MODEL_DIR / "model.tflite"
+                scorer_file = MODEL_DIR / "model.scorer"
+                
+                urllib.request.urlretrieve(model_url, model_file)
+                urllib.request.urlretrieve(scorer_url, scorer_file)
+                print("Model downloaded successfully")
+
+            print(f"Loading Coqui STT model: {model_file}")
+            model = Model(str(model_file))
+            
+            if scorer_file and scorer_file.exists():
+                print(f"Loading scorer: {scorer_file}")
+                model.enableExternalScorer(str(scorer_file))
+
+            transcripts = []
+            
+            for audio_file in sorted(input_dir.glob("*.wav")):
+                print(f"Transcribing: {audio_file.name}")
+                
+                try:
+                    # Read WAV file
+                    with wave.open(str(audio_file), 'rb') as w:
+                        sample_rate = w.getframerate()
+                        frames = w.getnframes()
+                        audio_data = w.readframes(frames)
+                    
+                    # Convert to int16 array
+                    audio = np.frombuffer(audio_data, dtype=np.int16)
+                    
+                    # Resample if needed (Coqui STT expects 16kHz)
+                    if sample_rate != 16000:
+                        from scipy import signal
+                        audio = signal.resample(audio, int(len(audio) * 16000 / sample_rate))
+                        audio = np.clip(audio, -32768, 32767).astype(np.int16)  # clip so resampling overshoot cannot wrap around
+                    
+                    # Run inference
+                    text = model.stt(audio)
+                    
+                    transcripts.append({
+                        "audio_file": audio_file.name,
+                        "transcript": text
+                    })
+                    print(f"  -> {text[:100] if text else '(empty)'}...")
+                except Exception as e:
+                    print(f"  -> Error: {e}")
+                    continue
+
+            # Write CSV
+            csv_file = output_dir / "transcripts.csv"
+            with open(csv_file, "w", newline="", encoding="utf-8") as f:
+                writer = csv.DictWriter(f, fieldnames=["audio_file", "transcript"])
+                writer.writeheader()
+                writer.writerows(transcripts)
+
+            print(f"\nTranscribed {len(transcripts)} files")
+            print(f"Saved to: {csv_file}")
+            EOF
+        resources:
+          requests:
+            memory: 4Gi
+            cpu: "2"
+          limits:
+            memory: 8Gi
+            cpu: "4"
+
+    # Template: Prepare dataset in Coqui TTS format
+    - name: prepare-coqui-dataset
+      inputs:
+        parameters:
+          - name: voice-name
+          - name: language
+        artifacts:
+          - name: audio-files
+            path: /tmp/audio
+          - name: transcripts
+            path: /tmp/transcripts
+            optional: true
+      outputs:
+        artifacts:
+          - name: dataset
+            path: /tmp/dataset
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import csv
+            import json
+            import os
+            import shutil
+            from pathlib import Path
+
+            VOICE_NAME = "{{inputs.parameters.voice-name}}"
+            LANGUAGE = "{{inputs.parameters.language}}"
+            
+            audio_dir = Path("/tmp/audio")
+            transcripts_dir = Path("/tmp/transcripts")
+            output_dir = Path("/tmp/dataset")
+            wavs_dir = output_dir / "wavs"
+            wavs_dir.mkdir(parents=True, exist_ok=True)
+
+            print(f"Preparing Coqui TTS dataset for voice: {VOICE_NAME}")
+
+            # Find transcripts file
+            transcripts_file = None
+            for f in transcripts_dir.glob("*.csv"):
+                transcripts_file = f
+                break
+            
+            if transcripts_file is None:
+                # Check for .txt files (simple format: filename|text)
+                for f in transcripts_dir.glob("*.txt"):
+                    if f.name != "placeholder.txt":
+                        transcripts_file = f
+                        break
+
+            if transcripts_file is None:
+                print("Error: No transcripts file found!")
+                exit(1)
+
+            print(f"Using transcripts: {transcripts_file}")
+
+            # Parse transcripts
+            transcripts = {}
+            
+            if transcripts_file.suffix == ".csv":
+                with open(transcripts_file, "r", encoding="utf-8") as f:
+                    reader = csv.DictReader(f)
+                    for row in reader:
+                        # Handle various column name conventions
+                        audio = row.get("audio_file") or row.get("audio") or row.get("file") or row.get("wav")
+                        text = row.get("transcript") or row.get("text") or row.get("sentence")
+                        if audio and text:
+                            transcripts[audio] = text.strip()
+            else:
+                # Simple pipe-separated format: filename|text
+                with open(transcripts_file, "r", encoding="utf-8") as f:
+                    for line in f:
+                        line = line.strip()
+                        if "|" in line:
+                            parts = line.split("|", 1)
+                            if len(parts) == 2:
+                                transcripts[parts[0].strip()] = parts[1].strip()
+
+            print(f"Loaded {len(transcripts)} transcripts")
+
+            # Copy audio files and create metadata
+            metadata_lines = []
+            
+            for audio_file in sorted(audio_dir.glob("*.wav")):
+                # Try to match transcript
+                text = None
+                for key in [audio_file.name, audio_file.stem, audio_file.stem + ".wav"]:
+                    if key in transcripts:
+                        text = transcripts[key]
+                        break
+                
+                if text is None:
+                    print(f"Warning: No transcript for {audio_file.name}, skipping")
+                    continue
+                
+                # Copy audio file
+                dest_file = wavs_dir / audio_file.name
+                shutil.copy(audio_file, dest_file)
+                
+                # LJSpeech format: file_id|text|normalized_text. "|" is the field separator, so remove it (and newlines) from the text.
+                clean = " ".join(text.replace("|", " ").split())
+                metadata_lines.append(f"{audio_file.stem}|{clean}|{clean}")
+
+            # Write metadata.csv
+            metadata_file = output_dir / "metadata.csv"
+            with open(metadata_file, "w", encoding="utf-8") as f:
+                f.write("\n".join(metadata_lines))
+
+            print(f"Created dataset with {len(metadata_lines)} samples")
+
+            # Create dataset config
+            config = {
+                "name": VOICE_NAME,
+                "language": LANGUAGE,
+                "num_samples": len(metadata_lines),
+                "format": "ljspeech"
+            }
+            
+            with open(output_dir / "dataset_config.json", "w") as f:
+                json.dump(config, f, indent=2)
+
+            print(f"Dataset ready at: {output_dir}")
+            
+            if len(metadata_lines) < 10:
+                print("Warning: Very small dataset (fewer than 10 samples). At least 100 samples are recommended for good results.")
+        resources:
+          requests:
+            memory: 1Gi
+            cpu: 500m
+
+    # Template: Train Coqui TTS model
+    - name: train-tts
+      inputs:
+        parameters:
+          - name: voice-name
+          - name: base-model
+          - name: language
+          - name: num-epochs
+          - name: batch-size
+          - name: learning-rate
+        artifacts:
+          - name: dataset
+            path: /tmp/dataset
+      outputs:
+        artifacts:
+          - name: model
+            path: /tmp/output
+      container:
+        image: ghcr.io/coqui-ai/tts:latest
+        command: [bash]
+        args:
+          - -c
+          - |
+            set -e
+            
+            VOICE_NAME="{{inputs.parameters.voice-name}}"
+            BASE_MODEL="{{inputs.parameters.base-model}}"
+            LANGUAGE="{{inputs.parameters.language}}"
+            NUM_EPOCHS="{{inputs.parameters.num-epochs}}"
+            BATCH_SIZE="{{inputs.parameters.batch-size}}"
+            LEARNING_RATE="{{inputs.parameters.learning-rate}}"
+            
+            DATASET_DIR="/tmp/dataset"
+            OUTPUT_DIR="/tmp/output"
+            mkdir -p "$OUTPUT_DIR"
+            
+            echo "=== Coqui TTS Voice Training ==="
+            echo "Voice Name: $VOICE_NAME"
+            echo "Base Model: $BASE_MODEL"
+            echo "Language: $LANGUAGE"
+            echo "Epochs: $NUM_EPOCHS"
+            echo "Batch Size: $BATCH_SIZE"
+            echo "Learning Rate: $LEARNING_RATE"
+            echo ""
+            
+            # Download base model if specified for fine-tuning
+            RESTORE_PATH=""
+            if [ "$BASE_MODEL" != "" ] && [ "$BASE_MODEL" != "none" ]; then
+                echo "Downloading base model for fine-tuning: $BASE_MODEL"
+                # Use tts to download the model and get its path
+                MODEL_PATH=$(python3 -c "
+            import sys
+            from contextlib import redirect_stdout
+            from TTS.utils.manage import ModelManager
+
+            model_name = '$BASE_MODEL'
+
+            # Download the model. ModelManager may print progress to stdout, so send
+            # that to stderr; only the final path should be captured by the shell.
+            with redirect_stdout(sys.stderr):
+                model_path, config_path, _ = ModelManager().download_model(model_name)
+            print(model_path)
+            ")
+                RESTORE_PATH="$MODEL_PATH"
+                echo "Base model path: $RESTORE_PATH"
+            fi
+            
+            # Create and run training script following Coqui docs pattern
+            python3 << EOF
+            import os
+            from pathlib import Path
+
+            # Trainer: Where the magic happens
+            from trainer import Trainer, TrainerArgs
+
+            # Model configs
+            from TTS.tts.configs.vits_config import VitsConfig
+            from TTS.tts.configs.shared_configs import BaseDatasetConfig
+            from TTS.tts.datasets import load_tts_samples
+            from TTS.tts.models.vits import Vits
+            from TTS.tts.utils.text.tokenizer import TTSTokenizer
+            from TTS.utils.audio import AudioProcessor
+
+            # Paths
+            DATASET_DIR = Path("$DATASET_DIR")
+            OUTPUT_DIR = Path("$OUTPUT_DIR")
+            RESTORE_PATH = "$RESTORE_PATH" if "$RESTORE_PATH" else None
+
+            print(f"Dataset: {DATASET_DIR}")
+            print(f"Output: {OUTPUT_DIR}")
+            print(f"Restore from: {RESTORE_PATH}")
+
+            # Define dataset config (LJSpeech format)
+            dataset_config = BaseDatasetConfig(
+                formatter="ljspeech",
+                meta_file_train="metadata.csv",
+                path=str(DATASET_DIR),
+                language="$LANGUAGE",
+            )
+
+            # Initialize training configuration
+            config = VitsConfig(
+                run_name="$VOICE_NAME",
+                output_path=str(OUTPUT_DIR),
+                datasets=[dataset_config],
+                batch_size=int("$BATCH_SIZE"),
+                eval_batch_size=max(1, int("$BATCH_SIZE") // 2),
+                num_loader_workers=4,
+                num_eval_loader_workers=2,
+                run_eval=True,
+                test_delay_epochs=5,
+                epochs=int("$NUM_EPOCHS"),
+                text_cleaner="phoneme_cleaners",
+                use_phonemes=True,
+                phoneme_language="$LANGUAGE",
+                phoneme_cache_path=str(OUTPUT_DIR / "phoneme_cache"),
+                compute_input_seq_cache=True,
+                print_step=25,
+                print_eval=False,
+                mixed_precision=True,
+                save_step=500,
+                save_n_checkpoints=3,
+                save_best_after=1000,
+                lr=float("$LEARNING_RATE"),
+                # Audio settings for typical voice cloning
+                audio={
+                    "sample_rate": 22050,
+                    "resample": True,
+                    "do_trim_silence": True,
+                    "trim_db": 45,
+                },
+            )
+
+            # Initialize the audio processor
+            # Used for feature extraction and audio I/O
+            ap = AudioProcessor.init_from_config(config)
+
+            # Initialize the tokenizer
+            # Converts text to sequences of token IDs
+            tokenizer, config = TTSTokenizer.init_from_config(config)
+
+            # Load data samples
+            # Each sample is [text, audio_file_path, speaker_name]
+            train_samples, eval_samples = load_tts_samples(
+                dataset_config,
+                eval_split=True,
+                eval_split_max_size=config.eval_split_max_size,
+                eval_split_size=config.eval_split_size,
+            )
+
+            print(f"Training samples: {len(train_samples)}")
+            print(f"Eval samples: {len(eval_samples)}")
+
+            # Initialize the model
+            model = Vits(config, ap, tokenizer, speaker_manager=None)
+
+            # Set up trainer arguments
+            trainer_args = TrainerArgs(
+                restore_path=RESTORE_PATH,
+                skip_train_epoch=False,
+            )
+
+            # Initialize the trainer
+            trainer = Trainer(
+                trainer_args,
+                config,
+                output_path=str(OUTPUT_DIR),
+                model=model,
+                train_samples=train_samples,
+                eval_samples=eval_samples,
+            )
+
+            # Start training
+            print("\n" + "=" * 50)
+            print("Starting training...")
+            print("=" * 50 + "\n")
+
+            trainer.fit()
+
+            print("\n" + "=" * 50)
+            print("Training complete!")
+            print("=" * 50)
+            EOF
+            
+            echo ""
+            echo "Training complete!"
+            echo "Model saved to: $OUTPUT_DIR"
+            ls -la "$OUTPUT_DIR"
+        resources:
+          requests:
+            memory: 16Gi
+            cpu: "4"
+            nvidia.com/gpu: "1"
+          limits:
+            memory: 32Gi
+            cpu: "8"
+            nvidia.com/gpu: "1"
+        volumeMounts:
+          - name: training-workspace
+            mountPath: /tmp/workspace
+
+    # Template: Export trained model
+    - name: export-trained-model
+      inputs:
+        parameters:
+          - name: voice-name
+          - name: output-path
+        artifacts:
+          - name: trained-model
+            path: /tmp/trained-model
+      outputs:
+        artifacts:
+          - name: exported-model
+            path: /tmp/exported
+      container:
+        image: python:3.13-slim
+        command: [bash]
+        args:
+          - -c
+          - |
+            set -e
+            
+            pip install -q boto3
+            
+            python3 << 'EOF'
+            import json
+            import os
+            import shutil
+            import subprocess
+            from pathlib import Path
+            from datetime import datetime
+
+            VOICE_NAME = "{{inputs.parameters.voice-name}}"
+            OUTPUT_PATH = "{{inputs.parameters.output-path}}"
+            
+            model_dir = Path("/tmp/trained-model")
+            export_dir = Path("/tmp/exported")
+            export_dir.mkdir(parents=True, exist_ok=True)
+
+            print(f"Exporting trained model: {VOICE_NAME}")
+            print(f"Target path: {OUTPUT_PATH}")
+
+            # Find best checkpoint (searched recursively: the Trainer writes into a run subdirectory of the output path)
+            checkpoints = list(model_dir.rglob("best_model*.pth")) or list(model_dir.rglob("checkpoint_*.pth"))
+            if not checkpoints:
+                checkpoints = list(model_dir.rglob("*.pth"))
+            
+            if not checkpoints:
+                print("Error: No model checkpoints found!")
+                exit(1)
+
+            # Among the candidates, use the most recently written one
+            checkpoints.sort(key=lambda x: x.stat().st_mtime, reverse=True)
+            best_checkpoint = checkpoints[0]
+            print(f"Using checkpoint: {best_checkpoint.name}")
+
+            # Create export package
+            package_dir = export_dir / VOICE_NAME
+            package_dir.mkdir(parents=True, exist_ok=True)
+
+            # Copy model files
+            shutil.copy(best_checkpoint, package_dir / "model.pth")
+            
+            # Copy the config saved next to the checkpoint, if it exists
+            config_file = best_checkpoint.parent / "config.json"
+            if config_file.exists():
+                shutil.copy(config_file, package_dir / "config.json")
+
+            # Create model info
+            model_info = {
+                "name": VOICE_NAME,
+                "created_at": datetime.now().isoformat(),
+                "checkpoint": best_checkpoint.name,
+                "type": "coqui-tts"
+            }
+            
+            with open(package_dir / "model_info.json", "w") as f:
+                json.dump(model_info, f, indent=2)
+
+            # Create tarball
+            archive_name = f"{VOICE_NAME}.tar.gz"
+            shutil.make_archive(
+                str(export_dir / VOICE_NAME),
+                "gztar",
+                export_dir,
+                VOICE_NAME
+            )
+            
+            print(f"Created archive: {archive_name}")
+
+            # Upload to destination
+            if OUTPUT_PATH.startswith("s3://"):
+                import boto3
+                s3 = boto3.client("s3")
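+                bucket, _, prefix = OUTPUT_PATH[5:].partition("/")  # also handles s3://bucket with no prefix
+                key = f"{prefix.strip('/')}/{archive_name}" if prefix.strip("/") else archive_name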
+                s3.upload_file(str(export_dir / archive_name), bucket, key)
+                print(f"Uploaded to: s3://{bucket}/{key}")
+            
+            elif OUTPUT_PATH.startswith("/"):
+                # Local/NFS path
+                dest_path = Path(OUTPUT_PATH)
+                dest_path.mkdir(parents=True, exist_ok=True)
+                shutil.copy(export_dir / archive_name, dest_path / archive_name)
+                # Also copy uncompressed for easy access
+                shutil.copytree(package_dir, dest_path / VOICE_NAME, dirs_exist_ok=True)
+                print(f"Saved to: {dest_path / archive_name}")
+            
+            print("\nExport complete!")
+            print("Model package contents:")
+            for f in package_dir.iterdir():
+                print(f"  - {f.name}")
+            EOF
+        resources:
+          requests:
+            memory: 1Gi
+            cpu: 500m
--- a/document-ingestion.yaml
+++ b/document-ingestion.yaml
@@ -0,0 +1,369 @@
+# Document Ingestion Workflow
+# Ingests documents from a source URL into Milvus vector database
+# Triggered via NATS: ai.pipeline.trigger with pipeline="document-ingestion"
+---
+apiVersion: argoproj.io/v1alpha1
+kind: WorkflowTemplate
+metadata:
+  name: document-ingestion
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: document-ingestion
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  entrypoint: ingest-documents
+  serviceAccountName: argo-workflow
+  
+  arguments:
+    parameters:
+      - name: source-url
+        description: "URL to fetch documents from (s3:// prefix or a single HTTP/HTTPS file)"
+      - name: collection-name
+        value: "knowledge_base"
+        description: "Milvus collection name"
+      - name: chunk-size
+        value: "512"
+        description: "Text chunk size in characters"
+      - name: chunk-overlap
+        value: "50"
+        description: "Overlap between consecutive chunks in characters"
+
+  templates:
+    - name: ingest-documents
+      dag:
+        tasks:
+          - name: fetch-documents
+            template: fetch-docs
+            arguments:
+              parameters:
+                - name: source-url
+                  value: "{{workflow.parameters.source-url}}"
+
+          - name: chunk-documents
+            template: chunk-docs
+            dependencies: [fetch-documents]
+            arguments:
+              parameters:
+                - name: chunk-size
+                  value: "{{workflow.parameters.chunk-size}}"
+                - name: chunk-overlap
+                  value: "{{workflow.parameters.chunk-overlap}}"
+              artifacts:
+                - name: documents
+                  from: "{{tasks.fetch-documents.outputs.artifacts.documents}}"
+
+          - name: generate-embeddings
+            template: embed-docs
+            dependencies: [chunk-documents]
+            arguments:
+              artifacts:
+                - name: chunks
+                  from: "{{tasks.chunk-documents.outputs.artifacts.chunks}}"
+
+          - name: store-in-milvus
+            template: store-docs
+            dependencies: [generate-embeddings]
+            arguments:
+              parameters:
+                - name: collection-name
+                  value: "{{workflow.parameters.collection-name}}"
+              artifacts:
+                - name: embeddings
+                  from: "{{tasks.generate-embeddings.outputs.artifacts.embeddings}}"
+
+    - name: fetch-docs
+      inputs:
+        parameters:
+          - name: source-url
+      outputs:
+        artifacts:
+          - name: documents
+            path: /tmp/documents
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import json
+            import os
+            import urllib.request
+            from pathlib import Path
+
+            source_url = "{{inputs.parameters.source-url}}"
+            output_dir = Path("/tmp/documents")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            print(f"Fetching documents from: {source_url}")
+
+            # Handle different source types
+            if source_url.startswith("s3://"):
+                import subprocess
+                subprocess.run(["pip", "install", "boto3", "-q"], check=True)
+                import boto3
+                s3 = boto3.client("s3")
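+                bucket, _, prefix = source_url[5:].partition("/")  # also handles s3://bucket with no prefix
+                pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix)  # a single list call returns at most 1000 keys
+                for obj in (o for page in pages for o in page.get("Contents", []) if not o["Key"].endswith("/")):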
+                    key = obj["Key"]
+                    local_path = output_dir / Path(key).name
+                    s3.download_file(bucket, key, str(local_path))
+                    print(f"Downloaded: {key}")
+            elif source_url.startswith("http"):
+                # Single file download
+                filename = source_url.split("/")[-1] or "document.txt"
+                local_path = output_dir / filename
+                urllib.request.urlretrieve(source_url, local_path)
+                print(f"Downloaded: {filename}")
+            else:
+                print(f"Unsupported URL scheme: {source_url}")
+                exit(1)
+
+            # List downloaded files
+            files = list(output_dir.glob("*"))
+            print(f"Downloaded {len(files)} files")
+            
+            # Create manifest
+            manifest = {"files": [str(f) for f in files]}
+            with open(output_dir / "manifest.json", "w") as f:
+                json.dump(manifest, f)
+        resources:
+          requests:
+            memory: 256Mi
+            cpu: 100m
+
+    - name: chunk-docs
+      inputs:
+        parameters:
+          - name: chunk-size
+          - name: chunk-overlap
+        artifacts:
+          - name: documents
+            path: /tmp/documents
+      outputs:
+        artifacts:
+          - name: chunks
+            path: /tmp/chunks
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import json
+            from pathlib import Path
+
+            chunk_size = int("{{inputs.parameters.chunk-size}}")
+            chunk_overlap = int("{{inputs.parameters.chunk-overlap}}")
+            
+            input_dir = Path("/tmp/documents")
+            output_dir = Path("/tmp/chunks")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            # Load manifest
+            with open(input_dir / "manifest.json") as f:
+                manifest = json.load(f)
+
+            all_chunks = []
+            
+            for filepath in manifest["files"]:
+                filepath = Path(filepath)
+                if not filepath.exists():
+                    continue
+                    
+                print(f"Processing: {filepath.name}")
+                
+                # Read file content
+                try:
+                    with open(filepath, "r", encoding="utf-8") as f:
+                        content = f.read()
+                except Exception as e:
+                    print(f"Error reading {filepath}: {e}")
+                    continue
+
+                # Simple fixed-size chunking with overlap
+                chunks = []
+                start = 0
+                step = max(1, chunk_size - chunk_overlap)  # avoids an endless loop when overlap >= size
+                while start < len(content):
+                    chunk = content[start:start + chunk_size]
+                    if chunk.strip():
+                        chunks.append({
+                            "text": chunk,
+                            "source": filepath.name,
+                            "chunk_index": len(chunks)
+                        })
+                    start += step
+
+                all_chunks.extend(chunks)
+                print(f"  Created {len(chunks)} chunks")
+
+            # Save chunks
+            with open(output_dir / "chunks.json", "w") as f:
+                json.dump({"chunks": all_chunks}, f)
+            
+            print(f"Total chunks: {len(all_chunks)}")
+        resources:
+          requests:
+            memory: 512Mi
+            cpu: 100m
+
+    - name: embed-docs
+      inputs:
+        artifacts:
+          - name: chunks
+            path: /tmp/chunks
+      outputs:
+        artifacts:
+          - name: embeddings
+            path: /tmp/embeddings
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import subprocess
+            subprocess.run(["pip", "install", "httpx", "-q"], check=True)
+            
+            import json
+            import httpx
+            from pathlib import Path
+
+            EMBEDDINGS_URL = "http://embeddings-predictor.ai-ml.svc.cluster.local"
+            BATCH_SIZE = 32
+
+            input_dir = Path("/tmp/chunks")
+            output_dir = Path("/tmp/embeddings")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            # Load chunks
+            with open(input_dir / "chunks.json") as f:
+                data = json.load(f)
+            chunks = data["chunks"]
+            
+            print(f"Generating embeddings for {len(chunks)} chunks")
+
+            # Generate embeddings in batches
+            all_embeddings = []
+            with httpx.Client(timeout=120.0) as client:
+                for i in range(0, len(chunks), BATCH_SIZE):
+                    batch = chunks[i:i+BATCH_SIZE]
+                    texts = [c["text"] for c in batch]
+                    response = client.post(
+                        f"{EMBEDDINGS_URL}/embeddings",
+                        json={"input": texts, "model": "bge"}
+                    )
+                    response.raise_for_status()  # fail the step instead of silently writing zero embeddings
+                    result = response.json()
+                    
+                    for j, emb_data in enumerate(result.get("data", [])):
+                        all_embeddings.append({
+                            "text": batch[j]["text"],
+                            "source": batch[j]["source"],
+                            "chunk_index": batch[j]["chunk_index"],
+                            "embedding": emb_data["embedding"]
+                        })
+                    
+                    print(f"  Processed batch {i//BATCH_SIZE + 1}/{(len(chunks)-1)//BATCH_SIZE + 1}")
+
+            # Save embeddings
+            with open(output_dir / "embeddings.json", "w") as f:
+                json.dump({"embeddings": all_embeddings}, f)
+            
+            print(f"Generated {len(all_embeddings)} embeddings")
+        envFrom:
+          - configMapRef:
+              name: ai-services-config
+        resources:
+          requests:
+            memory: 1Gi
+            cpu: 200m
+
+    - name: store-docs
+      inputs:
+        parameters:
+          - name: collection-name
+        artifacts:
+          - name: embeddings
+            path: /tmp/embeddings
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import subprocess
+            subprocess.run(["pip", "install", "pymilvus", "-q"], check=True)
+            
+            import json
+            from pathlib import Path
+            from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
+
+            MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
+            MILVUS_PORT = 19530
+            COLLECTION_NAME = "{{inputs.parameters.collection-name}}"
+            EMBEDDING_DIM = 1024  # BGE-large dimension
+
+            input_dir = Path("/tmp/embeddings")
+
+            # Load embeddings
+            with open(input_dir / "embeddings.json") as f:
+                data = json.load(f)
+            embeddings = data["embeddings"]
+            
+            print(f"Storing {len(embeddings)} embeddings in Milvus")
+
+            # Connect to Milvus
+            connections.connect(host=MILVUS_HOST, port=MILVUS_PORT)
+            print("Connected to Milvus")
+
+            # Create collection if not exists
+            if not utility.has_collection(COLLECTION_NAME):
+                fields = [
+                    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
+                    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
+                    FieldSchema(name="source", dtype=DataType.VARCHAR, max_length=1024),
+                    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)
+                ]
+                schema = CollectionSchema(fields, description="Knowledge base documents")
+                collection = Collection(COLLECTION_NAME, schema)
+                
+                # Create HNSW index
+                index_params = {
+                    "metric_type": "COSINE",
+                    "index_type": "HNSW",
+                    "params": {"M": 16, "efConstruction": 256}
+                }
+                collection.create_index("embedding", index_params)
+                print(f"Created collection: {COLLECTION_NAME}")
+            else:
+                collection = Collection(COLLECTION_NAME)
+                print(f"Using existing collection: {COLLECTION_NAME}")
+
+            # Insert data in batches
+            BATCH_SIZE = 100
+            for i in range(0, len(embeddings), BATCH_SIZE):
+                batch = embeddings[i:i+BATCH_SIZE]
+                
+                # Column-ordered lists matching the schema (the auto-generated id is omitted)
+                columns = [
+                    [e["text"] for e in batch],
+                    [e["source"] for e in batch],
+                    [e["embedding"] for e in batch]
+                ]
+
+                collection.insert(columns)
+                print(f"  Inserted batch {i//BATCH_SIZE + 1}/{(len(embeddings)-1)//BATCH_SIZE + 1}")
+
+            # Flush to ensure data is persisted
+            collection.flush()
+            print(f"Successfully stored {len(embeddings)} documents")
+
+            connections.disconnect("default")
+        envFrom:
+          - configMapRef:
+              name: ai-services-config
+        resources:
+          requests:
+            memory: 512Mi
+            cpu: 100m
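The progress counters printed by the embedding and storage steps use `(n - 1) // BATCH_SIZE + 1`, which is ceiling division for `n > 0`. The sketch below reproduces that batching and the row-to-column conversion passed to `collection.insert`. The helper names `iter_batches` and `to_columns` are hypothetical and not part of the workflow, and the 4-element vectors stand in for the 1024-dimensional embeddings.

```python
from typing import Iterator


def iter_batches(rows: list[dict], batch_size: int) -> Iterator[tuple[int, int, list[dict]]]:
    """Yield (batch_number, total_batches, batch) for consecutive slices of rows."""
    # Ceiling division; guarded so an empty input reports zero batches.
    total = (len(rows) - 1) // batch_size + 1 if rows else 0
    for start in range(0, len(rows), batch_size):
        yield start // batch_size + 1, total, rows[start:start + batch_size]


def to_columns(batch: list[dict]) -> list[list]:
    """Convert row dicts to column lists in schema order: text, source, embedding."""
    return [
        [row["text"] for row in batch],
        [row["source"] for row in batch],
        [row["embedding"] for row in batch],
    ]


if __name__ == "__main__":
    # Illustrative data only; real rows come from embeddings.json.
    rows = [
        {"text": f"chunk {i}", "source": "doc.md", "embedding": [0.0] * 4}
        for i in range(250)
    ]
    for number, total, batch in iter_batches(rows, 100):
        columns = to_columns(batch)
        print(f"batch {number}/{total}: {len(columns[0])} rows")
```

Keeping the conversion in one function makes the column order explicit, which matters because the positional insert relies on it matching the collection schema.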
--- a/eventsource-kfp.yaml
+++ b/eventsource-kfp.yaml
@@ -0,0 +1,270 @@
+# Argo Events - EventSource for KFP and NATS integration
+# Enables bidirectional triggering between Argo Workflows and Kubeflow Pipelines
+---
+apiVersion: argoproj.io/v1alpha1
+kind: EventSource
+metadata:
+  name: kfp-events
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: kfp-events
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  service:
+    ports:
+      - name: webhook
+        port: 12000
+        targetPort: 12000
+  # Webhook to receive KFP pipeline completion events
+  webhook:
+    kfp-completion:
+      port: "12000"
+      endpoint: /kfp/completion
+      method: POST
+    kfp-failure:
+      port: "12000"
+      endpoint: /kfp/failure
+      method: POST
+  # NATS for receiving pipeline trigger requests
+  nats:
+    pipeline-trigger:
+      url: nats://nats.ai-ml.svc.cluster.local:4222
+      subject: ai.pipeline.trigger
+      jsonBody: true
+    argo-trigger:
+      url: nats://nats.ai-ml.svc.cluster.local:4222
+      subject: ai.argo.trigger
+      jsonBody: true
+    kfp-trigger:
+      url: nats://nats.ai-ml.svc.cluster.local:4222
+      subject: ai.kfp.trigger
+      jsonBody: true
+---
+# Sensor for handling KFP completion events
+apiVersion: argoproj.io/v1alpha1
+kind: Sensor
+metadata:
+  name: kfp-completion-sensor
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: kfp-completion-sensor
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  dependencies:
+    - name: kfp-success
+      eventSourceName: kfp-events
+      eventName: kfp-completion
+      filters:
+        data:
+          - path: body.status
+            type: string
+            value:
+              - "SUCCEEDED"
+    - name: kfp-failure
+      eventSourceName: kfp-events
+      eventName: kfp-failure
+  triggers:
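+    # On KFP success, publish to NATS.
+    # Without `conditions`, a trigger waits for every dependency in the Sensor.
+    - template:
+        conditions: kfp-success
+        name: notify-kfp-success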
+        nats:
+          url: nats://nats.ai-ml.svc.cluster.local:4222
+          subject: ai.pipeline.status.completed
+          payload:
+            - src:
+                dependencyName: kfp-success
+                dataKey: body.run_id
+              dest: run_id
+            - src:
+                dependencyName: kfp-success
+                dataKey: body.pipeline_name
+              dest: pipeline_name
+            - src:
+                dependencyName: kfp-success
+                dataKey: body.status
+              dest: status
+      retryStrategy:
+        steps: 3
+    # On KFP failure, trigger recovery workflow.
+    # Fire on the failure event alone rather than on both dependencies.
+    - template:
+        conditions: kfp-failure
+        name: kfp-failure-recovery
+        k8s:
+          operation: create
+          source:
+            resource:
+              apiVersion: argoproj.io/v1alpha1
+              kind: Workflow
+              metadata:
+                generateName: kfp-failure-handler-
+                namespace: ai-ml
+              spec:
+                entrypoint: notify-failure
+                arguments:
+                  parameters:
+                    - name: run-id
+                    - name: pipeline-name
+                    - name: error-message
+                templates:
+                  - name: notify-failure
+                    inputs:
+                      parameters:
+                        - name: run-id
+                        - name: pipeline-name
+                        - name: error-message
+                    script:
+                      image: python:3.13-slim
+                      command: [python]
+                      source: |
+                        import subprocess
+                        import sys
+                        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "nats-py"])
+
+                        import asyncio
+                        import json
+                        import nats
+
+                        async def notify():
+                            nc = await nats.connect("nats://nats.ai-ml.svc.cluster.local:4222")
+                            await nc.publish(
+                                "ai.pipeline.status.failed",
+                                json.dumps({
+                                    "run_id": "{{inputs.parameters.run-id}}",
+                                    "pipeline_name": "{{inputs.parameters.pipeline-name}}",
+                                    "error": "{{inputs.parameters.error-message}}",
+                                    "source": "kubeflow"
+                                }).encode()
+                            )
+                            await nc.close()
+
+                        asyncio.run(notify())
+          parameters:
+            - src:
+                dependencyName: kfp-failure
+                dataKey: body.run_id
+              dest: spec.arguments.parameters.0.value
+            - src:
+                dependencyName: kfp-failure
+                dataKey: body.pipeline_name
+              dest: spec.arguments.parameters.1.value
+            - src:
+                dependencyName: kfp-failure
+                dataKey: body.error
+              dest: spec.arguments.parameters.2.value
+      retryStrategy:
+        steps: 3
+---
+# Sensor for NATS-triggered Argo Workflows
+apiVersion: argoproj.io/v1alpha1
+kind: Sensor
+metadata:
+  name: nats-argo-sensor
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: nats-argo-sensor
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  dependencies:
+    - name: argo-trigger
+      eventSourceName: kfp-events
+      eventName: argo-trigger
+  triggers:
+    - template:
+        name: trigger-argo-workflow
+        k8s:
+          operation: create
+          source:
+            resource:
+              apiVersion: argoproj.io/v1alpha1
+              kind: Workflow
+              metadata:
+                generateName: nats-triggered-
+                namespace: ai-ml
+              spec:
+                workflowTemplateRef:
+                  name: placeholder
+                arguments:
+                  parameters: []
+          parameters:
+            - src:
+                dependencyName: argo-trigger
+                dataKey: body.template
+              dest: spec.workflowTemplateRef.name
+            - src:
+                dependencyName: argo-trigger
+                dataKey: body.parameters
+              dest: spec.arguments.parameters
+      retryStrategy:
+        steps: 3
+---
+# Sensor for NATS-triggered KFP Pipelines
+apiVersion: argoproj.io/v1alpha1
+kind: Sensor
+metadata:
+  name: nats-kfp-sensor
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: nats-kfp-sensor
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  dependencies:
+    - name: kfp-trigger
+      eventSourceName: kfp-events
+      eventName: kfp-trigger
+  triggers:
+    # Trigger KFP via Argo Workflow (uses kfp-trigger template)
+    - template:
+        name: trigger-kfp-via-argo
+        k8s:
+          operation: create
+          source:
+            resource:
+              apiVersion: argoproj.io/v1alpha1
+              kind: Workflow
+              metadata:
+                generateName: kfp-via-nats-
+                namespace: ai-ml
+              spec:
+                workflowTemplateRef:
+                  name: kfp-trigger
+                arguments:
+                  parameters:
+                    - name: pipeline-id
+                      value: ""
+                    - name: pipeline-params
+                      value: "{}"
+                    - name: wait-for-completion
+                      value: "true"
+          parameters:
+            - src:
+                dependencyName: kfp-trigger
+                dataKey: body.pipeline_id
+              dest: spec.arguments.parameters.0.value
+            - src:
+                dependencyName: kfp-trigger
+                dataKey: body.parameters
+              dest: spec.arguments.parameters.1.value
+              operation: overwrite  # supported operations: overwrite, append, prepend
+            - src:
+                dependencyName: kfp-trigger
+                dataKey: body.wait
+              dest: spec.arguments.parameters.2.value
+              operation: overwrite
+      retryStrategy:
+        steps: 3
+---
+# Service for the EventSource webhook
+apiVersion: v1
+kind: Service
+metadata:
+  name: kfp-events-webhook
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: kfp-events
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  selector:
+    eventsource-name: kfp-events
+  ports:
+    - name: webhook
+      port: 12000
+      targetPort: 12000
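The sensors in this file read specific keys from each NATS message body: `template` and `parameters` on `ai.argo.trigger`, and `pipeline_id`, `parameters`, and `wait` on `ai.kfp.trigger`. Below is a minimal sketch of building those payloads. The helper names and example values are assumptions for illustration, as is the name/value list shape for Argo parameters. Publishing is left out so the snippet runs without a broker.

```python
import json


def argo_trigger_payload(template: str, parameters: dict[str, str]) -> bytes:
    """Body for ai.argo.trigger: the sensor reads body.template and body.parameters."""
    return json.dumps({
        "template": template,
        # Assumed shape: Argo's arguments.parameters is a list of name/value pairs.
        "parameters": [{"name": k, "value": v} for k, v in parameters.items()],
    }).encode()


def kfp_trigger_payload(pipeline_id: str, parameters: dict, wait: bool = True) -> bytes:
    """Body for ai.kfp.trigger: the sensor reads pipeline_id, parameters, and wait."""
    return json.dumps({
        "pipeline_id": pipeline_id,
        "parameters": parameters,
        "wait": wait,
    }).encode()


if __name__ == "__main__":
    # Example values only; template and pipeline names depend on the cluster.
    print(argo_trigger_payload("hybrid-ml-training", {"epochs": "5"}).decode())
    print(kfp_trigger_payload("embedding-generation", {"input_path": "/tmp/in"}).decode())
```

A publisher would send these bytes to the matching subject on `nats://nats.ai-ml.svc.cluster.local:4222`, for example with the nats-py client already used elsewhere in this commit.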
--- a/hybrid-ml-training.yaml
+++ b/hybrid-ml-training.yaml
@@ -0,0 +1,555 @@
+# Hybrid ML Training Workflow
+# Combines Argo Workflows orchestration with Kubeflow Pipeline ML components
+# Use case: Train a model using data from Milvus, with checkpointing and evaluation
+---
+apiVersion: argoproj.io/v1alpha1
+kind: WorkflowTemplate
+metadata:
+  name: hybrid-ml-training
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: hybrid-ml-training
+    app.kubernetes.io/part-of: llm-workflows
+  annotations:
+    description: |
+      Demonstrates hybrid Argo+KFP workflow:
+      - Argo handles orchestration, branching, retry logic
+      - KFP pipelines handle ML-specific operations (with caching)
+      - NATS for status updates to frontends
+spec:
+  entrypoint: hybrid-training
+  serviceAccountName: argo-workflow
+
+  # Artifact repository for model checkpoints
+  artifactRepositoryRef:
+    configMap: artifact-repository
+    key: default
+
+  arguments:
+    parameters:
+      - name: collection-name
+        description: "Milvus collection to pull training data from"
+        value: "dnd_text_embeddings"
+      - name: model-name
+        description: "Base model for fine-tuning"
+        value: "mistralai/Mistral-7B-v0.3"
+      - name: lora-rank
+        description: "LoRA rank (higher = more params)"
+        value: "16"
+      - name: epochs
+        description: "Training epochs"
+        value: "3"
+      - name: batch-size
+        description: "Training batch size"
+        value: "4"
+      - name: output-path
+        description: "S3 path for model output"
+        value: "s3://models/lora-adapters"
+      - name: notify-nats
+        description: "Publish status to NATS"
+        value: "true"
+
+  # Volumes for the GPU training pod: model cache (PVC) and in-memory /dev/shm
+  volumes:
+    - name: model-cache
+      persistentVolumeClaim:
+        claimName: model-cache-pvc
+    - name: shm
+      emptyDir:
+        medium: Memory
+        sizeLimit: 16Gi
+
+  templates:
+    # Main DAG orchestrating the workflow
+    - name: hybrid-training
+      dag:
+        tasks:
+          - name: notify-start
+            template: nats-notify
+            when: "{{workflow.parameters.notify-nats}} == true"
+            arguments:
+              parameters:
+                - name: subject
+                  value: "ai.pipeline.status.{{workflow.name}}"
+                - name: message
+                  value: '{"status": "started", "pipeline": "hybrid-ml-training"}'
+
+          - name: prepare-data
+            template: extract-training-data
+            arguments:
+              parameters:
+                - name: collection-name
+                  value: "{{workflow.parameters.collection-name}}"
+
+          - name: validate-data
+            template: validate-dataset
+            dependencies: [prepare-data]
+            arguments:
+              artifacts:
+                - name: dataset
+                  from: "{{tasks.prepare-data.outputs.artifacts.dataset}}"
+
+          # KFP Pipeline: Run embedding generation if needed
+          - name: generate-embeddings
+            template: trigger-kfp
+            dependencies: [validate-data]
+            when: "{{tasks.validate-data.outputs.parameters.needs-embeddings}} == true"
+            arguments:
+              parameters:
+                - name: pipeline-id
+                  value: "embedding-generation"
+                - name: params
+                  value: '{"input_path": "{{tasks.prepare-data.outputs.parameters.data-path}}"}'
+
+          # Training step (runs on GPU)
+          - name: train-lora
+            template: lora-training
+            dependencies: [validate-data, generate-embeddings]
+            arguments:
+              parameters:
+                - name: model-name
+                  value: "{{workflow.parameters.model-name}}"
+                - name: lora-rank
+                  value: "{{workflow.parameters.lora-rank}}"
+                - name: epochs
+                  value: "{{workflow.parameters.epochs}}"
+                - name: batch-size
+                  value: "{{workflow.parameters.batch-size}}"
+              artifacts:
+                - name: dataset
+                  from: "{{tasks.prepare-data.outputs.artifacts.dataset}}"
+
+          # Evaluate model
+          - name: evaluate
+            template: evaluate-model
+            dependencies: [train-lora]
+            arguments:
+              artifacts:
+                - name: adapter
+                  from: "{{tasks.train-lora.outputs.artifacts.adapter}}"
+
+          # Branch based on evaluation results
+          - name: check-quality
+            template: quality-gate
+            dependencies: [evaluate]
+            arguments:
+              parameters:
+                - name: eval-score
+                  value: "{{tasks.evaluate.outputs.parameters.score}}"
+                - name: threshold
+                  value: "0.7"
+
+          # If quality is good, upload to S3
+          - name: upload-model
+            template: upload-to-s3
+            dependencies: [check-quality]
+            when: "{{tasks.check-quality.outputs.parameters.passed}} == true"
+            arguments:
+              parameters:
+                - name: output-path
+                  value: "{{workflow.parameters.output-path}}"
+              artifacts:
+                - name: adapter
+                  from: "{{tasks.train-lora.outputs.artifacts.adapter}}"
+
+          # If quality is poor, trigger retraining with different params
+          - name: retry-training
+            template: adjust-and-retry
+            dependencies: [check-quality]
+            when: "{{tasks.check-quality.outputs.parameters.passed}} == false"
+            arguments:
+              parameters:
+                - name: current-rank
+                  value: "{{workflow.parameters.lora-rank}}"
+                - name: current-epochs
+                  value: "{{workflow.parameters.epochs}}"
+
+          - name: notify-complete
+            template: nats-notify
+            when: "{{workflow.parameters.notify-nats}} == true"
+            dependencies: [upload-model]
+            arguments:
+              parameters:
+                - name: subject
+                  value: "ai.pipeline.status.{{workflow.name}}"
+                - name: message
+                  value: '{"status": "completed", "score": "{{tasks.evaluate.outputs.parameters.score}}"}'
+
+    # Extract training data from Milvus
+    - name: extract-training-data
+      inputs:
+        parameters:
+          - name: collection-name
+      outputs:
+        artifacts:
+          - name: dataset
+            path: /tmp/dataset
+        parameters:
+          - name: data-path
+            valueFrom:
+              path: /tmp/data-path
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          import subprocess
+          import sys
+          subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "pymilvus", "pandas", "pyarrow"])
+
+          from pymilvus import connections, Collection
+          import pandas as pd
+          from pathlib import Path
+
+          connections.connect(host="milvus.ai-ml.svc.cluster.local", port=19530)
+
+          collection = Collection("{{inputs.parameters.collection-name}}")
+          collection.load()
+
+          # Query up to 10,000 samples with a non-empty source (capped by limit below)
+          results = collection.query(
+              expr="source != ''",
+              output_fields=["text", "source", "metadata"],
+              limit=10000
+          )
+
+          # Convert to training format
+          df = pd.DataFrame(results)
+          output_dir = Path("/tmp/dataset")
+          output_dir.mkdir(parents=True, exist_ok=True)
+
+          # Save as parquet for efficient loading
+          df.to_parquet(output_dir / "train.parquet")
+
+          print(f"Extracted {len(df)} samples")
+          with open("/tmp/data-path", "w") as f:
+              f.write(str(output_dir / "train.parquet"))
+
+    # Validate dataset
+    - name: validate-dataset
+      inputs:
+        artifacts:
+          - name: dataset
+            path: /tmp/dataset
+      outputs:
+        parameters:
+          - name: needs-embeddings
+            valueFrom:
+              path: /tmp/needs-embeddings
+          - name: sample-count
+            valueFrom:
+              path: /tmp/sample-count
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          import subprocess
+          import sys
+          subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "pandas", "pyarrow"])
+
+          import pandas as pd
+          from pathlib import Path
+
+          dataset_dir = Path("/tmp/dataset")
+          parquet_files = list(dataset_dir.glob("*.parquet"))
+
+          if not parquet_files:
+              raise ValueError("No parquet files found in dataset")
+
+          df = pd.read_parquet(parquet_files[0])
+          sample_count = len(df)
+          print(f"Dataset contains {sample_count} samples")
+
+          # Check if embeddings column exists
+          needs_embeddings = "embedding" not in df.columns
+
+          with open("/tmp/needs-embeddings", "w") as f:
+              f.write(str(needs_embeddings).lower())
+
+          with open("/tmp/sample-count", "w") as f:
+              f.write(str(sample_count))
+
+    # Trigger KFP pipeline
+    - name: trigger-kfp
+      inputs:
+        parameters:
+          - name: pipeline-id
+          - name: params
+      outputs:
+        parameters:
+          - name: run-id
+            valueFrom:
+              path: /tmp/run-id
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          import subprocess
+          import sys
+          import json
+          subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "kfp==2.12.1"])
+
+          from kfp import Client
+
+          client = Client(host="http://ml-pipeline.kubeflow.svc.cluster.local:8888")
+          params = json.loads('''{{inputs.parameters.params}}''')
+
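+          # create_run_from_pipeline_func() expects a pipeline function and has no
+          # pipeline_id argument; run_pipeline() starts a run of an uploaded pipeline.
+          experiment = client.create_experiment(name="Default")
+          run = client.run_pipeline(
+              experiment_id=experiment.experiment_id,
+              job_name="argo-{{workflow.name}}",
+              pipeline_id="{{inputs.parameters.pipeline-id}}",
+              params=params
+          )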
+
+          print(f"Triggered KFP pipeline: {run.run_id}")
+          with open("/tmp/run-id", "w") as f:
+              f.write(run.run_id)
+
+    # LoRA training (GPU)
+    - name: lora-training
+      inputs:
+        parameters:
+          - name: model-name
+          - name: lora-rank
+          - name: epochs
+          - name: batch-size
+        artifacts:
+          - name: dataset
+            path: /data/dataset
+      outputs:
+        artifacts:
+          - name: adapter
+            path: /output/adapter
+          - name: logs
+            path: /output/logs
+      podSpecPatch: |
+        containers:
+          - name: main
+            resources:
+              requests:
+                amd.com/gpu: 1
+              limits:
+                amd.com/gpu: 1
+      script:
+        image: ghcr.io/billy-davies-2/lora-trainer:latest
+        command: [python]
+        env:
+          - name: HF_HOME
+            value: /cache/huggingface
+          - name: TRANSFORMERS_CACHE
+            value: /cache/huggingface
+        volumeMounts:
+          - name: model-cache
+            mountPath: /cache
+          - name: shm
+            mountPath: /dev/shm
+        source: |
+          import os
+          import sys
+          from pathlib import Path
+
+          # Training script
+          from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
+          from peft import LoraConfig, get_peft_model
+          from datasets import load_dataset
+          from trl import SFTTrainer
+
+          model_name = "{{inputs.parameters.model-name}}"
+          lora_rank = int("{{inputs.parameters.lora-rank}}")
+          epochs = int("{{inputs.parameters.epochs}}")
+          batch_size = int("{{inputs.parameters.batch-size}}")
+
+          # Load dataset
+          dataset = load_dataset("parquet", data_files="/data/dataset/*.parquet", split="train")
+
+          # Load model
+          model = AutoModelForCausalLM.from_pretrained(
+              model_name,
+              torch_dtype="auto",
+              device_map="auto"
+          )
+          tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+          # Configure LoRA
+          lora_config = LoraConfig(
+              r=lora_rank,
+              lora_alpha=lora_rank * 2,
+              target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
+              lora_dropout=0.05,
+              bias="none",
+              task_type="CAUSAL_LM"
+          )
+
+          model = get_peft_model(model, lora_config)
+
+          # Training
+          training_args = TrainingArguments(
+              output_dir="/output/adapter",
+              num_train_epochs=epochs,
+              per_device_train_batch_size=batch_size,
+              gradient_accumulation_steps=4,
+              learning_rate=2e-4,
+              logging_dir="/output/logs",
+              save_strategy="epoch"
+          )
+
+          trainer = SFTTrainer(
+              model=model,
+              train_dataset=dataset,
+              tokenizer=tokenizer,
+              args=training_args,
+              dataset_text_field="text"
+          )
+
+          trainer.train()
+          trainer.save_model("/output/adapter")
+          print("Training complete!")
+
+    # Evaluate model
+    - name: evaluate-model
+      inputs:
+        artifacts:
+          - name: adapter
+            path: /input/adapter
+      outputs:
+        parameters:
+          - name: score
+            valueFrom:
+              path: /tmp/score
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          import subprocess
+          import sys
+          subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "httpx"])
+
+          import httpx
+          import json
+
+          # Run evaluation using vLLM with the adapter
+          test_prompts = [
+              "What is the capital of France?",
+              "Explain machine learning in simple terms.",
+              "Write a haiku about coding."
+          ]
+
+          scores = []
+          with httpx.Client(timeout=120.0) as client:
+              for prompt in test_prompts:
+                  response = client.post(
+                      "http://llm-draft.ai-ml.svc.cluster.local:8000/v1/chat/completions",
+                      json={
+                          "model": "local-adapter",
+                          "messages": [{"role": "user", "content": prompt}],
+                          "max_tokens": 200
+                      }
+                  )
+                  response.raise_for_status()  # Fail the step on HTTP errors
+                  # Placeholder scoring: response length capped at 1.0, not a quality metric
+                  result = response.json()
+                  content = result["choices"][0]["message"]["content"]
+                  score = min(1.0, len(content) / 100)
+                  scores.append(score)
+
+          avg_score = sum(scores) / len(scores)
+          print(f"Average evaluation score: {avg_score}")
+
+          with open("/tmp/score", "w") as f:
+              f.write(str(round(avg_score, 3)))
+
+    # Quality gate
+    - name: quality-gate
+      inputs:
+        parameters:
+          - name: eval-score
+          - name: threshold
+      outputs:
+        parameters:
+          - name: passed
+            valueFrom:
+              path: /tmp/passed
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          score = float("{{inputs.parameters.eval-score}}")
+          threshold = float("{{inputs.parameters.threshold}}")
+
+          passed = score >= threshold
+          print(f"Score {score} {'passed' if passed else 'failed'} threshold {threshold}")
+
+          with open("/tmp/passed", "w") as f:
+              f.write(str(passed).lower())
+
+    # Upload to S3
+    - name: upload-to-s3
+      inputs:
+        parameters:
+          - name: output-path
+        artifacts:
+          - name: adapter
+            path: /input/adapter
+      script:
+        image: amazon/aws-cli:latest
+        command: [bash]
+        env:
+          - name: AWS_ACCESS_KEY_ID
+            valueFrom:
+              secretKeyRef:
+                name: s3-credentials
+                key: access-key
+          - name: AWS_SECRET_ACCESS_KEY
+            valueFrom:
+              secretKeyRef:
+                name: s3-credentials
+                key: secret-key
+          - name: AWS_ENDPOINT_URL
+            value: "https://quobjects.billy.davies.cloud"
+        source: |
+          aws s3 cp --recursive /input/adapter "{{inputs.parameters.output-path}}/$(date +%Y%m%d-%H%M%S)/"
+          echo "Uploaded adapter to {{inputs.parameters.output-path}}"
+
+    # Adjust parameters and retry
+    - name: adjust-and-retry
+      inputs:
+        parameters:
+          - name: current-rank
+          - name: current-epochs
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          current_rank = int("{{inputs.parameters.current-rank}}")
+          current_epochs = int("{{inputs.parameters.current-epochs}}")
+
+          # Increase rank and epochs for next attempt
+          new_rank = min(64, current_rank * 2)
+          new_epochs = current_epochs + 2
+
+          print(f"Adjusting parameters: rank {current_rank}->{new_rank}, epochs {current_epochs}->{new_epochs}")
+          print("TODO: Trigger new workflow with adjusted parameters")
+
+    # NATS notification
+    - name: nats-notify
+      inputs:
+        parameters:
+          - name: subject
+          - name: message
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          import subprocess
+          import sys
+          subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "nats-py"])
+
+          import asyncio
+          import nats
+
+          async def notify():
+              nc = await nats.connect("nats://nats.ai-ml.svc.cluster.local:4222")
+              await nc.publish(
+                  "{{inputs.parameters.subject}}",
+                  b'''{{inputs.parameters.message}}'''
+              )
+              await nc.close()
+
+          asyncio.run(notify())
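The branching in `hybrid-training` depends on three small pieces of logic: the placeholder length-based score, the threshold comparison written as a lowercase string for the `when` clauses, and the parameter adjustment in `adjust-and-retry`. The sketch below restates them as pure functions so they can be checked outside the cluster. The function names are hypothetical; the constants (100 characters, rank cap 64, two extra epochs) are taken from the templates above.

```python
def placeholder_score(content: str) -> float:
    """Length-based stand-in score from evaluate-model: 100+ characters scores 1.0."""
    return min(1.0, len(content) / 100)


def quality_gate(score: float, threshold: float) -> str:
    """Return 'true' or 'false', the strings the DAG `when` clauses compare against."""
    return str(score >= threshold).lower()


def adjust_params(rank: int, epochs: int, max_rank: int = 64) -> tuple[int, int]:
    """Double the LoRA rank (capped at max_rank) and add two epochs for a retry."""
    return min(max_rank, rank * 2), epochs + 2


if __name__ == "__main__":
    scores = [placeholder_score(text) for text in ("short", "x" * 150)]
    average = round(sum(scores) / len(scores), 3)
    print(f"average={average} passed={quality_gate(average, 0.7)}")
    print(f"retry with rank/epochs = {adjust_params(16, 3)}")
```

Because the gate writes a string rather than a boolean, both `upload-model` and `retry-training` can key off the same output parameter with plain equality checks.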
--- a/kfp-integration.yaml
+++ b/kfp-integration.yaml
@@ -0,0 +1,237 @@
+# Argo Workflows + Kubeflow Pipelines Integration
+# This template allows Argo Workflows to trigger KFP pipelines and vice versa
+---
+apiVersion: argoproj.io/v1alpha1
+kind: WorkflowTemplate
+metadata:
+  name: kfp-trigger
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: kfp-trigger
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  entrypoint: trigger-kfp-pipeline
+  serviceAccountName: argo-workflow
+
+  arguments:
+    parameters:
+      - name: pipeline-id
+        description: "Kubeflow Pipeline ID or name"
+      - name: pipeline-params
+        description: "JSON object of pipeline parameters"
+        value: "{}"
+      - name: experiment-name
+        description: "KFP Experiment to use"
+        value: "Default"
+      - name: wait-for-completion
+        description: "Wait for pipeline to complete"
+        value: "true"
+
+  templates:
+    - name: trigger-kfp-pipeline
+      steps:
+        - - name: submit-run
+            template: submit-kfp-run
+            arguments:
+              parameters:
+                - name: pipeline-id
+                  value: "{{workflow.parameters.pipeline-id}}"
+                - name: pipeline-params
+                  value: "{{workflow.parameters.pipeline-params}}"
+                - name: experiment-name
+                  value: "{{workflow.parameters.experiment-name}}"
+
+        - - name: wait-completion
+            template: wait-for-kfp
+            when: "{{workflow.parameters.wait-for-completion}} == true"
+            arguments:
+              parameters:
+                - name: run-id
+                  value: "{{steps.submit-run.outputs.parameters.run-id}}"
+
+    - name: submit-kfp-run
+      inputs:
+        parameters:
+          - name: pipeline-id
+          - name: pipeline-params
+          - name: experiment-name
+      outputs:
+        parameters:
+          - name: run-id
+            valueFrom:
+              path: /tmp/run-id
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          import json
+          import subprocess
+          import sys
+          subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "kfp==2.12.1"])
+
+          from kfp import Client
+
+          KUBEFLOW_HOST = "http://ml-pipeline.kubeflow.svc.cluster.local:8888"
+
+          client = Client(host=KUBEFLOW_HOST)
+
+          pipeline_id = "{{inputs.parameters.pipeline-id}}"
+          params = json.loads('''{{inputs.parameters.pipeline-params}}''')
+          experiment_name = "{{inputs.parameters.experiment-name}}"
+
+          # Get or create experiment
+          try:
+              experiment = client.get_experiment(experiment_name=experiment_name)
+          except Exception:
+              experiment = client.create_experiment(name=experiment_name)
+
+          # Get pipeline by ID, falling back to a lookup by display name
+          try:
+              pipeline = client.get_pipeline(pipeline_id)
+          except Exception:
+              # Not a valid ID; get_pipeline_id() resolves a display name
+              # to an ID and returns None when nothing matches
+              resolved_id = client.get_pipeline_id(pipeline_id)
+              if resolved_id is None:
+                  raise ValueError(f"Pipeline not found: {pipeline_id}")
+              pipeline = client.get_pipeline(resolved_id)
+
+          # Create run
+          run = client.run_pipeline(
+              experiment_id=experiment.experiment_id,
+              job_name=f"{pipeline.display_name}-argo-{pipeline_id[:8]}",
+              pipeline_id=pipeline.pipeline_id,
+              params=params
+          )
+
+          print(f"Submitted KFP run: {run.run_id}")
+          with open("/tmp/run-id", "w") as f:
+              f.write(run.run_id)
+
+    - name: wait-for-kfp
+      inputs:
+        parameters:
+          - name: run-id
+      outputs:
+        parameters:
+          - name: status
+            valueFrom:
+              path: /tmp/status
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          import subprocess
+          import sys
+          import time
+          subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "kfp==2.12.1"])
+
+          from kfp import Client
+
+          KUBEFLOW_HOST = "http://ml-pipeline.kubeflow.svc.cluster.local:8888"
+          run_id = "{{inputs.parameters.run-id}}"
+
+          client = Client(host=KUBEFLOW_HOST)
+
+          while True:
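+              run = client.get_run(run_id)
+              # kfp 2.x returns the run object directly and exposes its state
+              # as `state` (the nested `run.run.status` is the kfp 1.x shape)
+              state = run.state
+
+              print(f"Run {run_id} status: {state}")
+
+              if state in ["SUCCEEDED", "SKIPPED"]:
+                  with open("/tmp/status", "w") as f:
+                      f.write("SUCCEEDED")
+                  break
+              # CANCELED is the kfp 2.x spelling; the older values are kept for safety
+              elif state in ["FAILED", "ERROR", "CANCELED", "CANCELLED"]:
+                  with open("/tmp/status", "w") as f:
+                      f.write(state)
+                  raise Exception(f"Pipeline failed with status: {state}")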
+
+              time.sleep(30)
+
+---
+# WorkflowTemplate for running KFP pipeline components as Argo steps
+apiVersion: argoproj.io/v1alpha1
+kind: WorkflowTemplate
+metadata:
+  name: kfp-component-runner
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: kfp-component-runner
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  entrypoint: run-component
+  serviceAccountName: argo-workflow
+
+  arguments:
+    parameters:
+      - name: component-name
+        description: "Name of the KFP component to run"
+      - name: component-params
+        description: "JSON parameters for the component"
+        value: "{}"
+
+  templates:
+    - name: run-component
+      inputs:
+        parameters:
+          - name: component-name
+          - name: component-params
+      outputs:
+        parameters:
+          - name: result
+            valueFrom:
+              path: /tmp/result.json
+      script:
+        image: python:3.13-slim
+        command: [python]
+        source: |
+          import json
+          import subprocess
+          import sys
+          subprocess.check_call([
+              sys.executable, "-m", "pip", "install", "-q",
+              "httpx", "pymilvus"
+          ])
+
+          import httpx
+
+          component_name = "{{inputs.parameters.component-name}}"
+          params = json.loads('''{{inputs.parameters.component-params}}''')
+
+          # Component implementations (mirrors KFP components)
+          COMPONENTS = {
+              "transcribe_audio": {
+                  "url": "http://whisper-predictor.ai-ml.svc.cluster.local",
+                  "endpoint": "/v1/audio/transcriptions"
+              },
+              "generate_embeddings": {
+                  "url": "http://embeddings-predictor.ai-ml.svc.cluster.local",
+                  "endpoint": "/embeddings"
+              },
+              "generate_response": {
+                  "url": "http://llm-draft.ai-ml.svc.cluster.local:8000",
+                  "endpoint": "/v1/chat/completions"
+              },
+              "synthesize_speech": {
+                  "url": "http://tts-predictor.ai-ml.svc.cluster.local",
+                  "endpoint": "/v1/audio/speech"
+              }
+          }
+
+          if component_name not in COMPONENTS:
+              raise ValueError(f"Unknown component: {component_name}")
+
+          config = COMPONENTS[component_name]
+          with httpx.Client(timeout=120.0) as client:
+              response = client.post(
+                  f"{config['url']}{config['endpoint']}",
+                  json=params
+              )
+              response.raise_for_status()  # fail the step on HTTP 4xx/5xx
+              result = response.json()
+
+          with open("/tmp/result.json", "w") as f:
+              json.dump(result, f)
+
+          print(f"Component {component_name} completed")
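+
+# Example (a sketch, not part of this template): calling run-component from
+# another workflow via templateRef. The step name and the JSON payload are
+# illustrative assumptions; adjust the payload to the target service's API.
+#
+#   steps:
+#     - - name: embed-text
+#         templateRef:
+#           name: kfp-component-runner
+#           template: run-component
+#         arguments:
+#           parameters:
+#             - name: component-name
+#               value: generate_embeddings
+#             - name: component-params
+#               value: '{"input": ["hello world"]}'
+#
+# The step's response is then available to later steps as
+# {{steps.embed-text.outputs.parameters.result}}.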
--- a/qlora-training.yaml
+++ b/qlora-training.yaml
@@ -0,0 +1,510 @@
+# QLoRA Fine-tuning Workflow
+# Trains QLoRA adapters from a reference model using data from the Milvus vector database
+# Triggered via NATS: ai.pipeline.trigger with pipeline="qlora-training"
+---
+apiVersion: argoproj.io/v1alpha1
+kind: WorkflowTemplate
+metadata:
+  name: qlora-training
+  namespace: ai-ml
+  labels:
+    app.kubernetes.io/name: qlora-training
+    app.kubernetes.io/part-of: llm-workflows
+spec:
+  entrypoint: train-qlora
+  serviceAccountName: argo-workflow
+  
+  arguments:
+    parameters:
+      - name: reference-model
+        description: "Base model to fine-tune (HuggingFace model ID or path)"
+        value: "mistralai/Mistral-7B-Instruct-v0.3"
+      - name: output-name
+        description: "Name for the output QLoRA adapter"
+        value: "qlora-adapter"
+      - name: milvus-collections
+        description: "Comma-separated list of Milvus collections to use (empty = all available)"
+        value: ""
+      - name: learning-rate
+        value: "2e-4"
+        description: "Learning rate for training"
+      - name: num-epochs
+        value: "3"
+        description: "Number of training epochs"
+      - name: batch-size
+        value: "4"
+        description: "Training batch size"
+      - name: max-seq-length
+        value: "2048"
+        description: "Maximum sequence length"
+      - name: lora-r
+        value: "64"
+        description: "LoRA attention dimension"
+      - name: lora-alpha
+        value: "16"
+        description: "LoRA alpha parameter"
+      - name: lora-dropout
+        value: "0.05"
+        description: "LoRA dropout rate"
+
+  volumeClaimTemplates:
+    - metadata:
+        name: model-storage
+      spec:
+        accessModes: ["ReadWriteMany"]
+        storageClassName: nfs-slow
+        resources:
+          requests:
+            storage: 50Gi
+
+  templates:
+    - name: train-qlora
+      dag:
+        tasks:
+          - name: fetch-training-data
+            template: fetch-data
+            arguments:
+              parameters:
+                - name: milvus-collections
+                  value: "{{workflow.parameters.milvus-collections}}"
+
+          - name: prepare-dataset
+            template: prepare-data
+            dependencies: [fetch-training-data]
+            arguments:
+              parameters:
+                - name: max-seq-length
+                  value: "{{workflow.parameters.max-seq-length}}"
+              artifacts:
+                - name: raw-data
+                  from: "{{tasks.fetch-training-data.outputs.artifacts.raw-data}}"
+
+          - name: train-model
+            template: train
+            dependencies: [prepare-dataset]
+            arguments:
+              parameters:
+                - name: reference-model
+                  value: "{{workflow.parameters.reference-model}}"
+                - name: output-name
+                  value: "{{workflow.parameters.output-name}}"
+                - name: learning-rate
+                  value: "{{workflow.parameters.learning-rate}}"
+                - name: num-epochs
+                  value: "{{workflow.parameters.num-epochs}}"
+                - name: batch-size
+                  value: "{{workflow.parameters.batch-size}}"
+                - name: max-seq-length
+                  value: "{{workflow.parameters.max-seq-length}}"
+                - name: lora-r
+                  value: "{{workflow.parameters.lora-r}}"
+                - name: lora-alpha
+                  value: "{{workflow.parameters.lora-alpha}}"
+                - name: lora-dropout
+                  value: "{{workflow.parameters.lora-dropout}}"
+              artifacts:
+                - name: training-data
+                  from: "{{tasks.prepare-dataset.outputs.artifacts.training-data}}"
+
+    - name: fetch-data
+      inputs:
+        parameters:
+          - name: milvus-collections
+      outputs:
+        artifacts:
+          - name: raw-data
+            path: /tmp/raw-data
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import subprocess
+            subprocess.run(["pip", "install", "pymilvus", "-q"], check=True)
+            
+            import json
+            from pathlib import Path
+            from pymilvus import connections, Collection, utility
+
+            MILVUS_HOST = "milvus.ai-ml.svc.cluster.local"
+            MILVUS_PORT = 19530
+            collections_param = "{{inputs.parameters.milvus-collections}}"
+            
+            output_dir = Path("/tmp/raw-data")
+            output_dir.mkdir(parents=True, exist_ok=True)
+
+            print(f"Connecting to Milvus at {MILVUS_HOST}:{MILVUS_PORT}")
+            connections.connect(host=MILVUS_HOST, port=MILVUS_PORT)
+            
+            # Determine which collections to use
+            if collections_param and collections_param.strip():
+                collection_names = [c.strip() for c in collections_param.split(",")]
+                print(f"Using specified collections: {collection_names}")
+            else:
+                # Get all available collections
+                collection_names = utility.list_collections()
+                print(f"Using all available collections: {collection_names}")
+            
+            all_training_data = []
+            
+            for collection_name in collection_names:
+                if not utility.has_collection(collection_name):
+                    print(f"Warning: Collection {collection_name} not found, skipping")
+                    continue
+                
+                print(f"Fetching data from collection: {collection_name}")
+                collection = Collection(collection_name)
+                collection.load()
+                
+                # Query all data from the collection
+                # Note: Adjust field names based on your schema
+                try:
+                    # Get collection schema to determine fields
+                    schema = collection.schema
+                    field_names = [field.name for field in schema.fields if field.name != "id"]
+                    
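+                    # Page through all entities with an iterator; a single query()
+                    # is capped at 16384 results (offset + limit) by default in Milvus
+                    iterator = collection.query_iterator(
+                        batch_size=1000, expr="id >= 0", output_fields=field_names
+                    )
+                    results = [row for batch in iter(iterator.next, []) for row in batch]
+                    iterator.close()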
+                    
+                    print(f"  Retrieved {len(results)} records from {collection_name}")
+                    
+                    for result in results:
+                        # Extract text field (adjust based on your schema)
+                        text_content = result.get("text", "")
+                        source = result.get("source", collection_name)
+                        
+                        if text_content:
+                            all_training_data.append({
+                                "text": text_content,
+                                "source": source,
+                                "collection": collection_name
+                            })
+                    
+                except Exception as e:
+                    print(f"Error querying collection {collection_name}: {e}")
+                    continue
+            
+            # Save all training data
+            output_file = output_dir / "training_data.json"
+            with open(output_file, "w") as f:
+                json.dump({"data": all_training_data}, f)
+            
+            print(f"Total training samples collected: {len(all_training_data)}")
+            print(f"Saved to {output_file}")
+            
+            connections.disconnect("default")
+        envFrom:
+          - configMapRef:
+              name: ai-services-config
+        resources:
+          requests:
+            memory: 1Gi
+            cpu: 500m
+
+    - name: prepare-data
+      inputs:
+        parameters:
+          - name: max-seq-length
+        artifacts:
+          - name: raw-data
+            path: /tmp/raw-data
+      outputs:
+        artifacts:
+          - name: training-data
+            path: /tmp/training-data
+      container:
+        image: python:3.13-slim
+        command: [python]
+        args:
+          - -c
+          - |
+            import json
+            from pathlib import Path
+            
+            max_seq_length = int("{{inputs.parameters.max-seq-length}}")
+            
+            input_dir = Path("/tmp/raw-data")
+            output_dir = Path("/tmp/training-data")
+            output_dir.mkdir(parents=True, exist_ok=True)
+            
+            # Load raw data
+            with open(input_dir / "training_data.json") as f:
+                data = json.load(f)
+            
+            raw_samples = data["data"]
+            print(f"Processing {len(raw_samples)} raw samples")
+            
+            # Prepare data in instruction format for fine-tuning
+            # Using Alpaca-style format: instruction + response
+            training_samples = []
+            
+            for sample in raw_samples:
+                text = sample["text"]
+                source = sample.get("source", "")
+                
+                # Create instruction-response pairs
+                # You can customize this based on your use case
+                training_sample = {
+                    "instruction": f"Based on the following information from {source}, provide a comprehensive response:",
+                    "input": text[:max_seq_length // 2],  # Character-level truncation (a rough proxy for tokens)
+                    "output": text[:max_seq_length // 2],
+                    "source": source
+                }
+                training_samples.append(training_sample)
+            
+            # Split into train/validation (90/10)
+            split_idx = int(len(training_samples) * 0.9)
+            train_data = training_samples[:split_idx]
+            val_data = training_samples[split_idx:]
+            
+            # Save prepared datasets
+            with open(output_dir / "train.json", "w") as f:
+                json.dump(train_data, f)
+            
+            with open(output_dir / "validation.json", "w") as f:
+                json.dump(val_data, f)
+            
+            print(f"Prepared {len(train_data)} training samples")
+            print(f"Prepared {len(val_data)} validation samples")
+            print("Data preparation complete")
+        resources:
+          requests:
+            memory: 2Gi
+            cpu: 500m
+
+    - name: train
+      inputs:
+        parameters:
+          - name: reference-model
+          - name: output-name
+          - name: learning-rate
+          - name: num-epochs
+          - name: batch-size
+          - name: max-seq-length
+          - name: lora-r
+          - name: lora-alpha
+          - name: lora-dropout
+        artifacts:
+          - name: training-data
+            path: /tmp/training-data
+      container:
+        image: python:3.13-slim
+        command: [bash]
+        args:
+          - -c
+          - |
+            set -e
+            
+            echo "Installing dependencies..."
+            pip install -q torch transformers peft datasets accelerate bitsandbytes scipy
+            
+            echo "Starting QLoRA training..."
+            
+            python << 'EOF'
+            import json
+            import os
+            from pathlib import Path
+            from datasets import Dataset
+            from transformers import (
+                AutoModelForCausalLM,
+                AutoTokenizer,
+                BitsAndBytesConfig,
+                TrainingArguments,
+                Trainer,
+                DataCollatorForLanguageModeling
+            )
+            from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+            import torch
+
+            # Parameters
+            reference_model = "{{inputs.parameters.reference-model}}"
+            output_name = "{{inputs.parameters.output-name}}"
+            learning_rate = float("{{inputs.parameters.learning-rate}}")
+            num_epochs = int("{{inputs.parameters.num-epochs}}")
+            batch_size = int("{{inputs.parameters.batch-size}}")
+            max_seq_length = int("{{inputs.parameters.max-seq-length}}")
+            lora_r = int("{{inputs.parameters.lora-r}}")
+            lora_alpha = int("{{inputs.parameters.lora-alpha}}")
+            lora_dropout = float("{{inputs.parameters.lora-dropout}}")
+            
+            data_dir = Path("/tmp/training-data")
+            output_dir = Path("/mnt/model-storage") / output_name
+            output_dir.mkdir(parents=True, exist_ok=True)
+            
+            print("Training configuration:")
+            print(f"  Model: {reference_model}")
+            print(f"  Learning rate: {learning_rate}")
+            print(f"  Epochs: {num_epochs}")
+            print(f"  Batch size: {batch_size}")
+            print(f"  Max sequence length: {max_seq_length}")
+            print(f"  LoRA r: {lora_r}, alpha: {lora_alpha}, dropout: {lora_dropout}")
+            
+            # Load datasets
+            with open(data_dir / "train.json") as f:
+                train_data = json.load(f)
+            
+            with open(data_dir / "validation.json") as f:
+                val_data = json.load(f)
+            
+            print(f"Loaded {len(train_data)} training samples, {len(val_data)} validation samples")
+            
+            # Load tokenizer
+            print(f"Loading tokenizer from {reference_model}...")
+            tokenizer = AutoTokenizer.from_pretrained(reference_model, trust_remote_code=True)
+            if tokenizer.pad_token is None:
+                tokenizer.pad_token = tokenizer.eos_token
+            
+            # Prepare datasets
+            def format_sample(sample):
+                instruction = sample.get("instruction", "")
+                input_text = sample.get("input", "")
+                output = sample.get("output", "")
+                
+                if input_text:
+                    prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n{output}"
+                else:
+                    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n{output}"
+                
+                return {"text": prompt}
+            
+            train_dataset = Dataset.from_list([format_sample(s) for s in train_data])
+            val_dataset = Dataset.from_list([format_sample(s) for s in val_data])
+            
+            # Tokenize datasets
+            def tokenize_function(examples):
+                return tokenizer(
+                    examples["text"],
+                    truncation=True,
+                    max_length=max_seq_length,
+                    padding="max_length"
+                )
+            
+            print("Tokenizing datasets...")
+            train_dataset = train_dataset.map(tokenize_function, batched=True, remove_columns=["text"])
+            val_dataset = val_dataset.map(tokenize_function, batched=True, remove_columns=["text"])
+            
+            # Configure quantization for QLoRA
+            bnb_config = BitsAndBytesConfig(
+                load_in_4bit=True,
+                bnb_4bit_quant_type="nf4",
+                bnb_4bit_compute_dtype=torch.float16,
+                bnb_4bit_use_double_quant=True,
+            )
+            
+            # Load model
+            print(f"Loading model {reference_model} with 4-bit quantization...")
+            model = AutoModelForCausalLM.from_pretrained(
+                reference_model,
+                quantization_config=bnb_config,
+                device_map="auto",
+                trust_remote_code=True
+            )
+            
+            # Prepare model for training
+            model = prepare_model_for_kbit_training(model)
+            
+            # Configure LoRA
+            lora_config = LoraConfig(
+                r=lora_r,
+                lora_alpha=lora_alpha,
+                target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+                lora_dropout=lora_dropout,
+                bias="none",
+                task_type="CAUSAL_LM"
+            )
+            
+            # Add LoRA adapters
+            print("Adding LoRA adapters...")
+            model = get_peft_model(model, lora_config)
+            model.print_trainable_parameters()
+            
+            # Training arguments
+            training_args = TrainingArguments(
+                output_dir=str(output_dir / "checkpoints"),
+                num_train_epochs=num_epochs,
+                per_device_train_batch_size=batch_size,
+                per_device_eval_batch_size=batch_size,
+                gradient_accumulation_steps=4,
+                learning_rate=learning_rate,
+                fp16=True,
+                logging_steps=10,
+                eval_strategy="steps",  # renamed from evaluation_strategy in recent transformers releases
+                eval_steps=50,
+                save_strategy="steps",
+                save_steps=100,
+                save_total_limit=3,
+                load_best_model_at_end=True,
+                report_to="none",
+                remove_unused_columns=False,
+            )
+            
+            # Data collator
+            data_collator = DataCollatorForLanguageModeling(
+                tokenizer=tokenizer,
+                mlm=False
+            )
+            
+            # Trainer
+            trainer = Trainer(
+                model=model,
+                args=training_args,
+                train_dataset=train_dataset,
+                eval_dataset=val_dataset,
+                data_collator=data_collator,
+            )
+            
+            # Train
+            print("Starting training...")
+            trainer.train()
+            
+            # Save final model
+            print(f"Saving QLoRA adapter to {output_dir}")
+            model.save_pretrained(str(output_dir / "final"))
+            tokenizer.save_pretrained(str(output_dir / "final"))
+            
+            # Save training metadata
+            metadata = {
+                "reference_model": reference_model,
+                "output_name": output_name,
+                "training_params": {
+                    "learning_rate": learning_rate,
+                    "num_epochs": num_epochs,
+                    "batch_size": batch_size,
+                    "max_seq_length": max_seq_length,
+                    "lora_r": lora_r,
+                    "lora_alpha": lora_alpha,
+                    "lora_dropout": lora_dropout
+                },
+                "dataset_info": {
+                    "train_samples": len(train_data),
+                    "val_samples": len(val_data)
+                }
+            }
+            
+            with open(output_dir / "metadata.json", "w") as f:
+                json.dump(metadata, f, indent=2)
+            
+            print("Training complete!")
+            print(f"QLoRA adapter saved to: {output_dir}")
+            EOF
+        envFrom:
+          - configMapRef:
+              name: ai-services-config
+        volumeMounts:
+          - name: model-storage
+            mountPath: /mnt/model-storage
+        resources:
+          requests:
+            memory: 16Gi
+            cpu: 4
+            nvidia.com/gpu: 1
+          limits:
+            memory: 32Gi
+            cpu: 8
+            nvidia.com/gpu: 1
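+
+# Example (a sketch, not part of this template): a Workflow that runs this
+# template with overridden parameters. The adapter name and collection list
+# are illustrative assumptions. Create it with `kubectl create -f`, since
+# generateName is not supported by `kubectl apply`.
+#
+# apiVersion: argoproj.io/v1alpha1
+# kind: Workflow
+# metadata:
+#   generateName: qlora-training-
+#   namespace: ai-ml
+# spec:
+#   workflowTemplateRef:
+#     name: qlora-training
+#   arguments:
+#     parameters:
+#       - name: output-name
+#         value: docs-adapter
+#       - name: milvus-collections
+#         value: "docs,runbooks"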