# Argo Workflows
ML training and batch inference workflows for the DaviesTechLabs AI/ML platform.
## Workflows
| Workflow | Description | Trigger |
|----------|-------------|---------|
| `batch-inference` | Run LLM inference on batch inputs | `ai.pipeline.trigger` (pipeline="batch-inference") |
| `qlora-training` | Train QLoRA adapters from Milvus data | `ai.pipeline.trigger` (pipeline="qlora-training") |
| `hybrid-ml-training` | Multi-GPU distributed training | `ai.pipeline.trigger` (pipeline="hybrid-ml-training") |
| `coqui-voice-training` | XTTS voice cloning/training | `ai.pipeline.trigger` (pipeline="coqui-voice-training") |
| `document-ingestion` | Ingest documents into Milvus | `ai.pipeline.trigger` (pipeline="document-ingestion") |

## Integration
| File | Description |
|------|-------------|
| `eventsource-kfp.yaml` | Argo Events source for Kubeflow Pipelines integration |
| `kfp-integration.yaml` | Bridge workflows between Argo and Kubeflow |

## Architecture
```
NATS (ai.pipeline.trigger)
         │
         ▼
┌─────────────────┐
│   Argo Events   │
│   EventSource   │
└─────────────────┘
         │
         ▼
┌─────────────────┐
│   Argo Sensor   │
└─────────────────┘
         │
         ▼
┌─────────────────┐
│ WorkflowTemplate│
│  (batch-inf,    │
│   qlora, etc)   │
└─────────────────┘
         │
         ├──▶ GPU Pods (AMD ROCm / NVIDIA CUDA)
         ├──▶ Milvus Vector DB
         ├──▶ vLLM / Ray Serve
         └──▶ MLflow Tracking
```

## Workflow Details
### batch-inference
Batch LLM inference with optional RAG:
```bash
argo submit batch-inference.yaml \
  -p input-url="s3://bucket/inputs.json" \
  -p output-url="s3://bucket/outputs.json" \
  -p use-rag="true" \
  -p max-tokens="500"
```

### qlora-training
Fine-tune QLoRA adapters from Milvus knowledge:
```bash
argo submit qlora-training.yaml \
  -p reference-model="mistralai/Mistral-7B-Instruct-v0.3" \
  -p output-name="my-adapter" \
  -p milvus-collections="docs,wiki" \
  -p num-epochs="3"
```

### coqui-voice-training
Train XTTS voice models:
```bash
argo submit coqui-voice-training.yaml \
  -p voice-name="my-voice" \
  -p audio-samples-url="s3://bucket/samples/"
```

### document-ingestion
Ingest documents into Milvus:
```bash
argo submit document-ingestion.yaml \
  -p source-url="s3://bucket/docs/" \
  -p collection="knowledge_base" \
  -p chunk-size="512"
```

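As a rough back-of-envelope (assuming non-overlapping chunks, which may not match the actual ingestion logic), `chunk-size` determines how many vectors each document contributes to the collection:

```shell
# Hypothetical illustration: chunks produced per document at chunk-size=512,
# assuming non-overlapping chunks (the real pipeline may overlap or split
# on semantic boundaries instead).
doc_tokens=2000
chunk_size=512
chunks=$(( (doc_tokens + chunk_size - 1) / chunk_size ))  # ceiling division
echo "$chunks"  # 4 chunks for a 2000-token document
```

Smaller chunks give finer-grained retrieval at the cost of more vectors stored and searched.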
## NATS Trigger Format
Workflows are triggered by publishing to the NATS subject `ai.pipeline.trigger`:
```json
{
  "pipeline": "qlora-training",
  "parameters": {
    "reference-model": "mistralai/Mistral-7B-Instruct-v0.3",
    "output-name": "custom-adapter",
    "num-epochs": "5"
  }
}
```

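As a minimal sketch (assuming the `nats` CLI is installed and configured to reach the cluster's NATS server; the parameter values below are illustrative, not canonical), a trigger can be published from the command line:

```shell
# Build a trigger payload and publish it on ai.pipeline.trigger.
# Assumes the nats CLI (natscli) is installed; parameter values are
# illustrative placeholders.
payload='{"pipeline":"batch-inference","parameters":{"input-url":"s3://bucket/inputs.json","output-url":"s3://bucket/outputs.json"}}'

# Sanity-check that the payload is valid JSON before publishing.
echo "$payload" | python3 -c 'import json,sys; json.load(sys.stdin)'

# Publish only if the CLI is available.
if command -v nats >/dev/null 2>&1; then
  nats pub ai.pipeline.trigger "$payload"
fi
```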
## GPU Scheduling
Workflows use node affinity for GPU allocation:
| Node | GPU | Best For |
|------|-----|----------|
| khelben | AMD Strix Halo 64GB | Large model training, vLLM |
| elminster | NVIDIA RTX 2070 | Whisper, XTTS |
| drizzt | AMD Radeon 680M | Embeddings |
| danilo | Intel Arc | Reranker |

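As a sketch of what that affinity might look like in a WorkflowTemplate (the `kubernetes.io/hostname` label key is an assumption; the actual templates may select nodes by GPU-vendor labels instead):

```yaml
# Illustrative affinity pinning a training pod to khelben.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - khelben
```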
## Related
- [homelab-design](https://git.daviestechlabs.io/daviestechlabs/homelab-design) - Architecture docs
- [kuberay-images](https://git.daviestechlabs.io/daviestechlabs/kuberay-images) - Ray worker images
- [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base) - Handler library