# Argo Workflows

ML training and batch inference workflows for the DaviesTechLabs AI/ML platform.

## Workflows
| Workflow | Description | Trigger |
|---|---|---|
| batch-inference | Run LLM inference on batch inputs | `ai.pipeline.trigger` (`pipeline="batch-inference"`) |
| qlora-training | Train QLoRA adapters from Milvus data | `ai.pipeline.trigger` (`pipeline="qlora-training"`) |
| hybrid-ml-training | Multi-GPU distributed training | `ai.pipeline.trigger` (`pipeline="hybrid-ml-training"`) |
| coqui-voice-training | XTTS voice cloning/training | `ai.pipeline.trigger` (`pipeline="coqui-voice-training"`) |
| document-ingestion | Ingest documents into Milvus | `ai.pipeline.trigger` (`pipeline="document-ingestion"`) |
## Integration

| File | Description |
|---|---|
| eventsource-kfp.yaml | Argo Events source for Kubeflow Pipelines integration |
| kfp-integration.yaml | Bridge workflows between Argo and Kubeflow |
## Architecture

```
NATS (ai.pipeline.trigger)
        │
        ▼
┌─────────────────┐
│   Argo Events   │
│   EventSource   │
└─────────────────┘
        │
        ▼
┌─────────────────┐
│   Argo Sensor   │
└─────────────────┘
        │
        ▼
┌─────────────────┐
│ WorkflowTemplate│
│  (batch-inf,    │
│   qlora, etc.)  │
└─────────────────┘
        │
        ├──▶ GPU Pods (AMD ROCm / NVIDIA CUDA)
        ├──▶ Milvus Vector DB
        ├──▶ vLLM / Ray Serve
        └──▶ MLflow Tracking
```
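The EventSource/Sensor pair above can be sketched as a Sensor that submits a workflow from the matched event. This is an illustrative fragment only; the resource names (`nats-events`, `ai-pipeline`, `ai-pipeline-sensor`) and the hard-coded template reference are assumptions, not taken from the repo's manifests:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: ai-pipeline-sensor
spec:
  dependencies:
    - name: pipeline-trigger
      eventSourceName: nats-events   # EventSource subscribed to ai.pipeline.trigger
      eventName: ai-pipeline
  triggers:
    - template:
        name: run-workflow
        argoWorkflow:
          operation: submit
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: pipeline-run-
              spec:
                workflowTemplateRef:
                  # In practice this is selected from the event's "pipeline" field
                  name: qlora-training
```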
## Workflow Details

### batch-inference

Batch LLM inference with optional RAG:

```bash
argo submit batch-inference.yaml \
  -p input-url="s3://bucket/inputs.json" \
  -p output-url="s3://bucket/outputs.json" \
  -p use-rag="true" \
  -p max-tokens="500"
```
### qlora-training

Fine-tune QLoRA adapters from Milvus knowledge:

```bash
argo submit qlora-training.yaml \
  -p reference-model="mistralai/Mistral-7B-Instruct-v0.3" \
  -p output-name="my-adapter" \
  -p milvus-collections="docs,wiki" \
  -p num-epochs="3"
```
### coqui-voice-training

Train XTTS voice models:

```bash
argo submit coqui-voice-training.yaml \
  -p voice-name="my-voice" \
  -p audio-samples-url="s3://bucket/samples/"
```
### document-ingestion

Ingest documents into Milvus:

```bash
argo submit document-ingestion.yaml \
  -p source-url="s3://bucket/docs/" \
  -p collection="knowledge_base" \
  -p chunk-size="512"
```
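The `chunk-size` parameter suggests documents are split into fixed-size pieces before embedding. A minimal sketch of that kind of chunking, assuming simple character-based windows with overlap (the actual ingestion step's logic may differ):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of at most chunk_size characters,
    with consecutive windows overlapping by `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

chunks = chunk_text("x" * 1200, chunk_size=512, overlap=64)
```

Overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of some duplicated storage in the collection.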
## NATS Trigger Format

Workflows are triggered via the NATS subject `ai.pipeline.trigger`:

```json
{
  "pipeline": "qlora-training",
  "parameters": {
    "reference-model": "mistralai/Mistral-7B-Instruct-v0.3",
    "output-name": "custom-adapter",
    "num-epochs": "5"
  }
}
```
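A publisher only needs to serialize this shape onto the subject. The helper below is hypothetical (not part of the repo); it builds the payload bytes, with the actual publish via a NATS client such as nats-py shown as a comment:

```python
import json

def build_trigger(pipeline: str, parameters: dict[str, str]) -> bytes:
    """Compose the ai.pipeline.trigger payload. Parameter values are kept
    as strings to match the workflow template inputs."""
    payload = {"pipeline": pipeline,
               "parameters": {k: str(v) for k, v in parameters.items()}}
    return json.dumps(payload).encode()

msg = build_trigger("qlora-training", {
    "reference-model": "mistralai/Mistral-7B-Instruct-v0.3",
    "output-name": "custom-adapter",
    "num-epochs": "5",
})
# With a NATS client (e.g. nats-py) this would be published as:
#   await nc.publish("ai.pipeline.trigger", msg)
```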
## GPU Scheduling

Workflows use node affinity for GPU allocation:

| Node | GPU | Best For |
|---|---|---|
| khelben | AMD Strix Halo 64GB | Large model training, vLLM |
| elminster | NVIDIA RTX 2070 | Whisper, XTTS |
| drizzt | AMD Radeon 680M | Embeddings |
| danilo | Intel Arc | Reranker |
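Pinning a step to one of these nodes can be sketched with standard Kubernetes node affinity. The `kubernetes.io/hostname` label is the stock selector; the workflows themselves may use a different label or a GPU-vendor selector:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - khelben   # AMD Strix Halo 64GB, per the table above
```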
## Related
- homelab-design - Architecture docs
- kuberay-images - Ray worker images
- handler-base - Handler library