6 Commits

SHA1 Message Date
45996a8dbf feat: add DVD/video transcription pipeline
5-step KFP pipeline:
1. extract_audio: ffmpeg extracts 16kHz mono WAV from DVD/video
2. chunk_audio: splits into 5-minute segments for Whisper
3. transcribe_chunks: sends each chunk to Whisper STT endpoint
4. format_transcript: produces SRT, VTT, or TXT with timestamps
5. log_metrics: logs run to MLflow (dvd-transcription experiment)
2026-02-13 09:22:56 -05:00
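
The chunking and formatting steps listed in this commit could look roughly like the sketch below. It is not the repository's code: the function names `chunk_bounds`, `srt_timestamp`, and `to_srt` are hypothetical, and segments are assumed to arrive as `(start_seconds, end_seconds, text)` tuples with absolute times. Only the 5-minute chunk length and the SRT output format come from the commit message.

```python
from typing import List, Tuple

CHUNK_SECONDS = 300  # 5-minute segments, as stated in the commit message


def chunk_bounds(total_seconds: float,
                 chunk_seconds: int = CHUNK_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) offsets covering the audio; the last chunk may be shorter."""
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues (assumed input shape)."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

If the STT endpoint returns times relative to each chunk, adding the chunk's start offset from `chunk_bounds` to every segment before calling `to_srt` would keep cues aligned with the original video.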
bc4b230dd9 feat: add vLLM tuning pipeline + recompile voice pipelines with MLflow
New:
- vllm_tuning_pipeline.py: A/B benchmark different vLLM configs,
  logs latency/TPS/TTFT to MLflow (vllm-tuning experiment)
- vllm_tuning_pipeline.yaml: compiled KFP YAML

Updated:
- voice_pipeline.py: per-step NamedTuple outputs with latency tracking,
  new log_pipeline_metrics MLflow component
- voice_pipeline.yaml, tts_pipeline.yaml, rag_pipeline.yaml: recompiled
2026-02-13 08:24:11 -05:00
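
As an illustration of the three metrics named in this commit, the sketch below derives latency, TTFT, and TPS from token arrival times for a single streamed request. The `RequestMetrics` type and `summarize_request` function are hypothetical, and the definition of TPS used here (generated tokens divided by end-to-end latency) is an assumption; the actual pipeline may measure it differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestMetrics:
    ttft_s: float        # time to first token
    latency_s: float     # time until the last token arrived
    tokens_per_s: float  # assumed definition: tokens / end-to-end latency


def summarize_request(start: float, token_times: Sequence[float]) -> RequestMetrics:
    """Compute per-request metrics from the request start time and token arrival times."""
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    ttft = token_times[0] - start
    latency = token_times[-1] - start
    tps = len(token_times) / latency if latency > 0 else 0.0
    return RequestMetrics(ttft_s=ttft, latency_s=latency, tokens_per_s=tps)
```

Comparing two vLLM configurations would then come down to collecting these values per request for each configuration and logging the aggregates to the same experiment.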
cee21f124c feat: add MLflow tracking to evaluation pipeline
- Add create_mlflow_run and log_evaluation_to_mlflow KFP components
- Log accuracy, correct/total counts, pass/fail to MLflow experiment
- Upload evaluation_results.json as artifact
- Wire MLflow run into pipeline DAG before NATS publish
2026-02-12 06:15:13 -05:00
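
The logged values described in this commit (accuracy, correct/total counts, pass/fail) can be sketched as a small summary function. The record fields `expected` and `predicted`, the function name `summarize_evaluation`, and the 0.8 pass threshold are all assumptions made for illustration; the commit does not state them.

```python
import json
from typing import Dict, List


def summarize_evaluation(results: List[Dict[str, str]],
                         threshold: float = 0.8) -> Dict[str, object]:
    """Reduce per-example results to the summary values named in the commit message."""
    total = len(results)
    correct = sum(1 for r in results if r["expected"] == r["predicted"])
    accuracy = correct / total if total else 0.0
    return {
        "accuracy": accuracy,
        "correct": correct,
        "total": total,
        "passed": accuracy >= threshold,  # threshold is a hypothetical value
    }


def to_results_json(summary: Dict[str, object]) -> str:
    """Serialize the summary, e.g. for an evaluation_results.json artifact."""
    return json.dumps(summary, indent=2, sort_keys=True)
```

A summary in this shape maps directly onto metric logging (the numeric fields) and artifact upload (the JSON string written to a file).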
bd8c8616d0 updates. 2026-02-02 07:12:05 -05:00
c26e4e5ef0 feat: Add Kubeflow Pipeline definitions
- voice_pipeline: STT → RAG → LLM → TTS
- document_ingestion_pipeline: Extract → Chunk → Embed → Milvus
- document_ingestion_mlflow_pipeline: With MLflow tracking
- evaluation_pipeline: Model benchmarking
- kfp-sync-job: K8s job to sync pipelines
2026-02-01 20:41:13 -05:00
c36655b570 Initial commit 2026-02-02 01:40:30 +00:00