# MLflow Utils
MLflow integration utilities for the DaviesTechLabs AI/ML platform.
## Installation
```bash
pip install -r requirements.txt
```
Or from Gitea:
```bash
pip install git+https://git.daviestechlabs.io/daviestechlabs/mlflow.git
```
## Modules
| Module | Description |
|--------|-------------|
| `client.py` | MLflow client configuration and helpers |
| `tracker.py` | General MLflowTracker for experiments |
| `inference_tracker.py` | Async inference metrics for NATS handlers |
| `model_registry.py` | Model Registry with KServe metadata |
| `kfp_components.py` | Kubeflow Pipeline MLflow components |
| `experiment_comparison.py` | Compare experiments and runs |
| `cli.py` | Command-line interface |
## Quick Start
```python
from mlflow_utils import get_mlflow_client, MLflowTracker

# Simple tracking
with MLflowTracker(experiment_name="my-experiment") as tracker:
    tracker.log_params({"learning_rate": 0.001})
    tracker.log_metrics({"accuracy": 0.95})
```
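The context-manager form guarantees the run is closed even if the body raises. A purely illustrative, stdlib-only sketch of that pattern (hypothetical internals; the real `MLflowTracker` lives in `tracker.py`):

```python
# Illustrative sketch of the context-manager pattern -- hypothetical
# internals, not the package's actual implementation.
class DemoTracker:
    def __init__(self, experiment_name):
        self.experiment_name = experiment_name
        self.params = {}
        self.metrics = {}
        self.closed = False

    def __enter__(self):
        # The real tracker would start an MLflow run here.
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even when the body raises, so the run is always ended
        # (the real tracker would call mlflow.end_run() here).
        self.closed = True
        return False

    def log_params(self, params):
        self.params.update(params)

    def log_metrics(self, metrics):
        self.metrics.update(metrics)

with DemoTracker("my-experiment") as tracker:
    tracker.log_params({"learning_rate": 0.001})
    tracker.log_metrics({"accuracy": 0.95})
```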
## Inference Tracking
For NATS handlers (chat-handler, voice-assistant):
```python
from mlflow_utils import InferenceMetricsTracker
from mlflow_utils.inference_tracker import InferenceMetrics
tracker = InferenceMetricsTracker(
    experiment_name="voice-assistant-prod",
    batch_size=100,  # Batch metrics before logging
)

# During request handling
metrics = InferenceMetrics(
    request_id="uuid",
    total_latency=1.5,
    llm_latency=0.8,
    input_tokens=150,
    output_tokens=200,
)
await tracker.log_inference(metrics)
```
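The `batch_size` argument exists so that per-request metrics are buffered in memory and flushed in one call, rather than paying a network round-trip per request. A stdlib-only sketch of that buffering idea (illustrative only, not the package's actual code):

```python
# Illustrative metric batching; flush() stands in for sending to MLflow.
class MetricsBatcher:
    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.buffer = []
        self.flushed = []  # stands in for "logged to the MLflow server"

    def log(self, metrics: dict):
        self.buffer.append(metrics)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # The real tracker would log the whole batch to MLflow here.
        self.flushed.extend(self.buffer)
        self.buffer.clear()

batcher = MetricsBatcher(batch_size=3)
for i in range(7):
    batcher.log({"request": i, "total_latency": 1.5})
# After 7 logs with batch_size=3: two batches flushed, one entry buffered.
```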
## Model Registry
Register models with KServe deployment metadata:
```python
from mlflow_utils.model_registry import register_model_for_kserve
register_model_for_kserve(
    model_name="my-qlora-adapter",
    model_uri="runs:/abc123/model",
    kserve_runtime="kserve-vllm",
    gpu_type="amd-strixhalo",
)
```
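The deployment options presumably end up as flat string tags on the registered model version so that KServe tooling can read them back. As an illustration of the kind of mapping this implies (tag key names are assumptions, not the package's actual schema):

```python
def build_kserve_tags(kserve_runtime: str, gpu_type: str) -> dict:
    # Hypothetical tag keys -- shown only to illustrate the idea of
    # attaching deployment metadata to a registered model version.
    return {
        "kserve.runtime": kserve_runtime,
        "kserve.gpu_type": gpu_type,
    }

tags = build_kserve_tags(kserve_runtime="kserve-vllm", gpu_type="amd-strixhalo")
```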
## Kubeflow Components
Use in KFP pipelines:
```python
from mlflow_utils.kfp_components import (
    log_experiment_component,
    register_model_component,
)
```
## CLI
```bash
# List experiments
python -m mlflow_utils.cli list-experiments

# Compare runs
python -m mlflow_utils.cli compare-runs --experiment "qlora-training"

# Export metrics
python -m mlflow_utils.cli export --run-id abc123 --output metrics.json
```
## Configuration
| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `MLFLOW_TRACKING_URI` | `http://mlflow.mlflow.svc.cluster.local:80` | MLflow server |
| `MLFLOW_EXPERIMENT_NAME` | `default` | Default experiment |
| `MLFLOW_ENABLE_ASYNC` | `true` | Async logging for handlers |
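A client reading these variables would typically fall back to the defaults in the table when they are unset. A stdlib-only sketch of that resolution logic (assuming `client.py` does something similar; not its actual code):

```python
import os

def get_tracking_config(env=os.environ):
    """Resolve settings from the environment, mirroring the defaults above."""
    return {
        "tracking_uri": env.get(
            "MLFLOW_TRACKING_URI",
            "http://mlflow.mlflow.svc.cluster.local:80",
        ),
        "experiment_name": env.get("MLFLOW_EXPERIMENT_NAME", "default"),
        # Environment values are strings, so "true"/"false" needs parsing.
        "enable_async": env.get("MLFLOW_ENABLE_ASYNC", "true").lower() == "true",
    }

cfg = get_tracking_config(env={})  # no overrides -> all defaults apply
```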
## Module Structure
```
mlflow_utils/
├── __init__.py # Public API
├── client.py # Connection management
├── tracker.py # General experiment tracker
├── inference_tracker.py # Async inference metrics
├── model_registry.py # Model registration + KServe
├── kfp_components.py # Kubeflow components
├── experiment_comparison.py # Run comparison tools
└── cli.py # Command-line interface
```
## Related
- [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base) - Uses inference tracker
- [kubeflow](https://git.daviestechlabs.io/daviestechlabs/kubeflow) - KFP components
- [argo](https://git.daviestechlabs.io/daviestechlabs/argo) - Training workflows
- [homelab-design](https://git.daviestechlabs.io/daviestechlabs/homelab-design) - Architecture docs