# MLflow Utils
MLflow integration utilities for the DaviesTechLabs AI/ML platform.
## Installation
```bash
pip install -r requirements.txt
```
Or from Gitea:
```bash
pip install git+https://git.daviestechlabs.io/daviestechlabs/mlflow.git
```
## Modules
| Module | Description |
|--------|-------------|
| `client.py` | MLflow client configuration and helpers |
| `tracker.py` | General MLflowTracker for experiments |
| `inference_tracker.py` | Async inference metrics for NATS handlers |
| `model_registry.py` | Model Registry with KServe metadata |
| `kfp_components.py` | Kubeflow Pipeline MLflow components |
| `experiment_comparison.py` | Compare experiments and runs |
| `cli.py` | Command-line interface |
## Quick Start
```python
from mlflow_utils import get_mlflow_client, MLflowTracker

# Simple tracking
with MLflowTracker(experiment_name="my-experiment") as tracker:
    tracker.log_params({"learning_rate": 0.001})
    tracker.log_metrics({"accuracy": 0.95})
```
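The context-manager form guarantees the run is closed even if the body raises. A purely illustrative, stdlib-only sketch of that pattern (hypothetical internals; the real `MLflowTracker` lives in `tracker.py`):

```python
# Illustrative sketch of the context-manager pattern -- hypothetical
# internals, not the package's actual implementation.
class DemoTracker:
    def __init__(self, experiment_name):
        self.experiment_name = experiment_name
        self.params = {}
        self.metrics = {}
        self.closed = False

    def __enter__(self):
        # The real tracker would start an MLflow run here.
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even when the body raises, so the run is always ended
        # (the real tracker would call mlflow.end_run() here).
        self.closed = True
        return False

    def log_params(self, params):
        self.params.update(params)

    def log_metrics(self, metrics):
        self.metrics.update(metrics)

with DemoTracker("my-experiment") as tracker:
    tracker.log_params({"learning_rate": 0.001})
    tracker.log_metrics({"accuracy": 0.95})
```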
## Inference Tracking
For NATS handlers (chat-handler, voice-assistant):
```python
from mlflow_utils import InferenceMetricsTracker
from mlflow_utils.inference_tracker import InferenceMetrics
tracker = InferenceMetricsTracker(
    experiment_name="voice-assistant-prod",
    batch_size=100,  # Batch metrics before logging
)

# During request handling
metrics = InferenceMetrics(
    request_id="uuid",
    total_latency=1.5,
    llm_latency=0.8,
    input_tokens=150,
    output_tokens=200,
)
await tracker.log_inference(metrics)
```
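The `batch_size` argument exists so that per-request metrics are buffered in memory and flushed in one call, rather than paying a network round-trip per request. A stdlib-only sketch of that buffering idea (illustrative only, not the package's actual code):

```python
# Illustrative metric batching; flush() stands in for sending to MLflow.
class MetricsBatcher:
    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.buffer = []
        self.flushed = []  # stands in for "logged to the MLflow server"

    def log(self, metrics: dict):
        self.buffer.append(metrics)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # The real tracker would log the whole batch to MLflow here.
        self.flushed.extend(self.buffer)
        self.buffer.clear()

batcher = MetricsBatcher(batch_size=3)
for i in range(7):
    batcher.log({"request": i, "total_latency": 1.5})
# After 7 logs with batch_size=3: two batches flushed, one entry buffered.
```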
## Model Registry
Register models with KServe deployment metadata:
```python
from mlflow_utils.model_registry import register_model_for_kserve
register_model_for_kserve(
    model_name="my-qlora-adapter",
    model_uri="runs:/abc123/model",
    kserve_runtime="kserve-vllm",
    gpu_type="amd-strixhalo",
)
```
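The deployment options presumably end up as flat string tags on the registered model version so that KServe tooling can read them back. As an illustration of the kind of mapping this implies (tag key names are assumptions, not the package's actual schema):

```python
def build_kserve_tags(kserve_runtime: str, gpu_type: str) -> dict:
    # Hypothetical tag keys -- shown only to illustrate the idea of
    # attaching deployment metadata to a registered model version.
    return {
        "kserve.runtime": kserve_runtime,
        "kserve.gpu_type": gpu_type,
    }

tags = build_kserve_tags(kserve_runtime="kserve-vllm", gpu_type="amd-strixhalo")
```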
## Kubeflow Components
Use in KFP pipelines:
```python
from mlflow_utils.kfp_components import (
    log_experiment_component,
    register_model_component,
)
```
## CLI
```bash
# List experiments
python -m mlflow_utils.cli list-experiments

# Compare runs
python -m mlflow_utils.cli compare-runs --experiment "qlora-training"

# Export metrics
python -m mlflow_utils.cli export --run-id abc123 --output metrics.json
```
## Configuration
| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `MLFLOW_TRACKING_URI` | `http://mlflow.mlflow.svc.cluster.local:80` | MLflow server |
| `MLFLOW_EXPERIMENT_NAME` | `default` | Default experiment |
| `MLFLOW_ENABLE_ASYNC` | `true` | Async logging for handlers |
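A client reading these variables would typically fall back to the defaults in the table when they are unset. A stdlib-only sketch of that resolution logic (assuming `client.py` does something similar; not its actual code):

```python
import os

def get_tracking_config(env=os.environ):
    """Resolve settings from the environment, mirroring the defaults above."""
    return {
        "tracking_uri": env.get(
            "MLFLOW_TRACKING_URI",
            "http://mlflow.mlflow.svc.cluster.local:80",
        ),
        "experiment_name": env.get("MLFLOW_EXPERIMENT_NAME", "default"),
        # Environment values are strings, so "true"/"false" needs parsing.
        "enable_async": env.get("MLFLOW_ENABLE_ASYNC", "true").lower() == "true",
    }

cfg = get_tracking_config(env={})  # no overrides -> all defaults apply
```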
## Module Structure
```
mlflow_utils/
├── __init__.py # Public API
├── client.py # Connection management
├── tracker.py # General experiment tracker
├── inference_tracker.py # Async inference metrics
├── model_registry.py # Model registration + KServe
├── kfp_components.py # Kubeflow components
├── experiment_comparison.py # Run comparison tools
└── cli.py # Command-line interface
```
## Related
- [handler-base](https://git.daviestechlabs.io/daviestechlabs/handler-base) - Uses inference tracker
- [kubeflow](https://git.daviestechlabs.io/daviestechlabs/kubeflow) - KFP components
- [argo](https://git.daviestechlabs.io/daviestechlabs/argo) - Training workflows
- [homelab-design](https://git.daviestechlabs.io/daviestechlabs/homelab-design) - Architecture docs