MLflow autolog: Track Experiments Automatically with One Line of Code
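Goal
Use mlflow.xgboost.autolog(), the XGBoost-specific form of mlflow.autolog(), to record the parameters, evaluation metrics, and trained model of each XGBoost training run with a single line of code, then compare several runs in the MLflow UI.
Complete code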
import mlflow
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# ─── 1. Configure MLflow ───
mlflow.set_tracking_uri("http://localhost:5000")  # or remove this line to use MLflow's default local store
mlflow.set_experiment("xgboost-breast-cancer")
# ─── 2. Enable autolog ───
mlflow.xgboost.autolog()  # 👈 one line: logs params, per-iteration eval metrics, and the model
# ─── 3. Prepare the data ───
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# ─── 4. Multiple runs with different hyperparameters ───
experiments = [
    {"n_estimators": 50, "max_depth": 3, "learning_rate": 0.1},
    {"n_estimators": 100, "max_depth": 5, "learning_rate": 0.05},
    {"n_estimators": 200, "max_depth": 7, "learning_rate": 0.01},
]
for params in experiments:
    with mlflow.start_run(run_name=f"xgb_d{params['max_depth']}_lr{params['learning_rate']}"):
        model = xgb.XGBClassifier(**params, eval_metric="logloss", random_state=42)
        model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
        y_pred = model.predict(X_test)
        acc = accuracy_score(y_test, y_pred)
        # Accuracy is logged by hand; autolog records the eval_set logloss on its own.
        mlflow.log_metric("test_accuracy", acc)
        print(f"✓ {params} → Accuracy: {acc:.4f}")
print("\nView all runs: mlflow ui --port 5000")
# search_runs() covers the active experiment, so rerunning this script adds to the count.
print(f"Total runs: {len(mlflow.search_runs())}")
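The hand-written list of three configurations can also be generated from a search space. The following is a minimal sketch, not part of the original script: GRID, expand_grid, and run_name are hypothetical names, and the run name adds n_estimators to the original format so that every combination gets a distinct name.

```python
from itertools import product

# Hypothetical search space; the values mirror the three hand-written configs.
GRID = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.1, 0.05, 0.01],
}


def expand_grid(grid):
    """Return one params dict per combination of the grid values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def run_name(params):
    """Build a run name; n_estimators is added so names stay unique across the grid."""
    return (
        f"xgb_n{params['n_estimators']}"
        f"_d{params['max_depth']}_lr{params['learning_rate']}"
    )


experiments = expand_grid(GRID)
print(len(experiments))          # 3 * 3 * 3 = 27 combinations
print(run_name(experiments[0]))  # xgb_n50_d3_lr0.1
```

The resulting experiments list can be passed to the same training loop, with run_name(params) used as the run_name argument of mlflow.start_run. A full grid produces 27 runs instead of 3, so training takes correspondingly longer.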
Steps to run
# Terminal 1: install dependencies and start the MLflow UI
pip install mlflow xgboost scikit-learn
mlflow ui --port 5000
# Terminal 2: run the experiments
python mlflow_autolog.py
Open http://localhost:5000 to compare the 3 runs.
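The script only works if the server from terminal 1 is already listening, so it can help to check before training. Below is a minimal sketch using only the standard library. It assumes the server answers on a /health endpoint, as recent MLflow versions do; tracking_server_is_up is a hypothetical helper name, not an MLflow function.

```python
import urllib.request


def tracking_server_is_up(base_url, timeout=2.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


if not tracking_server_is_up("http://localhost:5000"):
    print("MLflow server not reachable; start it with: mlflow ui --port 5000")
```

Calling this at the top of mlflow_autolog.py gives a clear message up front instead of a connection error in the middle of the first run.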
Expected output
✓ {'n_estimators': 50, 'max_depth': 3, 'learning_rate': 0.1} → Accuracy: 0.9737
✓ {'n_estimators': 100, 'max_depth': 5, 'learning_rate': 0.05} → Accuracy: 0.9825
✓ {'n_estimators': 200, 'max_depth': 7, 'learning_rate': 0.01} → Accuracy: 0.9825
View all runs: mlflow ui --port 5000
Total runs: 3
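In the UI you can compare parameter tables, metric curves, and run durations, and download the logged models. Exact accuracy values may vary with the XGBoost and scikit-learn versions installed.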
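The same comparison can be done in code. The sketch below ranks run records by test accuracy. The rows are copied from the expected output above purely for illustration; their keys follow the column naming of the DataFrame returned by mlflow.search_runs() ("metrics.<name>", "tags.mlflow.runName"). rank_runs is a hypothetical helper, not an MLflow function.

```python
import math


def rank_runs(rows, metric="metrics.test_accuracy"):
    """Sort run records by a metric, best first, skipping runs that lack it."""
    def has_metric(row):
        value = row.get(metric)
        # A DataFrame stores a missing metric as NaN, so filter that as well as None.
        return value is not None and not math.isnan(value)

    return sorted(filter(has_metric, rows), key=lambda r: r[metric], reverse=True)


# Illustrative records only. With MLflow installed and the experiment selected,
# real records could come from: mlflow.search_runs().to_dict("records")
rows = [
    {"tags.mlflow.runName": "xgb_d3_lr0.1", "metrics.test_accuracy": 0.9737},
    {"tags.mlflow.runName": "xgb_d5_lr0.05", "metrics.test_accuracy": 0.9825},
    {"tags.mlflow.runName": "xgb_d7_lr0.01", "metrics.test_accuracy": 0.9825},
]

for row in rank_runs(rows):
    print(f"{row['tags.mlflow.runName']}: {row['metrics.test_accuracy']:.4f}")
```

Python's sort is stable, so runs with equal accuracy keep their original order. mlflow.search_runs also accepts an order_by argument, which lets the tracking server do the sorting instead.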