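Scikit-learn Hello World: Iris Classification
Goal
Train a random forest classifier on the classic Iris dataset and evaluate its accuracy. This is one of the most common introductory examples for scikit-learn.
Full code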
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# 1. Load the data
iris = load_iris()
X, y = iris.data, iris.target
print(f"Feature shape: {X.shape}, label shape: {y.shape}")
print(f"Class names: {iris.target_names}")

# 2. Split into training and test sets (30% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Create and train the model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# 4. Predict and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy: {accuracy:.2%}")
print("\nClassification report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# 5. Feature importances
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f" {name}: {importance:.4f}")
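A single train/test split gives an accuracy figure that depends on which 45 samples happen to land in the test set. As a rough complement, the sketch below estimates accuracy with k-fold cross-validation. The choice of `cv=5` is an assumption made for illustration; it is not part of the script above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation: each fold serves once as the held-out test set.
# cv=5 is an illustrative choice, not something the original script uses.
scores = cross_val_score(clf, X, y, cv=5)

print(f"Per-fold accuracy: {scores}")
print(f"Mean accuracy: {scores.mean():.2%} (std {scores.std():.2%})")
```

The spread across folds gives a sense of how much the single-split accuracy could move with a different `random_state`.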
How to run
Save the code above as iris_classify.py, then install scikit-learn and run the script:
pip install scikit-learn
python iris_classify.py
Expected output
The figures below come from one run; exact values can vary with the scikit-learn version.

Feature shape: (150, 4), label shape: (150,)
Class names: ['setosa' 'versicolor' 'virginica']

Accuracy: 97.78%

Classification report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       0.93      1.00      0.96        13
   virginica       1.00      0.92      0.96        13

    accuracy                           0.98        45

 sepal length (cm): 0.1081
 sepal width (cm): 0.0304
 petal length (cm): 0.4195
 petal width (cm): 0.4420
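The reported figures can be cross-checked by hand. They are consistent with exactly one virginica test sample being predicted as versicolor. The confusion matrix in the sketch below is inferred from the reported precision, recall, and support values; it is an assumption, not something the script prints.

```python
# Confusion matrix inferred from the report (an assumption, not script output).
# Rows are true classes, columns are predicted classes.
labels = ["setosa", "versicolor", "virginica"]
cm = [
    [19, 0, 0],   # setosa: all 19 correct
    [0, 13, 0],   # versicolor: all 13 correct
    [0, 1, 12],   # virginica: 1 of 13 predicted as versicolor
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(labels)))
accuracy = correct / total
print(f"accuracy: {accuracy:.2%}")

metrics = {}
for i, name in enumerate(labels):
    predicted_as_i = sum(cm[r][i] for r in range(len(labels)))  # column sum
    actually_i = sum(cm[i])                                     # row sum (support)
    precision = cm[i][i] / predicted_as_i
    recall = cm[i][i] / actually_i
    f1 = 2 * precision * recall / (precision + recall)
    metrics[name] = (precision, recall, f1)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here 44 of 45 correct predictions gives the 97.78% accuracy, 13/14 gives the versicolor precision of 0.93, and 12/13 gives the virginica recall of 0.92. The four feature importances also sum to 1, as expected for a random forest.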