SKlearn模型评估方法

发布 : 2020-01-23 分类 : 机器学习 浏览 :

准确率

1.accuracy_score

1
2
3
4
5
6
7
8
9
10
11
# 准确率
import numpy as np
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3,9,9,8,5,8]
y_true = [0, 1, 2, 3,2,6,3,5,9]

accuracy_score(y_true, y_pred)
Out[127]: 0.33333333333333331

accuracy_score(y_true, y_pred, normalize=False) # 类似海明距离,每个类别求准确后,再求微平均
Out[128]: 3

2.metrics

  • 宏平均比微平均更合理,但也不是说微平均一无是处,具体使用哪种评测机制,还是要取决于数据集中样本分布

  • 宏平均(Macro-averaging),是先对每一个类统计指标值,然后在对所有类求算术平均值。

  • 微平均(Micro-averaging),是对数据集中的每一个实例不分类别进行统计建立全局混淆矩阵,然后计算相应指标。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    from sklearn import metrics
    metrics.precision_score(y_true, y_pred, average='micro') # 微平均,精确率
    Out[130]: 0.33333333333333331

    metrics.precision_score(y_true, y_pred, average='macro') # 宏平均,精确率
    Out[131]: 0.375

    metrics.precision_score(y_true, y_pred, labels=[0, 1, 2, 3], average='macro') # 指定特定分类标签的精确率
    Out[133]: 0.5
  • 其中average参数有五种:(None, ‘micro’, ‘macro’, ‘weighted’, ‘samples’)
    召回率

    1
    2
    3
    4
    5
    metrics.recall_score(y_true, y_pred, average='micro')
    Out[134]: 0.33333333333333331

    metrics.recall_score(y_true, y_pred, average='macro')
    Out[135]: 0.3125

F1

1
2
metrics.f1_score(y_true, y_pred, average='weighted')  
Out[136]: 0.37037037037037035

F2

根据公式计算

1
2
3
4
5
6
from sklearn.metrics import precision_score, recall_score
def calc_f2(label, predict):
p = precision_score(label, predict)
r = recall_score(label, predict)
f2_score = 5*p*r / (4*p + r)
return f2_score

混淆矩阵

1
2
3
4
5
6
7
8
9
10
11
from sklearn.metrics import confusion_matrix
confusion_matrix(y_true, y_pred)

Out[137]:
array([[1, 0, 0, ..., 0, 0, 0],
[0, 0, 1, ..., 0, 0, 0],
[0, 1, 0, ..., 0, 0, 1],
...,
[0, 0, 0, ..., 0, 0, 1],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 1, 0]])

分类报告

包含:precision/recall/fi-score/均值/分类个数

1
2
3
4
5
6
# 分类报告:precision/recall/fi-score/均值/分类个数
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))

输出

1
2
3
4
5
6
7
              precision    recall  f1-score   support

class 0 0.67 1.00 0.80 2
class 1 0.00 0.00 0.00 1
class 2 1.00 1.00 1.00 2

avg / total 0.67 0.80 0.72 5

kappa score

  • kappa score是一个介于(-1, 1)之间的数. score>0.8意味着好的分类;0或更低意味着不好(实际是随机标签)

    1
    2
    3
    4
    from sklearn.metrics import cohen_kappa_score
    y_true = [2, 0, 2, 2, 0, 1]
    y_pred = [0, 0, 2, 2, 0, 2]
    cohen_kappa_score(y_true, y_pred)
  • ROC
    1.计算ROC值

    1
    2
    3
    4
    5
    import numpy as np
    from sklearn.metrics import roc_auc_score
    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])
    roc_auc_score(y_true, y_scores)
  • 2.ROC曲线

    1
    2
    3
    y = np.array([1, 1, 2, 2])
    scores = np.array([0.1, 0.4, 0.35, 0.8])
    fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)
  • 海明距离

    1
    2
    3
    4
    5
    from sklearn.metrics import hamming_loss
    y_pred = [1, 2, 3, 4]
    y_true = [2, 2, 3, 4]
    hamming_loss(y_true, y_pred)
    0.25

Jaccard距离

1
2
3
4
5
6
7
8
import numpy as np
from sklearn.metrics import jaccard_similarity_score
y_pred = [0, 2, 1, 3,4]
y_true = [0, 1, 2, 3,4]
jaccard_similarity_score(y_true, y_pred)
0.5
jaccard_similarity_score(y_true, y_pred, normalize=False)
2

可释方差值(Explained variance score)

1
2
3
4
from sklearn.metrics import explained_variance_score
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
explained_variance_score(y_true, y_pred)

平均绝对误差(Mean absolute error)

1
2
3
4
from sklearn.metrics import mean_absolute_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_absolute_error(y_true, y_pred)

均方误差(Mean squared error)

1
2
3
4
from sklearn.metrics import mean_squared_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_squared_error(y_true, y_pred)

中值绝对误差(Median absolute error)

1
2
3
4
from sklearn.metrics import median_absolute_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
median_absolute_error(y_true, y_pred)

R方值,确定系数

1
2
3
4
from sklearn.metrics import r2_score
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
r2_score(y_true, y_pred)

参考文献

本文作者 : HeoLis
原文链接 : http://ishero.net/SKlearn%E6%A8%A1%E5%9E%8B%E8%AF%84%E4%BC%B0%E6%96%B9%E6%B3%95.html
版权声明 : 本博客所有文章除特别声明外,均采用 CC BY-NC-SA 4.0 许可协议。转载请注明出处!

学习、记录、分享、获得

微信扫一扫, 向我投食

微信扫一扫, 向我投食