Survey for Performance Measure Index of Classification Learning Algorithm

doi:10.11896/jsjkx.200900216

Abstract

In research on classification problems in machine learning, correctly evaluating classification learning algorithms is very important. In practice, many performance measures have been proposed from different perspectives. This paper introduces three major categories of performance measures: those based on error rate, those based on the confusion matrix, and those based on statistical significance tests. It discusses in detail the background, significance, and scope of application of each measure, analyzes the differences among the measures, and identifies problems and directions that need further study for each method. Furthermore, using experimental data, the measures are compared both horizontally (within-category differences among the methods of each category) and vertically (between-category differences across the three categories), and the consistency of the measures in classification algorithm selection is analyzed.

Key words: Confusion matrix, Error rate, Performance measure, Statistical test

CLC Number:

TP181

YANG Xing-li. Survey for Performance Measure Index of Classification Learning Algorithm[J]. Computer Science, 2021, 48(8): 209-219. https://doi.org/10.11896/jsjkx.200900216

