Computer Science (计算机科学), 2021, Vol. 48, Issue (8): 209-219. doi: 10.11896/jsjkx.200900216
杨杏丽
YANG Xing-li
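Abstract: In machine learning research on classification problems, evaluating classification learning algorithms correctly is essential. Many performance measures have been proposed from different perspectives. This paper introduces three major categories of performance measures: measures based on the error rate, measures based on the confusion matrix, and measures based on statistical significance tests. For each measure, it discusses in detail the motivation behind it, its meaning, and its scope of application. It also analyzes the differences among the measures and identifies open problems and directions for further research on each method. Furthermore, experiments compare the measures along two dimensions. The horizontal comparison covers within-category differences among the methods of each measure type, and the vertical comparison covers between-category differences among the three types. Finally, the paper analyzes how consistently the measures agree when used to select a classification algorithm.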
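The three families of measures named in the abstract can be illustrated with a minimal Python sketch. It assumes binary labels with positive class 1, and the helper names (`error_rate`, `confusion_counts`, `precision_recall_f1`, `paired_t_statistic`) are hypothetical rather than taken from the paper. The significance test shown is the plain paired t-test over k cross-validation folds, which is used here purely for illustration.

```python
import math
from typing import Sequence, Tuple


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Error-rate-based measure: fraction of misclassified samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) of the binary confusion matrix."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Confusion-matrix-based measures; 0.0 is returned for a zero denominator."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def paired_t_statistic(errors_a: Sequence[float], errors_b: Sequence[float]) -> float:
    """Paired t statistic on per-fold error rates of two algorithms (k - 1 df)."""
    k = len(errors_a)
    if k != len(errors_b) or k < 2:
        raise ValueError("need the same number (>= 2) of folds for both algorithms")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)  # sample variance
    if var == 0:
        raise ValueError("fold differences have zero variance")
    return mean / math.sqrt(var / k)


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(error_rate(y_true, y_pred))           # 0.25
    print(confusion_counts(y_true, y_pred))     # (3, 1, 1, 3)
    print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
    # Hypothetical per-fold error rates of two algorithms over 5 folds.
    a = [0.10, 0.12, 0.08, 0.11, 0.09]
    b = [0.13, 0.14, 0.10, 0.12, 0.12]
    print(paired_t_statistic(a, b))             # approximately -5.88
```

With 5 folds the statistic has 4 degrees of freedom, so |t| > 2.776 rejects equal mean error at the 5% level (two-sided). The plain k-fold paired t-test is known to be optimistic because the folds share training data, so the per-fold estimates are correlated. This limitation is why corrected tests, such as cross-validated t-tests with blocked fold designs, have been proposed.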