Computer Science (计算机科学), 2021, Vol. 48, Issue (8): 209-219. doi: 10.11896/jsjkx.200900216

• Artificial Intelligence •

Survey for Performance Measure Index of Classification Learning Algorithm

YANG Xing-li

  1. School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China
     School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Received: 2020-09-30 Revised: 2020-12-29 Published: 2021-08-10
  • Corresponding author: YANG Xing-li (yangxingli@sxu.edu.cn)
  • About author: YANG Xing-li, born in 1986, Ph.D. candidate, lecturer, is a member of the China Computer Federation. Her main research interest is statistical machine learning.
  • Supported by:
    National Natural Science Foundation of China (62076156, 61806115), Shanxi Applied Basic Research Program (201901D111034, 201801D211002), and Open Research Fund of the Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, ECNU (KLATASDS2007).

Abstract: In machine learning research on classification, correctly evaluating classification learning algorithms is essential. Many performance measures have been proposed from different perspectives. This paper surveys three families of performance measures: those based on error rate, those based on the confusion matrix, and those based on statistical significance tests. For each measure, it discusses the background, meaning, and scope of application, analyzes the differences among the measures, and identifies open problems and directions for further research. Experiments then compare the measures horizontally (within-family differences among the methods of each family) and vertically (between-family differences among the three families), and analyze how consistently the measures agree when selecting a classification algorithm.

Key words: Confusion matrix, Error rate, Performance measure, Statistical test

CLC Number: TP181
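The three families of measures named in the abstract can be illustrated with a minimal sketch for the binary case. This is not the paper's code: the function names are hypothetical, and McNemar's test (with a continuity correction) is used only as one familiar example of a significance test for comparing two classifiers on the same test set; the abstract does not say which tests the survey covers.

```python
from math import erfc, sqrt


def error_rate(y_true, y_pred):
    """Error-rate family: fraction of misclassified examples."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Confusion-matrix family: return (tp, fp, fn, tn) for a binary task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mcnemar(y_true, pred_a, pred_b):
    """Significance-test family (illustrative choice): McNemar's test.

    b counts examples that A classifies correctly and B does not;
    c counts the reverse. Returns (statistic, p_value) using the
    chi-square distribution with one degree of freedom.
    """
    b = sum(a == t and o != t for t, a, o in zip(y_true, pred_a, pred_b))
    c = sum(a != t and o == t for t, a, o in zip(y_true, pred_a, pred_b))
    if b + c == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Toy labels, made up for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    pred_a = [1, 1, 1, 0, 0, 0, 0, 1]
    pred_b = [1, 0, 0, 0, 0, 1, 1, 1]
    print("error rate A:", error_rate(y_true, pred_a))
    print("P/R/F1 A:", precision_recall_f1(*confusion_counts(y_true, pred_a)[:3]))
    print("McNemar A vs B:", mcnemar(y_true, pred_a, pred_b))
```

The three functions can disagree about which classifier is preferable: error rate weighs all mistakes equally, F1 ignores true negatives, and the test only asks whether the observed difference could plausibly be chance.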