Computer Science, 2024, Vol. 51, Issue (6): 144-152. doi: 10.11896/jsjkx.230700115
姜高霞1, 王菲1, 许行1, 王文剑1,2
JIANG Gaoxia1, WANG Fei1, XU Hang1, WANG Wenjian1,2
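Abstract: Large annotated datasets inevitably contain label noise, which limits the generalization performance of models to some extent. Labels in ordinal regression datasets are discrete, yet they follow a definite order. Although ordinal labels share characteristics of both classification and regression labels, label noise filtering algorithms designed for classification or regression tasks are not fully applicable to ordinal label noise. To address this problem, an Akaike generalization error estimate for regression models under label noise is proposed, and a label noise filtering framework for ordinal regression tasks is designed on this basis. In addition, a robust ordinal label noise estimation method is proposed, which adopts a median-based fusion strategy to reduce the interference of abnormal estimation components. Finally, this method is combined with the proposed framework to form the noise Robust Fusion Filtering (RFF) algorithm. The effectiveness of the algorithm is verified on benchmark datasets and on a real-world age estimation dataset. Experimental results show that, in ordinal regression tasks, RFF outperforms other classification and regression filtering algorithms, adapts to different types of noisy data, and effectively improves data quality and model generalization performance.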
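The median-based fusion idea can be illustrated with a minimal sketch. The abstract does not specify the base estimators, the threshold rule, or the Akaike-based selection step, so everything below is an assumption made for illustration: the component estimates come from leave-one-out k-nearest-neighbour predictions with several values of `k`, and the hypothetical `threshold` parameter is a fixed cut-off. This is not the RFF algorithm itself, only a sketch of why a median is less sensitive to one abnormal component than a mean.

```python
import numpy as np

def loo_knn_predict(X, y, k):
    """Predict each label as the mean label of its k nearest neighbours,
    excluding the sample itself (leave-one-out). Assumed base estimator."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return y[nearest].mean(axis=1)

def median_fused_noise(X, y, ks=(3, 5, 7)):
    """Fuse several per-sample noise estimates (label minus prediction)
    by taking their median, so one outlying component has limited effect."""
    components = np.stack([y - loo_knn_predict(X, y, k) for k in ks])
    return np.median(components, axis=0)

def filter_noisy(X, y, threshold=1.0, ks=(3, 5, 7)):
    """Keep samples whose fused noise estimate is within the threshold.
    The fixed threshold is a hypothetical stand-in for a principled rule."""
    noise = median_fused_noise(X, y, ks)
    keep = np.abs(noise) <= threshold
    return X[keep], y[keep], keep

# Toy ordinal data: 20 points, labels 0..3 in blocks of five, one corrupted label.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = (np.arange(20) // 5).astype(float)
y[7] = 3.0  # true ordinal label is 1
X_clean, y_clean, keep = filter_noisy(X, y)
print("removed indices:", np.flatnonzero(~keep))
```

In this toy setting the ordinal labels are treated as numeric values, which is what lets a regression-style residual serve as a noise estimate; a real ordinal filter would also need to respect the discreteness of the labels.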