Computer Science ›› 2024, Vol. 51 ›› Issue (6): 144-152. doi: 10.11896/jsjkx.230700115

• Database & Big Data & Data Science •

Robust Estimation and Filtering Methods for Ordinal Label Noise

JIANG Gaoxia1, WANG Fei1, XU Hang1, WANG Wenjian1,2

  1 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2 Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan 030006, China
  • Received: 2023-07-17 Revised: 2023-12-01 Online: 2024-06-15 Published: 2024-06-05
  • Corresponding author: WANG Wenjian (wjwang@sxu.edu.cn)
  • About author: JIANG Gaoxia (jianggaoxia@sxu.edu.cn), born in 1987, Ph.D, associate professor, is a member of CCF (No.49561M). His main research interests include machine learning and data mining.
    WANG Wenjian, born in 1968, Ph.D, professor, is an outstanding member of CCF (No.16143D). Her main research interests include machine learning and computational intelligence.
  • Supported by:
    National Natural Science Foundation of China (62276161, U21A20513, 62076154, 62206161), Key R&D Program of Shanxi Province (202202020101003, 202302010101007) and Fundamental Research Program of Shanxi Province (202303021221055).

Abstract: Large labeled datasets inevitably contain label noise, which limits the generalization performance of models to some extent. The labels of ordinal regression datasets are discrete values, yet there is an order relationship among them. Although ordinal regression labels share characteristics of both classification and regression labels, label noise filtering algorithms designed for classification and regression tasks are not fully applicable to ordinal label noise. To address this problem, an Akaike generalization error estimate for regression models under label noise is proposed, and on this basis a label noise filtering framework for ordinal regression tasks is designed. In addition, a robust ordinal label noise estimation method is proposed, which adopts a median-based fusion strategy to reduce the interference of abnormal estimated components. Finally, this estimation method is combined with the proposed framework to form the noise Robust Fusion Filtering (RFF) algorithm. The effectiveness of RFF is verified on benchmark datasets and a real age estimation dataset. Experimental results show that, in ordinal regression tasks, RFF outperforms other classification and regression filtering algorithms, adapts to different types of noisy data, and effectively improves data quality and model generalization performance.

Key words: Label noise, Ordinal regression, Akaike generalization error estimation, Noise filtering, Robust noise estimation
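
The median-based fusion idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the individual noise estimates are produced or how the filtering threshold is chosen (the paper derives its framework from the Akaike generalization error estimate), so the function names, the example values, and the fixed threshold below are all hypothetical. This is not the RFF algorithm itself, only an illustration of why a median is less sensitive to one abnormal component than a mean.

```python
from statistics import median


def fuse_noise_estimates(estimates_per_sample):
    """Fuse several noise estimates per sample by taking their median.

    Each inner list holds the estimates that different (hypothetical)
    estimators produced for the label noise of one sample.
    """
    return [median(components) for components in estimates_per_sample]


def filter_by_estimated_noise(fused_noise, threshold):
    """Return indices of samples whose absolute fused noise is within the threshold.

    A fixed threshold is an assumption made for this sketch only.
    """
    return [i for i, e in enumerate(fused_noise) if abs(e) <= threshold]


# Hypothetical data: three noise estimates for each of four samples.
estimates = [
    [0.0, 0.2, -0.1],
    [0.1, 4.0, 0.0],   # one abnormal component (4.0)
    [2.5, 3.0, 2.8],   # all components agree the label is noisy
    [-0.3, 0.0, 0.1],
]

fused = fuse_noise_estimates(estimates)
kept = filter_by_estimated_noise(fused, threshold=1.0)
print(fused)
print(kept)
```

In this made-up example the second sample has a mean estimate above the threshold because of a single outlying component, while its median stays small, so the sample is retained. The third sample is removed because all of its components agree.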

CLC Number:

  • TP181