Computer Science, 2024, Vol. 51, Issue (6): 144-152. doi: 10.11896/jsjkx.230700115

• Database & Big Data & Data Science •

Robust Estimation and Filtering Methods for Ordinal Label Noise

JIANG Gaoxia1, WANG Fei1, XU Hang1, WANG Wenjian1,2   

  1 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2 Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan 030006, China
  • Received: 2023-07-17 Revised: 2023-12-01 Online: 2024-06-15 Published: 2024-06-05
  • About author: JIANG Gaoxia, born in 1987, Ph.D, associate professor, is a member of CCF (No.49561M). His main research interests include machine learning and data mining.
    WANG Wenjian, born in 1968, Ph.D, professor, is an outstanding member of CCF (No.16143D). Her main research interests include machine learning and computing intelligence.
  • Supported by:
    National Natural Science Foundation of China (62276161, U21A20513, 62076154, 62206161), Key R & D Program of Shanxi Province (202202020101003, 202302010101007) and Fundamental Research Program of Shanxi Province (202303021221055).

Abstract: Large-scale labeled datasets inevitably contain label noise, which limits the generalization performance of models to some extent. The labels of ordinal regression datasets are discrete values, but there are ordinal relationships between different labels. Although ordinal regression labels share characteristics of both classification and regression labels, label noise filtering algorithms designed for classification and regression tasks are not fully applicable to ordinal label noise. To solve this problem, an Akaike generalization error estimate for regression models with label noise is proposed. On this basis, a label noise filtering framework for ordinal regression tasks is designed. In addition, a robust ordinal label noise estimation method is proposed; it adopts a median-based fusion strategy to reduce the interference of abnormal estimated components. Finally, this estimation method is combined with the proposed framework to form a noise-robust fusion filtering (RFF) algorithm. The effectiveness of RFF is verified on benchmark datasets and a real age estimation dataset. Experimental results show that, in ordinal regression tasks, the RFF algorithm performs better than other classification and regression filtering algorithms. It adapts to different kinds of noise and can effectively improve data quality and model generalization performance.

Key words: Label noise, Ordinal regression, Akaike generalization error estimation, Noise filtering, Robust noise estimation

CLC Number: 

  • TP181
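
The median-based fusion idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' RFF algorithm: the abstract does not specify how the estimated components are built or how the filtering threshold is chosen, so the function names (`fused_noise_estimates`, `filter_noisy_samples`), the residual-based noise estimate, and the fixed `threshold` below are all assumptions made for illustration.

```python
from statistics import median


def fused_noise_estimates(labels, component_predictions):
    """Fuse per-component noise estimates with the median (illustrative assumption).

    labels: observed ordinal labels, one per sample.
    component_predictions: one list of predicted labels per component model.
    Each component's noise estimate for a sample is (observed label - prediction);
    the fused estimate is the median of these values across components.
    """
    fused = []
    for i, observed in enumerate(labels):
        residuals = [preds[i] for preds in component_predictions]
        residuals = [observed - p for p in residuals]
        fused.append(median(residuals))
    return fused


def filter_noisy_samples(labels, component_predictions, threshold):
    """Return indices of samples kept after filtering.

    A sample is kept when the absolute fused noise estimate does not exceed
    the threshold. The fixed threshold is a hypothetical stand-in for whatever
    criterion the filtering framework actually uses.
    """
    estimates = fused_noise_estimates(labels, component_predictions)
    return [i for i, e in enumerate(estimates) if abs(e) <= threshold]


# Toy data: four samples with ordinal labels in 1..5.
# The third component is abnormal on sample 0; sample 2 looks mislabeled.
labels = [1, 2, 5, 3]
component_predictions = [
    [1, 2, 2, 3],
    [1, 2, 3, 3],
    [5, 2, 2, 3],
]

print(fused_noise_estimates(labels, component_predictions))
print(filter_noisy_samples(labels, component_predictions, threshold=1))
```

In this toy data, a mean-based fusion would assign sample 0 an estimate of about -1.33 because of the single abnormal component, while the median ignores it; this is the kind of interference the median-based strategy is meant to reduce.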