Computer Science ›› 2023, Vol. 50 ›› Issue (6): 151-158.doi: 10.11896/jsjkx.220600130

• Database & Big Data & Data Science •

Noise Estimation and Filtering Methods with Limit Distance

JIANG Gaoxia1, QIN Pei1, WANG Wenjian1,2   

  1 School of Computer and Information Technology,Shanxi University,Taiyuan 030006,China
    2 Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University),Taiyuan 030006,China
  • Received:2022-06-14 Revised:2022-11-23 Online:2023-06-15 Published:2023-06-06
  • About author:JIANG Gaoxia,born in 1987,Ph.D,associate professor,is a member of China Computer Federation.His main research interests include machine learning and data mining.WANG Wenjian,born in 1968,Ph.D,professor,is an outstanding member of China Computer Federation.Her main research interests include machine learning and computational intelligence.
  • Supported by:
    National Natural Science Foundation of China(U21A20513,62276161,62076154,61906113,U1805263) and Key R & D Program of Shanxi Province International Cooperation(201903D421050).

Abstract: Machine learning has made remarkable progress in recent years and has been successfully applied to many fields.However,many learning models and algorithms depend heavily on data quality.Complex label noise is common in real-world datasets,so machine learning faces severe challenges in modeling low-quality data and handling label noise.To address the numerical label noise problem in regression,this paper studies the correlation between the label estimation interval and the noise from the perspectives of theoretical analysis and simulation experiments,and proposes a limit distance noise estimation method.Based on this noise estimator,a limit distance noise filtering(LDNF) algorithm is proposed under the optimal sample selection framework.Experimental results show that the proposed noise estimate correlates more strongly with the true label noise and has a lower estimation bias.On benchmark datasets and real-world age estimation datasets,the proposed LDNF algorithm effectively identifies label noise and reduces the test error of the model under different noise environments,outperforming other state-of-the-art filtering algorithms.
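The abstract describes the filtering pipeline only at a high level: estimate each sample's label noise from its distance to a label estimation interval, then discard the samples with the largest estimates under a sample-selection framework. The paper's actual interval construction and selection criterion are not given here, so the sketch below is a minimal illustration of that general idea, assuming a hypothetical k-nearest-neighbor quantile interval; the function names `limit_distance_noise` and `filter_noisy` and all parameters (`k`, `q`, `drop_frac`) are illustrative, not the authors' LDNF algorithm.

```python
import numpy as np

def limit_distance_noise(X, y, k=5, q=(0.25, 0.75)):
    """Per-sample noise estimate: distance from the observed label to a
    label estimation interval built from the k nearest neighbors' labels
    (a hypothetical stand-in for the paper's interval construction)."""
    n = len(y)
    scores = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        nbrs = np.argsort(d)[:k]
        lo, hi = np.quantile(y[nbrs], q)   # label estimation interval
        # Zero if the label falls inside the interval, otherwise the
        # distance to the nearest interval bound.
        scores[i] = max(lo - y[i], y[i] - hi, 0.0)
    return scores

def filter_noisy(X, y, drop_frac=0.1, **kw):
    """Keep samples whose noise estimate is below the (1 - drop_frac) quantile."""
    s = limit_distance_noise(X, y, **kw)
    keep = s <= np.quantile(s, 1 - drop_frac)
    return X[keep], y[keep]
```

A corrupted label far from its neighbors' labels receives a large score and is removed first; clean samples, whose labels fall inside their local interval, score zero. Real filtering methods such as LDNF select the sample subset by an error-bound criterion rather than a fixed drop fraction.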

Key words: Numerical label noise, Regression, Noise estimation, Limit distance noise filtering

CLC Number: TP181