Computer Science, 2020, Vol. 47, Issue (6): 79-84. doi: 10.11896/jsjkx.190600041

• Database & Big Data & Data Science •

Noisy Label Classification Learning Based on Relabeling Method

YU Meng-chi, MU Jia-peng, CAI Jian, XU Jian   

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2019-06-11 Online: 2020-06-15 Published: 2020-06-10
  • About author: YU Meng-chi, born in 1995, master. His major research interests include ticket mining and its applications.
    XU Jian, Ph.D, professor. His main research interests include event mining, log mining and their applications to system management.
  • Supported by:
    The work was supported by the National Natural Science Foundation of China (61872186, 61802205, 91846104).

Abstract: The integrity of sample labels has a significant impact on the accuracy of supervised learning algorithms. In real data, however, labeling is often unprofessional or haphazard, so the labels of a dataset are inevitably contaminated by noise, i.e., the label assigned to a sample differs from its true label. To reduce the negative impact of noisy labels on classification accuracy, this paper proposes a noisy label correction approach. It first applies a base classifier to classify the samples and estimates the noise rate in order to identify the samples with noisy labels, and then uses the base classifier to relabel those samples, yielding a dataset in which the noisy labels are corrected. Experiments on synthetic and real datasets show that the relabeling algorithm improves classification results under different base classifiers and different noise rate settings. Compared with the base classifier, the relabeling algorithm improves accuracy by about 5% on the synthetic dataset; in the high-noise setting on the CIFAR and MNIST datasets, its F1 score is on average 7% higher than that of Elk08 and Nat13, and 53% higher than that of the base classifier.
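
The identify-then-relabel idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the noise rate estimator or the selection rule, so the sketch assumes the noise rate is supplied as a parameter (`noise_rate`), uses a small Gaussian naive Bayes model written in NumPy as the base classifier, and treats the samples with the lowest predicted probability for their given label as the noisy ones. The function names `fit_gnb`, `predict_proba` and `relabel` are hypothetical.

```python
import numpy as np

def fit_gnb(X, y):
    """Fit a Gaussian naive Bayes model on (possibly noisy) labels."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant avoids division by zero for constant features.
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    log_priors = np.log(np.array([(y == c).mean() for c in classes]))
    return classes, means, variances, log_priors

def predict_proba(model, X):
    """Posterior class probabilities, shape (n_samples, n_classes)."""
    _, means, variances, log_priors = model
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    log_post = log_lik + log_priors[None, :]
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def relabel(X, y_noisy, noise_rate):
    """Flag the least plausible labels and replace them with base classifier predictions.

    Assumption: `noise_rate` is given (the paper estimates it; that estimator
    is not reproduced here).
    """
    model = fit_gnb(X, y_noisy)
    classes = model[0]
    proba = predict_proba(model, X)
    # Probability the base classifier assigns to each sample's given label.
    given_idx = np.searchsorted(classes, y_noisy)
    p_given = proba[np.arange(len(y_noisy)), given_idx]
    n_suspect = int(round(noise_rate * len(y_noisy)))
    suspects = np.argsort(p_given)[:n_suspect]
    y_fixed = y_noisy.copy()
    y_fixed[suspects] = classes[proba[suspects].argmax(axis=1)]
    return y_fixed, suspects

# Synthetic demo: two Gaussian blobs with about 20% of the labels flipped.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y_true = np.repeat([0, 1], 200)
flip = rng.random(400) < 0.2
y_noisy = np.where(flip, 1 - y_true, y_true)
y_fixed, suspects = relabel(X, y_noisy, noise_rate=flip.mean())
print("label accuracy before:", (y_noisy == y_true).mean())
print("label accuracy after: ", (y_fixed == y_true).mean())
```

In this toy setting the true noise rate is passed in directly; in practice it would have to be estimated, and an inaccurate estimate would flag too few or too many samples.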

Key words: Logistic Regression, Naive Bayes, Noisy label learning, Relabeling label

CLC Number: 

  • TP301
[1]MIRYLENKA K,GIANNAKOPOULOS G,DO L M,et al.On classifier behavior in the presence of mislabeling noise [J].Data Mining and Knowledge Discovery,2017,31(3):661-701.
[2]KHETAN A,LIPTON Z C,ANANDKUMAR A.Learning From Noisy Singly-labeled Data [OL].https://arxiv.org/abs/1712.04577.
[3]FRENAY B,VERLEYSEN M.Classification in the Presence of Label Noise:A Survey [J].IEEE Transactions on Neural Networks and Learning Systems,2014,25(5):845-869.
[4]NICHOLSON B,SHENG V S,ZHANG J,et al.Label Noise Correction Methods [C]//IEEE International Conference on Data Science and Advanced Analytics.Shanghai:IEEE,2015:1-9.
[5]QI Z A.Learning from Limited and Imperfect Tagging[D]. Hangzhou:Zhejiang University,2013.
[6]LIU M J,WANG X F.Data Preprocessing in Data Mining[J].Computer Science,2000,27(4):54-57.
[7]LI J,WONG Y,ZHAO Q,et al.Learning to Learn from Noisy Labeled Data[OL].https://arxiv.org/abs/1812.05214.
[8]MANWANI N,SASTRY P S.Noise tolerance under risk minimization[J].IEEE Transactions on Cybernetics,2013,43(3):1146-1151.
[9]LI Y,YANG J,SONG Y,et al.Learning from Noisy Labels with Distillation[J].IEEE International Conference on Computer Vision,2017,10(1):1928-1936.
[10]NETTLETON D F,PUIG A O,FORNELLS A.A study of the effect of different types of noise on the precision of supervised learning techniques[J].Artificial Intelligence Review,2010,33(4):275-306.
[11]WANG Y,LIU W,MA X,et al.Iterative Learning with Openset Noisy Labels[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.2018:8688-8696.
[12]THULASIDASAN S,BHATTACHARYA T,BILMES J,et al.Combating Label Noise in Deep Learning Using Abstention[OL].https://arxiv.org/abs/1905.10964.
[13]XIAO T,XIA T,YANG Y,et al.Learning from massive noisy labeled data for image classification[C]//IEEE Conference on Computer Vision and Pattern Recognition.Boston:IEEE,2015:2691-2699.
[14]MNIH V,HINTON G.Learning to Label Aerial Images from Noisy Data[C]//International Conference on Machine Learning.Edinburgh,Scotland:Omnipress,2012:203-210.
[15]SCOTT C.A Rate of Convergence for Mixture Proportion Estimation,with Application to Learning from Noisy Labels[C]//Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics.2015:838-846.
[16]LIU T,TAO D.Classification with Noisy Labels by Importance Reweighting[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2014,38(3):447-461.
[17]NATARAJAN N,DHILLON I S,RAVIKUMAR P K,et al.Learning with Noisy Labels[C]//International Conference on Neural Information Processing Systems.Lake Tahoe,USA:Curran Associates Inc,2013:1196-1204.
[18]NORTHCUTT C G,WU T,CHUANG I L.Learning with Confident Examples:Rank Pruning for Robust Classification with Noisy Labels[OL].https://arxiv.org/abs/1705.01936.
[19]ELKAN C,NOTO K.Learning classifiers from only positive and unlabeled data[C]//International Conference on Knowledge Discovery and Data Mining.Las Vegas:ACM,2008:213-220.