Computer Science ›› 2018, Vol. 45 ›› Issue (6A): 487-492.

• Big Data & Data Mining •

Adaptive Stochastic Gradient Descent for Imbalanced Data Classification

TAO Bing-mo¹, LU Shu-xia¹,²

  1. College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China
  2. Hebei Province Key Laboratory of Machine Learning and Computational Intelligence, Baoding, Hebei 071002, China
  • Online: 2018-06-20   Published: 2018-08-03

Abstract: For imbalanced data classification, traditional stochastic gradient descent (SGD) performs poorly when used to solve the SVM problem. The adaptive stochastic gradient descent algorithm defines a distribution p for choosing examples instead of using the uniform distribution, and it uses a smoothed hinge loss function in the optimization problem. Because the training set is imbalanced, uniform sampling causes the algorithm to choose majority-class examples more often, in proportion to the imbalance ratio, which biases the classifier toward the majority class. The distribution p largely overcomes this issue. Deciding when to stop also becomes an important problem, because standard SGD has no stopping criterion, which matters especially for large data sets. The stopping criterion is set according to the classification accuracy on the training set or a subset of it. If the parameters are chosen properly, this criterion can stop the algorithm very early, especially on large data sets. Experiments on imbalanced data sets show that the proposed algorithm is effective.
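The three ingredients named in the abstract (a non-uniform sampling distribution p, a smoothed hinge loss, and an accuracy-based stopping rule) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the form of p, the smoothing, or the step size, so everything below is an assumption made for illustration. In particular, `class_balanced_probs` assumes p gives each class a total probability of 1/2, `smooth_hinge_grad` assumes a quadratically smoothed hinge with hypothetical parameter `gamma`, and `lam`, `eta0`, `check_every`, and `target_acc` are hypothetical names and values.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def smooth_hinge_grad(margin, gamma=1.0):
    """Derivative, w.r.t. the margin z = y * <w, x>, of an ASSUMED smoothed hinge:
    0 for z >= 1, (1 - z)^2 / (2 * gamma) for 1 - gamma < z < 1,
    and 1 - z - gamma / 2 otherwise."""
    if margin >= 1.0:
        return 0.0
    if margin > 1.0 - gamma:
        return -(1.0 - margin) / gamma
    return -1.0


def class_balanced_probs(labels):
    """ASSUMED distribution p: each class receives total probability 1/2,
    so minority examples are drawn more often than under uniform sampling."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return [0.5 / n_pos if y == 1 else 0.5 / n_neg for y in labels]


def accuracy(w, X, labels):
    """Fraction of examples whose label sign matches the sign of <w, x>."""
    correct = sum(1 for x, y in zip(X, labels) if y * dot(w, x) > 0)
    return correct / len(labels)


def train(X, labels, lam=0.01, eta0=0.1, max_iter=10000,
          check_every=50, target_acc=0.95, seed=0):
    """SGD on an L2-regularized smoothed hinge loss (linear model, no bias term),
    sampling examples from p and stopping once training accuracy is high enough.
    Returns the weight vector and the number of iterations used."""
    rng = random.Random(seed)
    p = class_balanced_probs(labels)
    indices = list(range(len(X)))
    w = [0.0] * len(X[0])
    for t in range(1, max_iter + 1):
        i = rng.choices(indices, weights=p, k=1)[0]  # draw an example from p
        x, y = X[i], labels[i]
        g = smooth_hinge_grad(y * dot(w, x))
        eta = eta0 / (1.0 + lam * eta0 * t)          # decaying step size
        w = [(1.0 - eta * lam) * wj - eta * g * y * xj for wj, xj in zip(w, x)]
        # Stopping rule: check accuracy on the training set every few steps.
        if t % check_every == 0 and accuracy(w, X, labels) >= target_acc:
            return w, t
    return w, max_iter
```

In this sketch the accuracy check runs on the full training set; on a large data set one could pass a fixed random subset to `accuracy` instead, which is the "subsets" option the abstract mentions. Note that sampling from p without reweighting the gradient changes the objective being minimized, which is the intended rebalancing effect here rather than an unbiased estimate of the uniform-average loss.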

Key words: Loss function, Nonuniform distribution, Stochastic gradient descent, Stop criterion, Support vector machine

CLC Number: TP181