Computer Science, 2016, Vol. 43, Issue (12): 97-100. doi: 10.11896/j.issn.1002-137X.2016.12.017

• Machine Learning •

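Imbalanced Data Classification Method Based on Support Vector Over-sampling

CAO Lu

  1. School of Information Engineering, Wuyi University, Jiangmen 529020; School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the Guangdong Province Characteristic Innovation Project (2015KTSCX143), the Guangdong Province Young Innovative Talents Project (2015KQNCX172), the Jiangmen Science and Technology Plan Project (Jiangke [2016] No. 189, Jiangke [2015] No. 138), and the Youth Foundation of Wuyi University (2013zk07, 2015zk11).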

Abstract: Traditional support vector machines perform poorly on imbalanced data. To improve the recognition accuracy of the minority class, an over-sampling method based on support vectors is proposed. First, the K-nearest-neighbor idea is used to remove noise from the original data set. A support vector machine is then trained on the training set to obtain the support vectors, and noise following a certain rule is added to each support vector of the minority class, increasing the number of minority-class samples to yield a relatively balanced data set. Finally, a support vector machine is trained on the new data set. Experimental results show that the proposed method is effective on both artificial data sets and UCI standard data sets.

Key words: Support vector, Sampling, Imbalanced data, Classification
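
The cleaning and over-sampling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The abstract gives neither the exact K-nearest-neighbor rule nor the noise distribution, so the majority-vote rule in `knn_clean`, the zero-mean Gaussian noise, and the names `knn_clean`, `oversample_support_vectors`, `sigma`, and `minority_svs` are all assumptions made for illustration. The minority-class support vectors are taken as given; in practice they would come from an SVM trained on the cleaned data.

```python
import math
import random


def knn_clean(points, labels, k=3):
    """Drop samples whose k nearest neighbours mostly carry another label.

    Assumption: the abstract does not state the exact rule; this
    majority-vote criterion is one common choice.
    """
    kept_points, kept_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Distances from sample i to every other sample, nearest first.
        neighbours = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )[:k]
        same = sum(1 for _, lab in neighbours if lab == y)
        if same * 2 >= len(neighbours):  # keep unless outvoted
            kept_points.append(p)
            kept_labels.append(y)
    return kept_points, kept_labels


def oversample_support_vectors(minority_svs, n_new, sigma=0.1, seed=0):
    """Create n_new synthetic minority samples by perturbing support vectors.

    Assumption: zero-mean Gaussian noise with a hypothetical scale `sigma`;
    the abstract only says the noise follows "a certain rule".
    """
    rng = random.Random(seed)
    synthetic = []
    for i in range(n_new):
        base = minority_svs[i % len(minority_svs)]  # cycle through the SVs
        synthetic.append([x + rng.gauss(0.0, sigma) for x in base])
    return synthetic


# Toy data: four majority samples (label 0), three minority samples
# (label 1), and one minority-labelled point sitting inside the majority
# cluster, which the cleaning step should treat as noise.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
          [3.0, 3.0], [3.1, 2.9], [2.9, 3.1], [0.12, 0.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

clean_points, clean_labels = knn_clean(points, labels, k=3)

# Hypothetical minority support vectors, standing in for SVM output.
minority_svs = [[3.0, 3.0], [2.9, 3.1]]
synthetic = oversample_support_vectors(minority_svs, n_new=4, sigma=0.1)
```

In a full pipeline, the synthetic samples would be appended to the cleaned training set with the minority label, and a support vector machine would then be trained on the enlarged set, as the last step of the abstract describes.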

