Computer Science (计算机科学) ›› 2013, Vol. 40 ›› Issue (12): 70-74.

• Survey •

Sampling Techniques and CBES for Classifying Imbalanced Datasets

ZHI Wei-mei (职为梅), GUO Hua-ping (郭华平), FAN Ming (范明)

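  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450052
  • Publication date: 2018-11-16 Release date: 2018-11-16
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (60773048)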

Sampling Techniques with CBES for Imbalanced Learning

ZHI Wei-mei, GUO Hua-ping and FAN Ming

  • Online: 2018-11-16 Published: 2018-11-16

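Abstract: CBES is an ensemble selection method for classifying imbalanced datasets. Related experiments show that CBES substantially improves the generalization ability of the base classifier. Prior studies have shown that sampling methods can effectively improve classifier performance on imbalanced datasets. We therefore incorporate sampling techniques into CBES and propose a sampling-based CBES method (SCBES), aiming to further improve the performance of CBES on the rare class. Extensive experiments show that a careful use of sampling further improves the performance of CBES on imbalanced datasets.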

Keywords: Imbalanced datasets, ensemble classifier, ensemble selection, sampling techniques

Abstract: CBES is an ensemble selection method for classifying imbalanced datasets. Related experimental results show that CBES can boost the generalization ability of the base classifier. Previous research shows that sampling methods can effectively improve performance on the rare class. In this paper, we incorporate sampling methods into CBES and propose a method, named sampling-based CBES (SCBES), to further improve classification performance on the rare class. The experimental results demonstrate that SCBES effectively improves classification performance on imbalanced datasets.

Key words: Imbalanced data sets, Ensemble, Ensemble selection, Sampling method
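
The abstract notes that sampling can improve performance on the rare class. As a rough illustration of what a sampling step might look like, the sketch below performs plain random oversampling: it duplicates rare-class examples until every class matches the size of the largest one. This is not the SCBES procedure, whose details are not given on this page; the function name `random_oversample`, its `seed` parameter, and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen examples of each
    smaller class until all classes are as large as the largest class."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    counts = Counter(labels)           # class -> number of examples
    target = max(counts.values())      # size of the largest class
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original examples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Add (target - n) random duplicates; zero for the largest class.
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy imbalanced data: four majority examples (0), one rare example (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In a sampling-based ensemble, a step of this kind would typically be applied to the training data before the base classifiers are built; where exactly SCBES applies it is an assumption this sketch does not settle.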

