Computer Science, 2011, Vol. 38, Issue (12): 224-228.

• Artificial Intelligence •

Research on Boosting-Based Classification Algorithms for Imbalanced Data

李秋洁,茅耀斌,王执锉   

  1. (School of Automation, Nanjing University of Science and Technology, Nanjing 210094)
  • Publication date: 2018-12-01 Release date: 2018-12-01

Research on Boosting-based Imbalanced Data Classification

  • Online:2018-12-01 Published:2018-12-01

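Abstract: This paper studies boosting-based classification algorithms for imbalanced data. After summarizing and analyzing existing algorithms, it proposes a weight-sampling boosting algorithm. Samples are drawn according to their weights, which changes the original data distribution and yields a classifier suited to imbalanced data. In essence, the algorithm uses a sampling function to adjust the form of the original boosting loss function, placing further emphasis on the classification loss of positive samples, so that the classifier concentrates on discriminating positive samples effectively and raises their overall recognition rate. The algorithm is simple to implement and practical. Experiments on UCI data sets show that, for imbalanced data classification, weight-sampling boosting outperforms the original boosting algorithm and earlier algorithms.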

Keywords: Imbalanced data classification, Boosting, Sampling

Abstract: This paper investigates boosting-based classification algorithms for imbalanced data. Through an in-depth analysis of existing algorithms, a weight-sampling boosting algorithm is proposed. By changing the data distribution through weight sampling, the trained classifier is made suitable for imbalanced data classification. The essence of the proposed algorithm is that the loss function of naive boosting is adjusted by the sampling function so that positive examples are emphasized; the classifier therefore focuses on classifying these examples correctly, and the recognition rate of positive examples improves. The new algorithm is simple and practical, and experiments on the UCI data sets show that it outperforms naive boosting and previous algorithms on imbalanced data classification.

Key words: Imbalanced data classification, Boosting, Sampling
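
The abstract describes the idea only at a high level: sample by weight, and let a sampling function put extra emphasis on the loss of positive examples. The sketch below is one possible reading of that idea, not the authors' algorithm. It assumes an AdaBoost-style loop with decision stumps, labels in {+1, -1}, and a hypothetical sampling function that simply multiplies the boosting weight of each positive example by a constant `pos_factor`; the names `train_stump`, `weight_sampling_boost`, `predict`, and `pos_factor` are all illustrative assumptions. Under these assumptions the loop greedily reduces a cost-weighted exponential loss in which positives count `pos_factor` times as much as negatives.

```python
import math
import random


def train_stump(X, y):
    """Fit a one-feature threshold rule by exhaustive search (unweighted error)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(
                    1 for x, yi in zip(X, y)
                    if (pol if x[j] >= t else -pol) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best[1:]  # (feature index, threshold, polarity)


def stump_predict(stump, x):
    j, t, pol = stump
    return pol if x[j] >= t else -pol


def weight_sampling_boost(X, y, rounds=10, pos_factor=3.0, seed=0):
    """Return a list of (alpha, stump) pairs; labels must be +1 / -1."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n  # ordinary boosting weights
    ensemble = []
    for _ in range(rounds):
        # Assumed sampling function: scale the weight of positives by pos_factor.
        p = [wi * (pos_factor if yi == 1 else 1.0) for wi, yi in zip(w, y)]
        total = sum(p)
        # Draw a bootstrap sample from the emphasized distribution.
        idx = rng.choices(range(n), weights=p, k=n)
        stump = train_stump([X[i] for i in idx], [y[i] for i in idx])
        preds = [stump_predict(stump, x) for x in X]
        # Error of the stump on the full set under the emphasized distribution.
        err = sum(pi for pi, pr, yi in zip(p, preds, y) if pr != yi) / total
        if err >= 0.5:
            continue  # no better than chance: skip this round and resample
        err = max(err, 1e-10)  # avoid division by zero when the stump is perfect
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Standard exponential-loss weight update, then renormalize.
        w = [wi * math.exp(-alpha * yi * pr) for wi, yi, pr in zip(w, y, preds)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote; an empty ensemble predicts -1."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


# Toy imbalanced data: ten negatives, two positives, one feature.
X = [[i / 10] for i in range(10)] + [[2.0], [2.1]]
y = [-1] * 10 + [1] * 2
model = weight_sampling_boost(X, y, rounds=5, pos_factor=3.0, seed=0)
print([predict(model, x) for x in X])
```

Setting `pos_factor=1.0` reduces the loop to plain boosting by resampling, which makes it a convenient baseline when experimenting with how much emphasis the positive class should receive.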
