Computer Science ›› 2016, Vol. 43 ›› Issue (3): 62-67. doi: 10.11896/j.issn.1002-137X.2016.03.012

• The 15th China Conference on Machine Learning •

Imbalanced Online Sequential Extreme Learning Machine Based on Principal Curve

WANG Jin-wan, MAO Wen-tao, WANG Li-yun and HE Ling

  1. College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007 (all four authors)
  2. Engineering Technology Research Center for Computing Intelligence and Data Mining of Henan Province Universities, Xinxiang 453007 (MAO Wen-tao)
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (U1204609) and the Basic and Frontier Technology Research Program of Henan Province (132300410430).

Abstract: Existing machine learning methods tend to produce biased classifiers on imbalanced online sequential data, which lowers classification accuracy for the minority class. To address this problem, an imbalanced online sequential extreme learning machine based on principal curves is proposed. The core idea is to balance the samples of each class according to the distribution of the online sequential data, which reduces the blindness of minority-sample synthesis. The method has an offline stage and an online stage. In the offline stage, a principal curve is fitted to each class to model its distribution, and the minority class is over-sampled with the synthetic minority over-sampling technique (SMOTE). Each sample is then assigned a membership degree according to its projection distance to the corresponding principal curve, majority samples and virtual minority samples are pruned according to a membership interval, and the initial model is built. In the online stage, sequentially arriving minority samples are over-sampled with SMOTE, the arriving samples are balanced according to the membership interval, and the network weights are updated dynamically. Theoretical analysis shows that the information loss of the proposed algorithm has an upper bound. Experiments on three UCI datasets and real-world Macau meteorological data (air pollutant forecasting) show that, compared with typical existing algorithms, the proposed method achieves higher prediction accuracy on the minority class and better numerical stability.

Key words: Online sequential extreme learning machine, Imbalanced data, Principal curve, Synthetic minority over-sampling
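
The two stages summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that do not come from the paper: a straight line along the first principal component stands in for the principal curve, the membership function `exp(-distance / scale)` and the interval bound `lo = 0.2` are hypothetical, and the network update is the standard recursive least-squares form of an online sequential extreme learning machine with a ridge term. All function and class names (`smote_like`, `membership`, `OSELM`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like(minority, n_new, k=3):
    """SMOTE-style synthesis: interpolate between a minority sample
    and one of its k nearest minority neighbours."""
    d = np.linalg.norm(minority[:, None] - minority[None], axis=2)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(minority), n_new)
    mate = neighbours[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[mate] - minority[base])

def membership(points, reference):
    """Membership in (0, 1] from the distance to a straight line through
    the class mean (ASSUMPTION: crude stand-in for a principal curve)."""
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    direction = vt[0]                            # first principal direction

    def dist(x):
        c = x - mean
        return np.linalg.norm(c - np.outer(c @ direction, direction), axis=1)

    scale = dist(reference).mean() + 1e-12       # hypothetical normalisation
    return np.exp(-dist(points) / scale)

class OSELM:
    """Single-hidden-layer network with fixed random hidden weights;
    output weights are updated chunk by chunk (recursive least squares)."""
    def __init__(self, n_in, n_hidden, reg=1e-2):
        self.reg = reg
        self.W = rng.normal(size=(n_in, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.P = np.eye(n_hidden) / reg          # inverse of (reg * I) before any data
        self.beta = np.zeros((n_hidden, 1))

    def hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def update(self, X, t):
        H = self.hidden(X)
        K = self.P @ H.T @ np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
        self.P -= K @ H @ self.P
        self.beta += self.P @ H.T @ (t - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta

# Toy imbalanced data: 40 majority points and 8 minority points in 2-D.
majority = rng.normal([0.0, 0.0], 1.0, size=(40, 2))
minority = rng.normal([3.0, 3.0], 0.5, size=(8, 2))

synthetic = smote_like(minority, n_new=32)
m_major = membership(majority, majority)
m_synth = membership(synthetic, minority)
lo = 0.2                                         # hypothetical membership-interval bound
majority_kept = majority[m_major >= lo]          # prune majority samples
synthetic_kept = synthetic[m_synth >= lo]        # prune virtual minority samples

X = np.vstack([majority_kept, minority, synthetic_kept])
t = np.vstack([-np.ones((len(majority_kept), 1)),
               np.ones((len(minority) + len(synthetic_kept), 1))])
order = rng.permutation(len(X))
X, t = X[order], t[order]

model = OSELM(n_in=2, n_hidden=10)
for chunk in np.array_split(np.arange(len(X)), 4):  # first chunk plays the offline batch
    model.update(X[chunk], t[chunk])
```

In this sketch the sign of `model.predict(x)` gives the predicted class. Because the recursion starts from `P = I / reg`, the weights after the last chunk match the ridge-regression solution computed on all chunks at once, which is the property that makes the sequential update attractive for streaming data.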

