Computer Science, 2016, Vol. 43, Issue (3): 62-67. doi: 10.11896/j.issn.1002-137X.2016.03.012
王金婉,毛文涛,王礼云,何玲
WANG Jin-wan, MAO Wen-tao, WANG Li-yun and HE Ling
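Abstract: Existing machine learning algorithms struggle to improve classification accuracy on minority-class samples in imbalanced online sequential data. To address this, an imbalanced online sequential extreme learning machine based on principal curves is proposed. The core idea is to balance the samples of each class according to the distribution characteristics of the online sequential data, which reduces the blindness of minority-sample synthesis. The method consists of an offline stage and an online stage. In the offline stage, principal curves are used to build a distribution model for each class, the synthetic minority over-sampling technique (SMOTE) is used to oversample the minority class, and each sample is assigned a membership degree according to its projection distance to the corresponding principal curve. Majority-class samples and synthetic minority-class samples are then pruned according to a membership interval, and the initial model is built. In the online stage, sequentially arriving minority-class samples are oversampled, the sequential samples are balanced according to the membership interval, and the network weights are updated dynamically. Theoretical analysis proves that the information loss of the proposed algorithm has an upper bound. Simulation experiments on UCI benchmark datasets and real-world meteorological data from Macau show that, compared with typical existing algorithms, the proposed algorithm achieves higher prediction accuracy on minority-class samples and better numerical stability.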
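The offline and online stages described in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation, and it makes several assumptions: the names `smote`, `line_membership`, and `OSELM` are hypothetical; the principal curve is replaced by the first principal-component line as a crude stand-in; the membership function 1/(1 + distance) and the ridge term `reg` are illustrative choices not taken from the paper; and the weight update is the standard recursive least-squares update used by online sequential extreme learning machines.

```python
import numpy as np

def smote(X_min, n_new, k=3, rng=None):
    """Synthesize n_new minority points by interpolating between a sample
    and one of its k nearest minority neighbours (requires len(X_min) > k)."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]           # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), n_new)   # seed sample for each new point
    nbr = nn[base, rng.integers(0, k, n_new)]   # randomly chosen neighbour
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

def line_membership(X_ref, X):
    """Assumed stand-in for the principal-curve step: membership decays with
    the distance from each row of X to the first principal-component line
    fitted on X_ref. Values lie in (0, 1]."""
    mu = X_ref.mean(axis=0)
    _, _, vt = np.linalg.svd(X_ref - mu, full_matrices=False)
    v = vt[0]                                   # direction of the fitted line
    c = X - mu
    resid = c - np.outer(c @ v, v)              # component orthogonal to the line
    return 1.0 / (1.0 + np.linalg.norm(resid, axis=1))

class OSELM:
    """Online sequential ELM with random sigmoid hidden nodes."""
    def __init__(self, n_in, n_hidden, rng=None):
        rng = np.random.default_rng(rng)
        self.W = rng.normal(size=(n_in, n_hidden))   # fixed random input weights
        self.b = rng.normal(size=n_hidden)           # fixed random biases

    def _h(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_initial(self, X, T, reg=1e-2):
        # Offline stage: regularized least squares on the initial batch
        # (the ridge term reg is an assumption added for numerical safety).
        H = self._h(X)
        self.P = np.linalg.inv(H.T @ H + reg * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T

    def update(self, X, T):
        # Online stage: recursive least-squares update for a new chunk.
        H = self._h(X)
        K = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._h(X) @ self.beta
```

In such a sketch, a caller would oversample the minority class with `smote`, keep only the samples whose `line_membership` value falls inside a chosen interval, train with `fit_initial`, and then repeat the oversample-filter-`update` cycle for each arriving chunk. The interval bounds are a tuning choice that this sketch leaves open.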