Computer Science (计算机科学) ›› 2019, Vol. 46 ›› Issue (11A): 194-198.

• Data Science •

Multilayer Perceptron Classification Algorithm Based on Spectral Clustering and Simultaneous Two Sample Representation

LIU Shu-dong, WEI Jia-min

  1. (School of Information and Security Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China)
  • Online: 2019-11-10 Published: 2019-11-20
  • Corresponding author: LIU Shu-dong (born 1984), male, Ph.D., lecturer, CCF member. His main research interests include machine learning and recommender systems. E-mail: bupt.mymeng@gmail.com
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61602518, 71872180) and the Fundamental Research Funds for the Central Universities of Zhongnan University of Economics and Law (2722019JCG074, 2722019JCT035).

Abstract: Classification learning from class-imbalanced datasets has long been a hot topic in data mining and machine learning. Data-level, algorithm-level and ensemble solutions are the three main approaches to imbalanced learning. Undersampling, a data-level solution, is widely used in imbalanced learning scenarios; its drawback is that it may discard potentially useful majority-class instances. In this paper, spectral clustering is introduced into the undersampling of majority-class instances that is used to build the simultaneous two sample (pairwise) representation. First, the majority-class instances are divided into clusters by spectral clustering. Different numbers of representative samples are then extracted from each cluster according to the size of the cluster and the average distance between its instances and the minority-class instances. Finally, the simultaneous two sample representation is generated from the extracted instances and all minority-class instances. The proposed method alleviates the data explosion that arises when all samples are combined pairwise in the simultaneous two sample representation, and it avoids the loss of useful information that random sampling may cause. Experiments on nine UCI datasets verify the effectiveness of the proposed algorithm.

Key words: Classification, Imbalanced learning, Multilayer perceptron, Spectral clustering, Under-sampling
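
The sampling and pairing steps summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions, because the abstract does not specify them: the cluster labels are taken as given (they would come from a spectral clustering step), the quota weight is cluster size divided by mean distance to the minority class, samples are drawn at random within each cluster, and a pair is represented by concatenating the two feature vectors. The function names `cluster_quotas`, `undersample` and `pair_representation` are hypothetical.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_quotas(clusters, minority, budget):
    """Split a sampling budget across majority-class clusters.

    Assumed weighting (not taken from the paper): size / mean distance to the
    minority class, so large clusters close to the minority class get more.
    """
    weights = []
    for members in clusters:
        mean_dist = sum(
            euclidean(p, q) for p in members for q in minority
        ) / (len(members) * len(minority))
        weights.append(len(members) / (mean_dist + 1e-12))
    total = sum(weights)
    # Keep at least one sample per cluster and never exceed the cluster size,
    # so the quotas may add up to slightly more or less than the budget.
    return [
        min(len(members), max(1, round(budget * w / total)))
        for members, w in zip(clusters, weights)
    ]


def undersample(clusters, minority, budget, seed=0):
    """Draw the quota of samples from every cluster (random choice assumed)."""
    rng = random.Random(seed)
    picked = []
    quotas = cluster_quotas(clusters, minority, budget)
    for members, k in zip(clusters, quotas):
        picked.extend(rng.sample(members, k))
    return picked


def pair_representation(majority, minority):
    """Pair every two samples: concatenated features plus both class labels."""
    labelled = [(x, 0) for x in majority] + [(x, 1) for x in minority]
    return [
        (xa + xb, (ya, yb))
        for (xa, ya), (xb, yb) in combinations(labelled, 2)
    ]


# Toy data: two majority clusters (labels assumed to come from spectral
# clustering) and a small minority class.
clusters = [
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]],
    [[5.0, 5.0], [5.1, 5.0]],
]
minority = [[1.0, 1.0], [1.2, 0.8]]

kept = undersample(clusters, minority, budget=3)
pairs = pair_representation(kept, minority)
all_majority = [p for members in clusters for p in members]
print(len(pairs), "pairs after undersampling vs",
      len(pair_representation(all_majority, minority)), "without it")
```

Because the number of pairs grows quadratically with the number of samples, reducing the majority class before pairing is what keeps the pairwise training set manageable; the weighting rule and the within-cluster selection rule are the parts most likely to differ from the published method.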

CLC Number:

  • TP311