Computer Science, 2020, Vol. 47, Issue (11): 88-94. doi: 10.11896/jsjkx.191000102

• Database & Big Data & Data Science •

Mixed-sampling Method for Imbalanced Data Based on Quantum Evolutionary Algorithm

YANG Hao1, CHEN Hong-mei2

  1 Key Laboratory of Cloud Computing and Intelligent Technology, Southwest Jiaotong University, Chengdu 611756, China
  2 School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
  • Received: 2019-10-16 Revised: 2020-03-29 Online: 2020-11-15 Published: 2020-11-05
  • About author: YANG Hao, born in 1995, postgraduate, is a member of China Computer Federation. His main research interests include database technology and data mining.
    CHEN Hong-mei, born in 1971, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include granular computing, rough sets and intelligent information processing.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61572406, 61976182) and the Key Program for International S&T Cooperation of Sichuan Province (2019YFH0097).

Abstract: Under-sampling and over-sampling are common methods for handling classification on imbalanced data. Used alone, however, a single sampling method can cause overfitting or discard valuable samples. To address these problems, a mixed-sampling method based on the quantum evolutionary algorithm, named MSQEA, is proposed. In MSQEA, the majority-class and minority-class samples are first encoded separately to form the individuals of the population in the quantum evolutionary algorithm, and an appropriate candidate sampling subset is then obtained through iterative optimization. Next, the majority samples in the candidate subset are removed by under-sampling, so that the subsequent over-sampling step does not generate too many redundant samples. An over-sampling method is then used to generate minority samples. In addition, to evaluate the fitness of quantum individuals effectively, a clustering technique is applied to the data set to obtain effective validation sets for evaluating individuals. Experiments are conducted on imbalanced data sets downloaded from the KEEL website, with SMO, J48 and NB as classifiers, to compare classification performance after preprocessing by different sampling methods. The experimental results show that MSQEA achieves better classification performance than several state-of-the-art sampling methods.
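The optimization step summarized in the abstract can be illustrated with a minimal sketch of a quantum-inspired evolutionary search over sample-selection masks. This is not the MSQEA implementation: the angle-based Q-bit encoding, the simplified rotation rule, the parameter values, and the names `observe`, `rotate`, `qea_select` and `toy_fitness` are all assumptions made for illustration. In particular, `toy_fitness` is a hypothetical stand-in for the classifier-based fitness evaluated on clustering-derived validation sets.

```python
import math
import random

EPS = 0.01  # keeps angles away from 0 and pi/2 so no bit becomes fully fixed


def observe(thetas, rng):
    """Collapse each Q-bit to 0 or 1, with P(bit = 1) = sin^2(theta)."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]


def rotate(thetas, bits, best_bits, delta):
    """Rotate each angle toward the best-known bit where the observation differs."""
    updated = []
    for t, b, bb in zip(thetas, bits, best_bits):
        if b != bb:
            t += delta if bb == 1 else -delta
        updated.append(min(max(t, EPS), math.pi / 2 - EPS))
    return updated


def qea_select(n_samples, fitness, pop_size=10, generations=50,
               delta=0.05 * math.pi, seed=0):
    """Search for a 0/1 mask over samples that maximizes `fitness` (assumed API)."""
    rng = random.Random(seed)
    # theta = pi/4 gives every sample a 50% chance of being selected at the start.
    population = [[math.pi / 4] * n_samples for _ in range(pop_size)]
    best_bits, best_fit = None, -math.inf
    for _ in range(generations):
        observed = [observe(q, rng) for q in population]
        for bits in observed:
            f = fitness(bits)
            if f > best_fit:
                best_bits, best_fit = bits, f
        population = [rotate(q, bits, best_bits, delta)
                      for q, bits in zip(population, observed)]
    return best_bits, best_fit


# Toy data: six majority samples (label 0) and two minority samples (label 1).
labels = [0, 0, 0, 0, 0, 0, 1, 1]


def toy_fitness(bits):
    """Hypothetical fitness: keep minority samples and a balanced majority count."""
    kept_min = sum(b for b, y in zip(bits, labels) if y == 1)
    kept_maj = sum(b for b, y in zip(bits, labels) if y == 0)
    return kept_min - abs(kept_maj - kept_min)


mask, score = qea_select(len(labels), toy_fitness)
print(mask, score)
```

In this sketch a mask bit of 1 marks a sample as part of the candidate subset. How the candidate subset then feeds the under-sampling and over-sampling steps is described only at a high level in the abstract, so it is left out here.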

Key words: Classification, Imbalanced data, Mixed-sampling, Quantum evolutionary algorithm

CLC Number: 

  • TP391