Computer Science ›› 2020, Vol. 47 ›› Issue (6A): 488-493. doi: 10.11896/JsJkx.190600132

• Database & Big Data & Data Science •

New Associative Classification Algorithm for Imbalanced Data

CUI Wei, JIA Xiao-lin, FAN Shuai-shuai and ZHU Xiao-yan   

  1. School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
  • Published: 2020-07-07
  • About author: CUI Wei, born in 1994, postgraduate. His main research interests include machine learning and data mining.
    ZHU Xiao-yan, born in 1983, Ph.D., associate professor, is a member of China Computer Federation. Her main research interests include machine learning and data mining.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61402355, 61502378).

Abstract: Rule-based classification algorithms are widely used because they combine good classification performance with interpretability. However, existing rule-based classification algorithms do not account for imbalanced data, which degrades their classification performance on such data. This paper proposes ACI, a new associative classification algorithm for imbalanced data. First, all association rules are generated. Then, the rules are pruned with a rule pruning method designed for imbalanced data. Finally, the remaining rules are stored in a CR Tree and used to classify new instances. Experimental results on 27 public data sets show that the proposed algorithm outperforms the compared algorithms.
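
The three stages named in the abstract (rule generation, pruning, classification) can be illustrated with a minimal Python sketch. It is not the ACI algorithm itself: the abstract does not specify the pruning criterion, so the sketch assumes class-relative support and a lift threshold as stand-ins, and it keeps the surviving rules in a ranked list rather than a CR Tree. All function names, parameters and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(rows, labels, min_support=0.1, max_len=2):
    """Enumerate class association rules (itemset -> class) by brute force.

    Assumption: support is measured relative to the size of the rule's own
    class, so that minority-class rules are not filtered out by sheer rarity.
    """
    n = len(rows)
    class_count = Counter(labels)
    body_count = Counter()   # how often each itemset occurs at all
    rule_count = Counter()   # how often each itemset occurs with each class
    for items, label in zip(rows, labels):
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                body_count[body] += 1
                rule_count[(body, label)] += 1
    rules = []
    for (body, label), hits in rule_count.items():
        support = hits / class_count[label]
        confidence = hits / body_count[body]
        lift = confidence / (class_count[label] / n)
        if support >= min_support:
            rules.append({"body": frozenset(body), "label": label,
                          "support": support, "confidence": confidence,
                          "lift": lift})
    return rules


def prune_rules(rules, min_lift=1.0):
    """Assumed pruning step: keep rules with lift above min_lift, best first."""
    kept = [r for r in rules if r["lift"] > min_lift]
    kept.sort(key=lambda r: (-r["lift"], -r["confidence"], len(r["body"])))
    return kept


def classify(ranked_rules, items, default):
    """Predict with the highest-ranked rule whose body is contained in items."""
    items = set(items)
    for rule in ranked_rules:
        if rule["body"] <= items:
            return rule["label"]
    return default


# Toy imbalanced data: 8 "neg" instances and 2 "pos" instances.
rows = ([{"color=red", "size=small"}] * 4
        + [{"color=red", "size=large"}] * 3
        + [{"color=blue", "size=small"}]
        + [{"color=blue", "size=large"}] * 2)
labels = ["neg"] * 8 + ["pos"] * 2

ranked = prune_rules(mine_class_rules(rows, labels))
prediction = classify(ranked, {"color=blue", "size=large"}, default="neg")
```

Ranking by lift rather than raw confidence is one common way to keep minority-class rules competitive, because lift normalises confidence by the class prior; whether ACI uses this measure is not stated in the abstract.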

Key words: Association rule, Classification, Imbalanced data

CLC Number: 

  • TP312