Computer Science, 2020, Vol. 47, Issue (6A): 488-493. doi: 10.11896/JsJkx.190600132
CUI Wei, JIA Xiao-lin, FAN Shuai-shuai, ZHU Xiao-yan
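Abstract: Rule-based classification algorithms offer good classification performance and strong interpretability, and have been widely applied. However, existing rule-based classification algorithms do not account for imbalanced data, which degrades their classification performance on such data. This paper proposes a new associative classification algorithm for imbalanced data, ACI. The algorithm first generates all association rules and then prunes them with an imbalanced rule pruning method. Finally, the remaining rules are stored in a CR-tree and used to classify new instances. Experiments on 27 public datasets show that the proposed algorithm classifies imbalanced datasets better than the baseline algorithms.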
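The three stages named in the abstract (rule generation, rule pruning, classification with the stored rules) can be illustrated with a small sketch. This is not the ACI algorithm itself: the abstract does not describe the imbalanced pruning method or the CR-tree layout, so the sketch assumes a lift threshold as a stand-in pruning criterion and keeps the surviving rules in a plain list instead of a CR-tree. All function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(rows, labels, max_len=2, min_support=1):
    """Enumerate class association rules (itemset -> class) by brute force."""
    n = len(rows)
    class_count = Counter(labels)
    body_count = Counter()  # rows containing the itemset
    rule_count = Counter()  # rows containing the itemset and carrying the class
    for row, label in zip(rows, labels):
        items = sorted(row)
        for k in range(1, max_len + 1):
            for body in combinations(items, k):
                body_count[body] += 1
                rule_count[(body, label)] += 1
    rules = []
    for (body, label), hits in rule_count.items():
        if hits < min_support:
            continue
        confidence = hits / body_count[body]
        # Lift divides confidence by the class prior, so a rule for a rare
        # class is not penalised merely because the class is rare.
        lift = confidence / (class_count[label] / n)
        rules.append({"body": frozenset(body), "label": label,
                      "support": hits, "confidence": confidence, "lift": lift})
    return rules


def prune_rules(rules, min_lift=1.0):
    """Assumed pruning criterion (not from the paper): keep rules with lift > min_lift."""
    return [r for r in rules if r["lift"] > min_lift]


def classify(rules, row, default):
    """Predict with the matching rule ranked highest by lift, confidence, support."""
    matching = [r for r in rules if r["body"] <= set(row)]
    if not matching:
        return default
    best = max(matching, key=lambda r: (r["lift"], r["confidence"], r["support"]))
    return best["label"]


# Toy imbalanced data: six "neg" rows and two "pos" rows.
rows = [
    {"a=1", "b=0"}, {"a=1", "b=0"}, {"a=1", "b=1"}, {"a=0", "b=0"},
    {"a=0", "b=0"}, {"a=1", "b=0"}, {"a=0", "b=1"}, {"a=0", "b=1"},
]
labels = ["neg", "neg", "neg", "neg", "neg", "neg", "pos", "pos"]

rules = prune_rules(mine_rules(rows, labels))
print(classify(rules, {"a=0", "b=1"}, default="neg"))
print(classify(rules, {"a=1", "b=0"}, default="neg"))
```

Ranking by lift rather than raw confidence is one common way to keep minority-class rules competitive; whether ACI does something similar cannot be determined from the abstract alone.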