Computer Science ›› 2014, Vol. 41 ›› Issue (2): 111-113.

• CCML 2013 •

Correlated Rules Based Associative Classification for Imbalanced Datasets

HUANG Zai-xiang, ZHOU Zhong-mei and HE Tian-zhong

  1. Department of Computer Science and Engineering, Zhangzhou Normal University, Zhangzhou 363000
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    National Natural Science Foundation of China (61170129) and Natural Science Foundation of Fujian Province (2013J01259)

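Abstract (translated from the Chinese): Many studies have shown that associative classification achieves high classification accuracy. However, most associative classifiers are built on the "support-confidence" framework, and on imbalanced datasets both confidence and support favor rules that predict the majority class, so minority-class instances are easily misclassified. To address this problem, an associative classification algorithm for imbalanced data based on correlated rules is proposed. The algorithm mines itemsets that are frequent and mutually associated and, among the classification rules that have such an itemset as antecedent, selects the rule with the largest lift. Rules are ranked by a rule strength that combines lift, confidence, and Complement Class Support (CCS). Experiments show that the algorithm achieves high average classification accuracy and is more accurate when classifying minority-class instances.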

Keywords: Data mining, Associative classification, Imbalanced data, Correlated rules. CLC number: TP311.13. Document code: A

Abstract: Many studies have shown that associative classification is a promising classification method. However, most associative classification algorithms may not achieve high classification performance on imbalanced datasets because they generate rules based on the "support-confidence" framework. In imbalanced datasets, confidence and support tend to be biased toward the majority class. As a result, instances of the minority class may be misclassified. We propose a new associative classification approach called CRAC (Correlated Rules based Associative Classification for Imbalanced Datasets). First, we mine frequent and mutually associated itemsets for classification, which yields a small set of high-quality rules. Second, among all rules that have a given frequent and associated itemset as their condition, CRAC selects only the rule with the largest lift as a class association rule (CAR). As a result, the antecedent and the consequent of each rule that CRAC generates are positively correlated. Finally, we rank rules according to a new metric that integrates lift, support and Complement Class Support (CCS), so positively correlated rules are likely to be used to predict the minority class. Our experiments on fifteen UCI datasets show that our approach is an effective classification technique for both balanced and imbalanced datasets, and that it has better average classification accuracy than CBA.

Key words: Data mining, Associative classification, Imbalanced datasets, Correlated rules
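The abstract's central point, that confidence favors the majority class while lift does not, can be illustrated with a small sketch. This is not the authors' CRAC implementation. It only applies the standard definitions of support, confidence and lift to a class rule X → c, and the toy dataset and the helper names `rule_measures` and `best_by` are assumptions made for illustration. The ranking metric that combines these measures with CCS is not reproduced, because the abstract does not give its formula.

```python
from collections import Counter


def rule_measures(transactions, antecedent):
    """Return {label: (support, confidence, lift)} for rules antecedent -> label.

    transactions is a list of (itemset, class_label) pairs.
    Standard definitions: support = P(X, c), confidence = P(c | X),
    lift = P(c | X) / P(c).
    """
    n = len(transactions)
    class_counts = Counter(label for _, label in transactions)
    # Class labels of the instances that contain every item of the antecedent.
    covered = [label for items, label in transactions if antecedent <= items]
    if not covered:
        return {}
    measures = {}
    for label, joint in Counter(covered).items():
        support = joint / n
        confidence = joint / len(covered)
        lift = confidence / (class_counts[label] / n)
        measures[label] = (support, confidence, lift)
    return measures


def best_by(measures, index):
    """Pick the class whose rule maximizes the measure at position index."""
    return max(measures, key=lambda label: measures[label][index])


# Hypothetical imbalanced toy data: 2 minority ("pos") and 8 majority ("neg").
data = (
    [(frozenset({"a", "b"}), "pos")] * 2
    + [(frozenset({"a", "c"}), "neg")] * 4
    + [(frozenset({"c"}), "neg")] * 4
)

m = rule_measures(data, frozenset({"a"}))
by_confidence = best_by(m, 1)  # "neg": confidence 4/6 beats 2/6
by_lift = best_by(m, 2)        # "pos": lift (2/6)/0.2 beats (4/6)/0.8
print(by_confidence, by_lift)
```

In this toy data the item `a` covers every minority instance but only half of the majority instances. Confidence still prefers `a → neg`, whereas selecting by largest lift, as the abstract describes, yields `a → pos`, a rule whose antecedent and consequent are positively correlated (lift above 1).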

