计算机科学 ›› 2013, Vol. 40 ›› Issue (12): 223-228.
牛新征,佘堃
NIU Xin-zheng and SHE Kun
摘要: 挖掘事务数据库中的最大频繁项目集是数据挖掘领域一个重要的研究方向。基于FP-tree的FPMAX算法是目前较为高效与稳定的最大频繁项目集挖掘算法之一。然而对于稠密数据库中的挖掘,FPMAX会产生大量的冗余递归过程,导致额外的条件FP-tree构造开销。而且在支持度较低时,FPMAX则会因用于超集检测的全局MFI-tree较为庞大而导致超集检测的性能下降。为此提出FPMAX的改进算法FPMAX-reduce,其通过采用基于事务共同后缀的前瞻剪枝策略来减少挖掘过程中的冗余递归过程。 当递归过程中产生的新条件FP-tree规模较小时,FPMAX-reduce通过构造条件MFI-tree来减小后续超集检测遍历的开销。性能试验表明,FPMAX-reduce算法通过有效的前瞻剪枝,在稠密事务数据库以及低支持度的情况下至多可将递归过程减少至原算法的一半以下,进而有效地提高了FPMAX算法的效率。
[1] BayardoJr R J.Efficiently mining long patterns from databases[C]∥Proceedings of the 1998ACM SIGMOD International Conference on Management of Data.New York,1998:85-93 [2] Burdick D,Calimlim M,Flannick J,et al.Mafia:A maximal frequent itemsetalgorithm[J].IEEE Transactions on Knowledge and Data Engineering,2005,17(11):1490-1504 [3] Gouda K,Zaki M J.Efficiently mining maximal frequent item-sets[C]∥Proceedings of the 2001IEEE International Confe-rence on Data Mining.San Jose,2001:163-170 [4] 吉根林,杨明,宋余庆,等.最大频繁项目集的快速更新[J].计算机学报,2005,28(1):128-135 [5] Uno T,Kiyomi M,Arimura H.LCM ver.2:Efficient mining algorithms for frequent/closed/maximal itemsets[C]∥Procee-dings of the 2004IEEE ICDM Workshop on Frequent Itemset Mining Implementations.Brighton,2004:1-11 [6] Uno T,Kiyomi M,Arimura H.Lcm ver.3:Collaboration of array,bitmap and prefix tree for frequent itemset mining[C]∥Proceedings of the 1st International Workshop on Open Source Data Mining:Frequent Pattern Mining Implementations.New York,2005,77-86 [7] Grahne G,Zhu J.High performance mining of maximal frequent itemsets[C]∥Proceedings of the 6th International Workshop on High Performance Data Mining.2003:1-10 [8] Grahne G,Zhu J.Efficiently using prefix-trees in mining fre-quent itemsets[C]∥Proceedings of the Third FIMI Workshop on Frequent Itemset Mining Implementations.Florida,2003:123-132 [9] Goethals B,Zaki M J.Advances in frequent itemset mining implementations:report on FIMI’03[J].ACM SIGKDD Explorations Newsletter,2004,6(1):109-117 [10] Han J,Pei J,Yin Y.Mining frequent patterns without candidate generation[J].ACM SIGMOD Record,2000,29(2):1-12 [11] 牛新征,佘堃.面向大规模数据的快速并行聚类划分算法研究[J].计算机科学,2012,9(1):134-137 |
No related articles found! |
|