基于片上多核的频繁项集并行挖掘算法

计算机科学 ›› 2014, Vol. 41 ›› Issue (3): 55-58.

基于片上多核的频繁项集并行挖掘算法

张步忠,程玉胜,王则林

安庆师范学院计算机与信息学院安庆246013;安庆师范学院计算机与信息学院安庆246013;南通大学计算机科学与技术学院南通226000

出版日期:2018-11-14 发布日期:2018-11-14
基金资助:
本文受安徽省自然科学基金(070412061,10040606Q42),安庆师范学院青年科研基金项目(KJ201112)资助

Frequent Itemset Mining Parallel Algorithm Based on Chip Multi-core

ZHANG Bu-zhong,CHENG Yu-sheng and WANG Ze-lin

Online:2018-11-14 Published:2018-11-14

摘要/Abstract

摘要： 关联规则挖掘中最主要的工作是如何高效地挖掘频繁项集。目前在单机平台上,由于计算量大等原因,大数据集上的关联规则挖掘很难得到理想结果。在分析现有频繁项集挖掘算法的基础上,结合Eclat和dEclat挖掘算法优点,针对大数据集和片上多核共享内存计算环境,提出一种高效的并行频繁项集挖掘算法PEclat,算法实现了任务级并行挖掘频繁项集,并在大数据集上进行了多项测试。实验结果表明,无论数据稠密程度如何,该算法均能取得较好的性能。

关键词: 片上多核,频繁项集,并行处理,关联规则中图法分类号TP391文献标识码A

Abstract: The main work in association rule mining is how to find mining frequent itemsets efficiently．When the exis-ting frequent itemsets mining algorithms are analyzed,it can be concluded that the computing scale will increase sharply with the increase of data scale．So association rule mining in large data sets on PC platform is difficult to get desired result．Combining the advantages of Eclat and dEclat algorithm,PEclat,an efficient parallel frequent itemset mining algorithm was presented．PEclat runs in the multi-core and shared-memory computing environment．It works as task-level parallel and can process large data sets．The experiment states that better performance can be achieved regardless of the data density.

Key words: Chip multi-core,Frequent itemsets,Parallel process,Association rule

张步忠,程玉胜,王则林. 基于片上多核的频繁项集并行挖掘算法[J]. 计算机科学, 2014, 41(3): 55-58. https://doi.org/

ZHANG Bu-zhong,CHENG Yu-sheng and WANG Ze-lin. Frequent Itemset Mining Parallel Algorithm Based on Chip Multi-core[J]. Computer Science, 2014, 41(3): 55-58. https://doi.org/

参考文献

[1] Han J,Kamber M．Data mining:Concepts and techniques [M].Morgan Kaufmann,2005
[2] Agrawal R,Srikant R．Fast Algorithms for Mining Association Rules[C]∥Proc．of the 20th Int．Conf．on Very Large Databa-ses.1994:487-499
[3] Zaki M J,Parthasarathy S,et al．New algorithms for fast disco-very of association rules[C]∥Proc．of the 3rd Int．Conf．on Knowledge Discovery and Data Mining．1997:283-286
[4] Zaki M J,Gouda K．Fast Vertical Mining Using Diffsets[C]∥Proc．ACM SIGKDD Knowledge Discovery and Data Mining.2003:326-335
[5] Zaki M J．Scalable Algorithms for Association Mining[J]．IEEE Transactions on Knowledge and Data Engineering,2000,12:372-390
[6] 杜剑峰,李宏,陈松乔,等.单调和反单调约束条件下关联规则的挖掘算法分析[J].计算机科学,2005,32(6):142-144,166
[7] 宋长新,马克.改进的Eclat数据挖掘算法的研究[J].微计算机信息,2008,4:92-94
[8] Wur S,Leu Y．An Effective Boolean Algorithm for Mining Association Rules in Large Databases[C]∥Proc．of the 6th Int．Conf．on Database Systems for Advanced Applications．1999:179-186
[9] 傅向华,陈冬剑,王志强.基于倒排索引位运算的深度优先频繁项集挖掘[J].小型微型计算机系统,2012(8):1747-1751
[10] 熊忠阳,陈培恩,张玉芳.基于散列布尔矩阵的关联规则Eclat改进算法[J].计算机应用研究,2010(4):1323-1325
[11] Agrawal R,Shafer J C．Parallel Mining of Association Rules[J]．IEEE Transactions on Knowledge and Data Engineering,1996,8(6):962-969
[12] 张诤,王惠文．一种高效的并行频繁集挖掘算法[J]．计算机工程,2008(11):55-57,60
[13] Zhang Y,Zhang F,Bakos J．Frequent Itemset Mining on Large-Scale Shared Memory Machines[C]∥IEEE International Conference on Cluster Computing．2011:585-589
[14] 黄立勤,柳燕煌．基于MapReduce并行的Apriori算法改进研究[J]．福州大学学报:自然科学版,2011(5):680-685
[15] Threading Building Blocks SDK[EB/OL]．http://www.threa-dingbuildingblocks.org
[16] Java SDK[EB/OL]．http://www.oracle.com

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed