计算机科学 ›› 2017, Vol. 44 ›› Issue (7): 262-266.doi: 10.11896/j.issn.1002-137X.2017.07.046
黄剑,李明奇,郭文强
HUANG Jian, LI Ming-qi and GUO Wen-qiang
摘要: 对于规模庞大的事务数据库,传统的并行Apriori算法在挖掘中会在数据IO上有较大的时间开销。 从压缩事务、减少扫描次数、简化候选集生成3个方面对Apriori算法进行改进。提出了以元素“0”和“1”表示事务的布尔矩阵模型,并引入权值维度,压缩了相同事务的矩阵规模。同时,动态地进行剪枝,矩阵的“与”运算用于候选集合的生成。将改进后的算法在Hadoop框架上进行并行化实现,实验表明该算法适合大规模数据挖掘且具有良好的伸缩性与有效性。
[1] 董志安,纪希禹,韩秋明,等.数据挖掘技术与应用实例[M].北京:机械工业出版社,2009:12-48. [2] XIE Y G,LIU R,QIAO R,et al.Resarch Literature on Big Data and Public Opinion[J].New media and society,2014(4):133-157.(in Chinese) 谢耘耕,刘锐,乔睿,等.大数据与社会舆情研究综述[J].新媒体与社会,2014(4):133-154. [3] AGRAWAL R,SRIKANT R.Fast algorithm for mining association rules[C]∥Proceedings of 20th Int.Conf.Very Large Data Bases (VLDB).Morgan Kaufman Press,1994:487-499. [4] REN W J,YU B W.Improved Apriori Algorithm Based on Ma-trix Reducation[J].Computer and Modern,2015,0:2-3.(in Chinese) 任伟建,于博文.基于矩阵简约的Apriori算法改进[J].计算机与现代化,2015,0:2-3. [5] GUNARATHNE T,WU T L,QIU J,et al.MapReduce in the Clouds for Science[C]∥2010 IEEE Second International Con-ference on Cloud Computing Technology and Science(CloudCom).IEEE,2010:565-572. [6] 陆嘉恒.Hadoop实战[M].北京:机械工业出版社,2011:17-128. [7] DEAN J,GHEMAWAT S.MapReduce:simplified data proces-sing on large clusters [J].Communications of the ACM,2008,1(1):107-113. [8] LI W J,ZHAO H,ZHANG Y,et al.Research on massive data mining technology based on MapReduce[J].Computer Engineering and Applications,2013(20):113-114.(in Chinese) 李伟卫,赵航,张阳,等.基于MapReduce的海量数据挖掘技术研究[J].计算机工程与应用,2013(20):113-114. [9] CHANG F,DEAN J,GHEMAWAT S,et al.Bigtable:A distri-buted storage system for structured data [J].ACM Transactions on Computer Systems (TOCS),2008,6(2):4-10. [10] LIU S J.Research of Frequent Itemsets Mining AlogorithmBased on MapReduce Calculation Model[D].Harbin:Harbin University of Science and Technology,2015:39-50.(in Chinese) 刘士佳.基于MapReduce框架的频繁项集挖掘算法研究[D].哈尔滨:哈尔滨理工大学,2015:39-50. [11] KUANG S H,LI B.Analysis Cloud Computing Architecture and its Application[J].Computer&Digital Engineering,2010,3(6):61-63.(in Chinese) 框胜徽,李勃.云计算体系结构及应用实例分析[J].计算机与数字工程,2010,3(6):61-63. [12] BONCHI F,LUCCHESE C.Pushing tougher constraints in fre-quent patternvmining[M]∥Advances in Knowledge Discovery and Data Mining.Springer Berlin Heidelberg,2005:114-124. [13] BCHEUNG D W,HAN J,NG V T,et al.Maintenance of discovered association rules in largedatabases:An incremental updating technique[C]∥Proceedings of the Twelfth International Conference on Data Engineering,1996.IEEE,1996,6-114. |
No related articles found! |
|