Computer Science ›› 2017, Vol. 44 ›› Issue (7): 262-266.doi: 10.11896/j.issn.1002-137X.2017.07.046

Previous Articles     Next Articles

Reseach on Improved Apriori Algorithm Based on Hadoop

HUANG Jian, LI Ming-qi and GUO Wen-qiang   

  • Online:2018-11-13 Published:2018-11-13

Abstract: Traditional mining based on parallel Apriori algorithms needs much more time in data IO with the increasing size of large transaction database.In this paper,we improved Apriori algorithm in three aspects:compression in the transaction,reducing the number of scanning areas,and simplifying the candidate set generation .We proposed “0” and “1” as the entries to describe the transaction Boolean matrix model,and introduced the weight dimensions to compress the matrix size of the transaction.Meanwhile,dynamic pruning matrix is adopted,and “and” operation of matrix is applied to generate a candidate set.The experiments of the improved algorithm running parallel in Hadoop framework show that the algorithm is suitable for large-scale data mining,and the algorithm has good scalability and effectiveness.

Key words: Apriori algorithm,Transaction database,Boolean matrix,Hadoop

[1] 董志安,纪希禹,韩秋明,等.数据挖掘技术与应用实例[M].北京:机械工业出版社,2009:12-48.
[2] XIE Y G,LIU R,QIAO R,et al.Resarch Literature on Big Data and Public Opinion[J].New media and society,2014(4):133-157.(in Chinese) 谢耘耕,刘锐,乔睿,等.大数据与社会舆情研究综述[J].新媒体与社会,2014(4):133-154.
[3] AGRAWAL R,SRIKANT R.Fast algorithm for mining association rules[C]∥Proceedings of 20th Int.Conf.Very Large Data Bases (VLDB).Morgan Kaufman Press,1994:487-499.
[4] REN W J,YU B W.Improved Apriori Algorithm Based on Ma-trix Reducation[J].Computer and Modern,2015,0:2-3.(in Chinese) 任伟建,于博文.基于矩阵简约的Apriori算法改进[J].计算机与现代化,2015,0:2-3.
[5] GUNARATHNE T,WU T L,QIU J,et al.MapReduce in the Clouds for Science[C]∥2010 IEEE Second International Con-ference on Cloud Computing Technology and Science(CloudCom).IEEE,2010:565-572.
[6] 陆嘉恒.Hadoop实战[M].北京:机械工业出版社,2011:17-128.
[7] DEAN J,GHEMAWAT S.MapReduce:simplified data proces-sing on large clusters [J].Communications of the ACM,2008,1(1):107-113.
[8] LI W J,ZHAO H,ZHANG Y,et al.Research on massive data mining technology based on MapReduce[J].Computer Engineering and Applications,2013(20):113-114.(in Chinese) 李伟卫,赵航,张阳,等.基于MapReduce的海量数据挖掘技术研究[J].计算机工程与应用,2013(20):113-114.
[9] CHANG F,DEAN J,GHEMAWAT S,et al.Bigtable:A distri-buted storage system for structured data [J].ACM Transactions on Computer Systems (TOCS),2008,6(2):4-10.
[10] LIU S J.Research of Frequent Itemsets Mining AlogorithmBased on MapReduce Calculation Model[D].Harbin:Harbin University of Science and Technology,2015:39-50.(in Chinese) 刘士佳.基于MapReduce框架的频繁项集挖掘算法研究[D].哈尔滨:哈尔滨理工大学,2015:39-50.
[11] KUANG S H,LI B.Analysis Cloud Computing Architecture and its Application[J].Computer&Digital Engineering,2010,3(6):61-63.(in Chinese) 框胜徽,李勃.云计算体系结构及应用实例分析[J].计算机与数字工程,2010,3(6):61-63.
[12] BONCHI F,LUCCHESE C.Pushing tougher constraints in fre-quent patternvmining[M]∥Advances in Knowledge Discovery and Data Mining.Springer Berlin Heidelberg,2005:114-124.
[13] BCHEUNG D W,HAN J,NG V T,et al.Maintenance of discovered association rules in largedatabases:An incremental updating technique[C]∥Proceedings of the Twelfth International Conference on Data Engineering,1996.IEEE,1996,6-114.

No related articles found!
Full text



No Suggested Reading articles found!