Computer Science ›› 2017, Vol. 44 ›› Issue (6): 250-254. doi: 10.11896/j.issn.1002-137X.2017.06.043

• Artificial Intelligence •

Improved Apriori Algorithm and Its Application Based on MapReduce

ZHAO Yue, REN Yong-gong and LIU Yang

  1. School of Computer and Information Technology, Liaoning Normal University, Dalian 116029
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (F020806), the Program for Excellent Talents in Higher Education Institutions of Liaoning Province (LR2015033), the Science and Technology Plan Project of Liaoning Province (2013405003), and the Science and Technology Plan Project of Dalian (2013A16GX116).

Abstract: With the rapid development of mobile communications and Internet technology, efficiently analyzing the needs of mobile users and pushing useful information to them in time has become a hot topic in data mining. To address this problem, a distributed association rule mining algorithm named MRS-Apriori, built on the Hadoop cloud computing platform, is proposed. Starting from the classical Apriori algorithm, the method optimizes the coding rule of the database and adds a judging mark, Judgemark, that indicates whether a transaction item is frequent, which improves the efficiency of scanning the database during the join step. On the basis of this encoding, the algorithm is parallelized with the MapReduce programming framework on Hadoop, which improves the efficiency of the join step in each iteration and reduces the time cost of processing large-scale data samples. Experimental results show that the improved MRS-Apriori algorithm effectively reduces running time and achieves high accuracy on large-scale data sets.

Key words: Coding rules, Association rules, Frequent itemsets, MapReduce framework
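
The abstract describes the approach only at a high level, so the following is a minimal sketch of classical level-wise Apriori with support counting split into a map phase and a reduce phase, simulated in plain Python rather than on Hadoop. It is not the MRS-Apriori algorithm: the paper's coding rule and Judgemark flag are not specified here and are not reproduced. All function names (`map_count`, `reduce_counts`, `generate_candidates`, `apriori`) and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates):
    """Map phase: count, within one partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce phase: sum the per-partition counts and keep itemsets meeting min_support."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Build k-item candidates whose every (k-1)-subset is frequent (join and prune in one step)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return {
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
    }


def apriori(partitions, min_support):
    """Level-wise Apriori: one simulated map/reduce round per itemset size."""
    candidates = {frozenset([item]) for part in partitions for t in part for item in t}
    result, k = {}, 1
    while candidates:
        partials = (map_count(part, candidates) for part in partitions)
        frequent = reduce_counts(partials, min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


# Hypothetical data: five transactions split across two partitions.
partitions = [
    [frozenset("abc"), frozenset("ab")],
    [frozenset("ac"), frozenset("bc"), frozenset("abc")],
]
frequent = apriori(partitions, min_support=3)
```

With this sample data and `min_support=3`, each single item and each pair is frequent, while `{a, b, c}` appears in only two transactions and is dropped. On a real cluster each call to `map_count` would run as a separate mapper over its own data split, which is where the parallel speed-up described in the abstract would come from.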

