一种基于MapReduce模型的高效频繁项集挖掘算法

doi:10.11896/j.issn.1002-137X.2017.07.006

Abstract

Abstract: Along with the rapid development of Internet and the rapid growing group of users,many Internet services companies have to manage TB size or higher amount of data every day.How to find useful information in this big data era is becoming an important problem.The data mining algorithm has been widely used in many fields,and finding frequent itemsets is one of the most common and primary applications of data mining,and Apriori algorithm is the most typical algorithm for finding frequent itemsets from a big transaction database.However,when the dataset size is rather huge or a single host is used,the memory would be quickly exhausted and the computation time would also increase dramatically,which make the algorithm performance inefficient.Parallel and distributed computing based on the MapReduce framework has been proposed.An improved reformative MMRA (Matrix MapReduce Algorithm) algorithm which should convert the blocked data into matrixs to find all frequent k-itemsets was proposed in this paper,and the proposed algorithm was compared with current two existed algorithms(one-phase algorithm and k-phase algorithm).Adapting Hadoop-MapReduce as the experiment platform,parallel and distributed computing offer a potential solution for processing vast amount of data.Experimental results show that the proposed algorithm outperforms the other two algorithms.

Key words: Hadoop,MapReduce,Distributed computing,Data mining,Frequent itemset mining,Apriori algorithm

ZHU Kun, HUANG Rui-zhang and ZHANG Na-na. Efficient Frequent Patterns Mining Algorithm Based on MapReduce Model[J].Computer Science, 2017, 44(7): 31-37.

References

[1] HE Y,WANG W Q,XUE F.Study of Massive Data MiningBased on Cloud Computing[J].Computer Technology and Development,2013,23(2):69-72.(in Chinese) 贺瑶,王文庆,薛飞.基于云计算的海量数据挖掘研究[J].计算机技术与发展,2013,23(2):69-72.
[2] TIAN S,WANG S,LIU Y,et al.An algorithm for mining frequent itemsets on uncertain dataset[J].Computer Modelling and New Technologies,2014:264-272.
[3] CHANG R,LIU Z.An improved apriori algorithm[C]∥2011 International Conference on Electronics and Optoelectronics.2011:1476-1478.
[4] AGRAWAL R,IMIELINSKI T,SWAMI A.Mining Association Rule between Sets of Items in Large Databases[C]∥ACM SIGMOD Conf.Management of Data,1993:207-216.
[5] LI W W,ZHAO H,ZHANG Y,et al.Research on massive data mining based on MapReduce[J].Computer Engineering and Applications,2013,49(20):112-117.(in Chinese) 李伟卫,赵航,张阳,等.基于MapReduce的海量数据挖掘技术研究[J].计算机工程与应用,2013,49(20):112-117.
[6] LIU S H,LIU S J,CHEN X S,et al.IOMRA-- A High Efficiency Frequent Itemset Mining Algorithm Based on the MapReduce[C]∥IEEE 17th International Conference on Computation Model.2014:1290-1295.
[7] YANG X Y,LIU Z,FU Z.MapReduce as a Programming Model for Association Rules Algorithm on Hadoop[C]∥The 3rd International Conference on Information Sciences and Interaction Sciences.Chengdu,China,IEEE,2010:99-102.
[8] YE Y B,CHANG C C.A Parallel Apriori Algorithm for Frequent Itemsets Mining[C]∥Fourth International Conference Software Engineering Research,Managementand Applications.2006:87-94.
[9] KOVACS F,ILLES J.Frequent itemset mining on hadoop.Com-putational Cybernetics (ICCC)[C]∥The IEEE 9th Internatio-nal Conference.2013:241-245.
[10] YU K M,LEE M G,HUANG Y S,et al.An Efficient Frequent Patterns Mining Algorithm Based on MapReduce Framework[C]∥ International Conference on Software Intelligence Techno-logies and Applications & International Conference on Frontiers of Internet of Things.2014:1-5.
[11] LI L,ZHANG M.The Strategy of Mining Association RuleBased on Cloud Computing[C]∥The International Conference on Business Computing and Global Informatization.Washington,DC,USA,2011:475-478.
[12] LI N,ZENG L,HE Q,et al.Parallel Implementation of Apriori Algorithm Based on MapReduce[C]∥The 13th ACIS International Conference on Software Engineering,Networking and Parallel& Distributed Computing.Kyoto,2012:236-241.
[13] NING Y H.Study of Some Techniques in Data Mining Based on Spark[D].Hangzhou:China Jiliang University,2015.(in Chinese) 宁永恒.基于Spark的若干数据挖掘技术研究[D].杭州:中国计量学院,2015.
[14] LIU D D,CHEN J,LIANG F,et al.A Performance Analysis for Hadoop under Heterogeneous Cloud Computing Environments[J].Journal of Interation Technology,2012,1(4):46-51.(in Chinese) 刘丹丹,陈俊,梁锋,等.云计算异构环境下Hadoop性能分析[J].集成技术,2012,1(4):46-51.
[15] DONG X H,LI R X,ZHOU W W,et al.Performance Optimization and Feature Enhancements of Hadoop System[J].Journal of Computer Research and Development,2013,50(S2):1-15.(in Chinese) 董新华,李瑞轩,周湾湾,等.Hadoop系统性能优化与功能增强综述[J].计算机研究与发展,2013,50(S2):1-15.
[16] YING Y,LIU Y J.Survey of Developments of MapReduce Parallel Computing Technology[J].Computer System Application,2014,23(4):1-6.(in Chinese) 应毅,刘亚军.MapReduce并行计算技术发展综述[J].计算机系统应用,2014,23(4):1-6.
[17] KHARE N,ADLAKHA N,PARDASANI K R.An Algorithm Ior Mining Multidimensional Association Rules using Boolean Matrix[C]∥ 2010 International Conference in Recent Trends in Information,Telecommun ication and Computing.Kochi,Kerala,2010:95-99.
[18] LIU H Z,DAI S P,JIANG H.Quantitative association rulesmining algorithm based on matrix[C]∥2009 International Conference on Computational Intelligence and Software Enginee-ring.Wuhan,2009:1-4.
[19] TIAN S,WANG S,LIU Y,et al.An algorithm for mining frequent itemsets on uncertain dataset[J].Computer Modelling and New Technologies,2014,18(11):264-272.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Efficient Frequent Patterns Mining Algorithm Based on MapReduce Model

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 0

Metrics

Comments

Recommended 0