Computer Science ›› 2017, Vol. 44 ›› Issue (7): 31-37.doi: 10.11896/j.issn.1002-137X.2017.07.006

Previous Articles     Next Articles

Efficient Frequent Patterns Mining Algorithm Based on MapReduce Model

ZHU Kun, HUANG Rui-zhang and ZHANG Na-na   

  • Online:2018-11-13 Published:2018-11-13

Abstract: Along with the rapid development of Internet and the rapid growing group of users,many Internet services companies have to manage TB size or higher amount of data every day.How to find useful information in this big data era is becoming an important problem.The data mining algorithm has been widely used in many fields,and finding frequent itemsets is one of the most common and primary applications of data mining,and Apriori algorithm is the most typical algorithm for finding frequent itemsets from a big transaction database.However,when the dataset size is rather huge or a single host is used,the memory would be quickly exhausted and the computation time would also increase dramatically,which make the algorithm performance inefficient.Parallel and distributed computing based on the MapReduce framework has been proposed.An improved reformative MMRA (Matrix MapReduce Algorithm) algorithm which should convert the blocked data into matrixs to find all frequent k-itemsets was proposed in this paper,and the proposed algorithm was compared with current two existed algorithms(one-phase algorithm and k-phase algorithm).Adapting Hadoop-MapReduce as the experiment platform,parallel and distributed computing offer a potential solution for processing vast amount of data.Experimental results show that the proposed algorithm outperforms the other two algorithms.

Key words: Hadoop,MapReduce,Distributed computing,Data mining,Frequent itemset mining,Apriori algorithm

[1] HE Y,WANG W Q,XUE F.Study of Massive Data MiningBased on Cloud Computing[J].Computer Technology and Development,2013,23(2):69-72.(in Chinese) 贺瑶,王文庆,薛飞.基于云计算的海量数据挖掘研究[J].计算机技术与发展,2013,23(2):69-72.
[2] TIAN S,WANG S,LIU Y,et al.An algorithm for mining frequent itemsets on uncertain dataset[J].Computer Modelling and New Technologies,2014:264-272.
[3] CHANG R,LIU Z.An improved apriori algorithm[C]∥2011 International Conference on Electronics and Optoelectronics.2011:1476-1478.
[4] AGRAWAL R,IMIELINSKI T,SWAMI A.Mining Association Rule between Sets of Items in Large Databases[C]∥ACM SIGMOD Conf.Management of Data,1993:207-216.
[5] LI W W,ZHAO H,ZHANG Y,et al.Research on massive data mining based on MapReduce[J].Computer Engineering and Applications,2013,49(20):112-117.(in Chinese) 李伟卫,赵航,张阳,等.基于MapReduce的海量数据挖掘技术研究[J].计算机工程与应用,2013,49(20):112-117.
[6] LIU S H,LIU S J,CHEN X S,et al.IOMRA-- A High Efficiency Frequent Itemset Mining Algorithm Based on the MapReduce[C]∥IEEE 17th International Conference on Computation Model.2014:1290-1295.
[7] YANG X Y,LIU Z,FU Z.MapReduce as a Programming Model for Association Rules Algorithm on Hadoop[C]∥The 3rd International Conference on Information Sciences and Interaction Sciences.Chengdu,China,IEEE,2010:99-102.
[8] YE Y B,CHANG C C.A Parallel Apriori Algorithm for Frequent Itemsets Mining[C]∥Fourth International Conference Software Engineering Research,Managementand Applications.2006:87-94.
[9] KOVACS F,ILLES J.Frequent itemset mining on hadoop.Com-putational Cybernetics (ICCC)[C]∥The IEEE 9th Internatio-nal Conference.2013:241-245.
[10] YU K M,LEE M G,HUANG Y S,et al.An Efficient Frequent Patterns Mining Algorithm Based on MapReduce Framework[C]∥ International Conference on Software Intelligence Techno-logies and Applications & International Conference on Frontiers of Internet of Things.2014:1-5.
[11] LI L,ZHANG M.The Strategy of Mining Association RuleBased on Cloud Computing[C]∥The International Conference on Business Computing and Global Informatization.Washington,DC,USA,2011:475-478.
[12] LI N,ZENG L,HE Q,et al.Parallel Implementation of Apriori Algorithm Based on MapReduce[C]∥The 13th ACIS International Conference on Software Engineering,Networking and Parallel& Distributed Computing.Kyoto,2012:236-241.
[13] NING Y H.Study of Some Techniques in Data Mining Based on Spark[D].Hangzhou:China Jiliang University,2015.(in Chinese) 宁永恒.基于Spark的若干数据挖掘技术研究[D].杭州:中国计量学院,2015.
[14] LIU D D,CHEN J,LIANG F,et al.A Performance Analysis for Hadoop under Heterogeneous Cloud Computing Environments[J].Journal of Interation Technology,2012,1(4):46-51.(in Chinese) 刘丹丹,陈俊,梁锋,等.云计算异构环境下Hadoop性能分析[J].集成技术,2012,1(4):46-51.
[15] DONG X H,LI R X,ZHOU W W,et al.Performance Optimization and Feature Enhancements of Hadoop System[J].Journal of Computer Research and Development,2013,50(S2):1-15.(in Chinese) 董新华,李瑞轩,周湾湾,等.Hadoop系统性能优化与功能增强综述[J].计算机研究与发展,2013,50(S2):1-15.
[16] YING Y,LIU Y J.Survey of Developments of MapReduce Parallel Computing Technology[J].Computer System Application,2014,23(4):1-6.(in Chinese) 应毅,刘亚军.MapReduce并行计算技术发展综述[J].计算机系统应用,2014,23(4):1-6.
[17] KHARE N,ADLAKHA N,PARDASANI K R.An Algorithm Ior Mining Multidimensional Association Rules using Boolean Matrix[C]∥ 2010 International Conference in Recent Trends in Information,Telecommun ication and Computing.Kochi,Kerala,2010:95-99.
[18] LIU H Z,DAI S P,JIANG H.Quantitative association rulesmining algorithm based on matrix[C]∥2009 International Conference on Computational Intelligence and Software Enginee-ring.Wuhan,2009:1-4.
[19] TIAN S,WANG S,LIU Y,et al.An algorithm for mining frequent itemsets on uncertain dataset[J].Computer Modelling and New Technologies,2014,18(11):264-272.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] LEI Li-hui and WANG Jing. Parallelization of LTL Model Checking Based on Possibility Measure[J]. Computer Science, 2018, 45(4): 71 -75, 88 .
[2] XIA Qing-xun and ZHUANG Yi. Remote Attestation Mechanism Based on Locality Principle[J]. Computer Science, 2018, 45(4): 148 -151, 162 .
[3] LI Bai-shen, LI Ling-zhi, SUN Yong and ZHU Yan-qin. Intranet Defense Algorithm Based on Pseudo Boosting Decision Tree[J]. Computer Science, 2018, 45(4): 157 -162 .
[4] WANG Huan, ZHANG Yun-feng and ZHANG Yan. Rapid Decision Method for Repairing Sequence Based on CFDs[J]. Computer Science, 2018, 45(3): 311 -316 .
[5] SUN Qi, JIN Yan, HE Kun and XU Ling-xuan. Hybrid Evolutionary Algorithm for Solving Mixed Capacitated General Routing Problem[J]. Computer Science, 2018, 45(4): 76 -82 .
[6] ZHANG Jia-nan and XIAO Ming-yu. Approximation Algorithm for Weighted Mixed Domination Problem[J]. Computer Science, 2018, 45(4): 83 -88 .
[7] WU Jian-hui, HUANG Zhong-xiang, LI Wu, WU Jian-hui, PENG Xin and ZHANG Sheng. Robustness Optimization of Sequence Decision in Urban Road Construction[J]. Computer Science, 2018, 45(4): 89 -93 .
[8] LIU Qin. Study on Data Quality Based on Constraint in Computer Forensics[J]. Computer Science, 2018, 45(4): 169 -172 .
[9] ZHONG Fei and YANG Bin. License Plate Detection Based on Principal Component Analysis Network[J]. Computer Science, 2018, 45(3): 268 -273 .
[10] SHI Wen-jun, WU Ji-gang and LUO Yu-chun. Fast and Efficient Scheduling Algorithms for Mobile Cloud Offloading[J]. Computer Science, 2018, 45(4): 94 -99, 116 .