计算机科学 ›› 2017, Vol. 44 ›› Issue (7): 31-37.doi: 10.11896/j.issn.1002-137X.2017.07.006
• 2016 年全国理论计算机科学学术年会 • 上一篇 下一篇
朱坤,黄瑞章,张娜娜
ZHU Kun, HUANG Rui-zhang and ZHANG Na-na
摘要: 由于互联网技术急速发展及其用户迅速地增加,很多网络服务公司每天不得不处理TB级甚至更大规模的数据量。在如今的大数据时代,如何挖掘有用的信息正变成一个重要的问题。关于数据挖掘(Data Mining)的算法在很多领域中已经被广泛运用,挖掘频繁项集是数据挖掘中最常见且最主要的应用之一,Apriori则是从一个大的数据集中挖掘出频繁项集的最为典型的算法。然而,当数据集比较大或使用单一主机时,内存将会被快速消耗,计算时间也将急剧增加,使得算法性能较低,基于MapReduce的分布式和并行计算则被提出。文中提出了一种改进的MMRA (Matrix MapReduce Algorithm)算法,它通过将分块数据转换成矩阵来挖掘所有的频繁k项集;然后将提出的算法和目前已经存在的两种算法(one-phase算法、k-phase算法)进行比较。采用Hadoop-MapReduce作为实验平台,并行和分布式计算为处理大数据集提供了一个潜在的解决方案。实验结果表明,改进算法的性能优于其他两种算法。
[1] HE Y,WANG W Q,XUE F.Study of Massive Data MiningBased on Cloud Computing[J].Computer Technology and Development,2013,23(2):69-72.(in Chinese) 贺瑶,王文庆,薛飞.基于云计算的海量数据挖掘研究[J].计算机技术与发展,2013,23(2):69-72. [2] TIAN S,WANG S,LIU Y,et al.An algorithm for mining frequent itemsets on uncertain dataset[J].Computer Modelling and New Technologies,2014:264-272. [3] CHANG R,LIU Z.An improved apriori algorithm[C]∥2011 International Conference on Electronics and Optoelectronics.2011:1476-1478. [4] AGRAWAL R,IMIELINSKI T,SWAMI A.Mining Association Rule between Sets of Items in Large Databases[C]∥ACM SIGMOD Conf.Management of Data,1993:207-216. [5] LI W W,ZHAO H,ZHANG Y,et al.Research on massive data mining based on MapReduce[J].Computer Engineering and Applications,2013,49(20):112-117.(in Chinese) 李伟卫,赵航,张阳,等.基于MapReduce的海量数据挖掘技术研究[J].计算机工程与应用,2013,49(20):112-117. [6] LIU S H,LIU S J,CHEN X S,et al.IOMRA-- A High Efficiency Frequent Itemset Mining Algorithm Based on the MapReduce[C]∥IEEE 17th International Conference on Computation Model.2014:1290-1295. [7] YANG X Y,LIU Z,FU Z.MapReduce as a Programming Model for Association Rules Algorithm on Hadoop[C]∥The 3rd International Conference on Information Sciences and Interaction Sciences.Chengdu,China,IEEE,2010:99-102. [8] YE Y B,CHANG C C.A Parallel Apriori Algorithm for Frequent Itemsets Mining[C]∥Fourth International Conference Software Engineering Research,Managementand Applications.2006:87-94. [9] KOVACS F,ILLES J.Frequent itemset mining on hadoop.Com-putational Cybernetics (ICCC)[C]∥The IEEE 9th Internatio-nal Conference.2013:241-245. [10] YU K M,LEE M G,HUANG Y S,et al.An Efficient Frequent Patterns Mining Algorithm Based on MapReduce Framework[C]∥ International Conference on Software Intelligence Techno-logies and Applications & International Conference on Frontiers of Internet of Things.2014:1-5. [11] LI L,ZHANG M.The Strategy of Mining Association RuleBased on Cloud Computing[C]∥The International Conference on Business Computing and Global Informatization.Washington,DC,USA,2011:475-478. [12] LI N,ZENG L,HE Q,et al.Parallel Implementation of Apriori Algorithm Based on MapReduce[C]∥The 13th ACIS International Conference on Software Engineering,Networking and Parallel& Distributed Computing.Kyoto,2012:236-241. [13] NING Y H.Study of Some Techniques in Data Mining Based on Spark[D].Hangzhou:China Jiliang University,2015.(in Chinese) 宁永恒.基于Spark的若干数据挖掘技术研究[D].杭州:中国计量学院,2015. [14] LIU D D,CHEN J,LIANG F,et al.A Performance Analysis for Hadoop under Heterogeneous Cloud Computing Environments[J].Journal of Interation Technology,2012,1(4):46-51.(in Chinese) 刘丹丹,陈俊,梁锋,等.云计算异构环境下Hadoop性能分析[J].集成技术,2012,1(4):46-51. [15] DONG X H,LI R X,ZHOU W W,et al.Performance Optimization and Feature Enhancements of Hadoop System[J].Journal of Computer Research and Development,2013,50(S2):1-15.(in Chinese) 董新华,李瑞轩,周湾湾,等.Hadoop系统性能优化与功能增强综述[J].计算机研究与发展,2013,50(S2):1-15. [16] YING Y,LIU Y J.Survey of Developments of MapReduce Parallel Computing Technology[J].Computer System Application,2014,23(4):1-6.(in Chinese) 应毅,刘亚军.MapReduce并行计算技术发展综述[J].计算机系统应用,2014,23(4):1-6. [17] KHARE N,ADLAKHA N,PARDASANI K R.An Algorithm Ior Mining Multidimensional Association Rules using Boolean Matrix[C]∥ 2010 International Conference in Recent Trends in Information,Telecommun ication and Computing.Kochi,Kerala,2010:95-99. [18] LIU H Z,DAI S P,JIANG H.Quantitative association rulesmining algorithm based on matrix[C]∥2009 International Conference on Computational Intelligence and Software Enginee-ring.Wuhan,2009:1-4. [19] TIAN S,WANG S,LIU Y,et al.An algorithm for mining frequent itemsets on uncertain dataset[J].Computer Modelling and New Technologies,2014,18(11):264-272. |
No related articles found! |
|