Computer Science, 2016, Vol. 43, Issue (1): 286-289. doi: 10.11896/j.issn.1002-137X.2016.01.061

• Artificial Intelligence •

Parallel Hierarchical Association Rule Mining in Big Data Environment

ZHANG Zhong-lin, TIAN Miao-feng, LIU Zong-cheng

  1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070
  • Online: 2018-12-01 Published: 2018-12-01
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61163010) and the Natural Science Foundation of Gansu Province (1308RJZA194).

Abstract: To meet the real-time processing demands of big data, this paper proposes PHARM (Parallel Hierarchical Association Rule Mining), a partition-based algorithm for mining association rules in parallel.

- First, the algorithm randomly divides the transactions of database D into n non-overlapping partitions and mines the local frequent itemsets of all partitions in parallel.
- Second, it uses the Apriori property to combine the local frequent itemsets from all partitions into the set of global candidate itemsets with respect to D.
- Third, it scans D once more to count the actual support of each candidate, which determines the global frequent itemsets.

Finally, a modeling-based analysis shows that the algorithm is efficient.

Key words: Big data, Partition, Association rule, Parallel hierarchical mining, High efficiency
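The two-phase scheme in the abstract can be illustrated with a minimal Python sketch in the style of the classic Partition approach. It is not the authors' PHARM implementation.

The candidate step is sound because any itemset that is frequent in D must be frequent in at least one partition. If its count were below s·|D_i| in every partition D_i, then summing over the partitions would give a count below s·|D|. Phase 2 then removes the false positives.

The sketch rests on several assumptions:
- Apriori serves as the local miner.
- The same relative minimum support `min_sup` applies to every partition.
- Round-robin slicing of a shuffled list produces the random partitions.
- A `ThreadPoolExecutor` stands in for a real cluster.
- The names `apriori`, `partition_mine` and `min_count` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction
from itertools import combinations


def min_count(min_sup, n):
    """Absolute support threshold for n transactions, computed exactly (no float rounding)."""
    return Fraction(str(min_sup)) * n


def apriori(transactions, min_sup):
    """Local miner: all itemsets whose relative support in `transactions` is >= min_sup."""
    if not transactions:
        return set()
    threshold = min_count(min_sup, len(transactions))
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s for s, c in counts.items() if c >= threshold}
    frequent = set(level)
    k = 2
    while level:
        # Join: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c for c, n in counts.items() if n >= threshold}
        frequent |= level
        k += 1
    return frequent


def partition_mine(transactions, min_sup, n_parts, seed=0):
    """Two-phase partition mining (assumes n_parts >= 1); returns {itemset: global count}."""
    data = [frozenset(t) for t in transactions]
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    # Phase 1a: randomly split D into n_parts non-overlapping partitions.
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    # Phase 1b: mine each partition independently (threads stand in for cluster nodes).
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        local = list(pool.map(lambda p: apriori(p, min_sup), parts))
    # Global candidates = union of all locally frequent itemsets.
    candidates = set().union(*local)
    # Phase 2: one more full scan of D to count each candidate's actual support.
    counts = dict.fromkeys(candidates, 0)
    for t in data:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    threshold = min_count(min_sup, len(data))
    return {c: n for c, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    result = partition_mine(baskets, min_sup=0.6, n_parts=2)
    for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

On the five toy baskets with `min_sup=0.6`, the partitioned run returns the same eight itemsets (four single items and four pairs) as a direct Apriori run over all of D, whatever the number of partitions.

Two design choices may need changing in practice:
- Python threads share the GIL, so this layout does not speed up CPU-bound counting. Real parallelism would need a `ProcessPoolExecutor` or a distributed framework.
- The exact `Fraction` threshold keeps the soundness argument above valid at boundary supports.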

