An Association Rule Mining Algorithm Based on Hadoop

Abstract

Traditional parallel association rule algorithms launch one MapReduce task per iteration to generate and count candidate itemsets, and repeatedly starting MapReduce tasks incurs substantial performance overhead. This paper proposes a parallel association rule mining algorithm, PST-Apriori. The algorithm adopts a partition strategy: each distributed computing node builds a prefix shared tree (PST) and compresses the candidate itemsets generated by each transaction T into it. The tree is then traversed breadth-first, and the 〈key, value〉 pair corresponding to each tree node is used as input to the map function; the MapReduce framework automatically groups the pairs by key. Finally, the reduce function is called to aggregate the results of the multiple tasks, yielding the frequent itemsets that satisfy the minimum support threshold. The algorithm uses only two MapReduce tasks, and the PST is sorted by key to facilitate the shuffle operation at the Mapper, which improves efficiency.
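The counting flow described in the abstract can be illustrated with a minimal in-process sketch. It is not the authors' implementation: it runs in plain Python rather than on Hadoop, it models only a single map and reduce round (not the two MapReduce tasks of PST-Apriori), and it assumes that the candidates of a transaction are all of its item subsets up to a length limit. The names `PSTNode`, `insert_candidates`, `emit_pairs`, `map_partition`, `reduce_counts`, and the parameter `max_len` are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


class PSTNode:
    """One tree node; the path from the root to the node spells an itemset."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0      # local support count of the itemset ending here
        self.children = {}  # item -> PSTNode


def insert_candidates(root, transaction, max_len):
    """Insert every item subset of size 1..max_len (assumed candidate rule)."""
    items = sorted(set(transaction))
    for k in range(1, min(max_len, len(items)) + 1):
        for itemset in combinations(items, k):
            node = root
            for item in itemset:  # shared prefixes reuse existing nodes
                node = node.children.setdefault(item, PSTNode(item))
            node.count += 1


def emit_pairs(root):
    """Breadth-first traversal yielding (itemset, local_count) pairs."""
    queue = deque([(root, ())])
    while queue:
        node, prefix = queue.popleft()
        for item in sorted(node.children):  # visit children in key order
            child = node.children[item]
            key = prefix + (item,)
            yield key, child.count
            queue.append((child, key))


def map_partition(transactions, max_len):
    """Map side: build one tree per data partition and emit its pairs."""
    root = PSTNode()
    for transaction in transactions:
        insert_candidates(root, transaction, max_len)
    return list(emit_pairs(root))


def reduce_counts(mapped, min_support):
    """Reduce side: sum counts per key and keep the frequent itemsets."""
    totals = defaultdict(int)
    for pairs in mapped:
        for key, count in pairs:
            totals[key] += count
    return {key: total for key, total in totals.items() if total >= min_support}


# Two partitions stand in for two computing nodes.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"]],
]
mapped = [map_partition(part, max_len=3) for part in partitions]
frequent = reduce_counts(mapped, min_support=2)
print(frequent)
```

Because each partition emits one pair per tree node instead of one pair per candidate occurrence, the tree acts as a local combiner; how the original algorithm bounds the candidate set is not stated in the abstract, so the `max_len` cutoff here is only a stand-in.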

Key words: Association rule, Hadoop, MapReduce, Prefix shared tree

CLC Number: TP311

DING Yong, ZHU Chang-shui, WU Yu-yan. Association Rule Mining Algorithm Based on Hadoop [J]. Computer Science, 2018, 45(11A): 409-411.

