Computer Science, 2020, Vol. 47, Issue (12): 139-143. doi: 10.11896/jsjkx.191000110


Parallel FP_growth Association Rules Mining Method on Spark Platform

ZHU An-qing1, LI Shuai2, TANG Xiao-dong3   

  1. School of Management, Jinan University, Guangzhou 510000, China
  2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  3. School of Economics and Management, South China Normal University, Guangzhou 510006, China
  • Received: 2019-10-16  Revised: 2020-03-10  Online: 2020-12-15  Published: 2020-12-17
  • About author: ZHU An-qing, born in 1976, Ph.D, associate professor. Her main research interests include Internet of Things and enterprise management.
  • Supported by:
    Guangzhou Patent Technology Industrialization Project (201601010207), General Project of National Natural Science Foundation of China (61672077), National Key R&D Program (2017YFF0106407) and 2017 National Natural Science Foundation Youth Fund Project (61702026).

Abstract: To improve the efficiency of association rule mining, a parallel FP_growth association rule mining method suited to the Spark platform is proposed. First, the Spark platform is used to complete the traversal scan in the in-memory RDDs of all nodes of the distributed system, obtaining the frequent sets from which the FP_Table is generated and the FP_Tree is updated. Then, time series are introduced to predict the itemsets to be mined, so that all nodes in the distributed system share the mining tasks in a balanced manner and the FP_Tree traversal capacity of each node is fully used to obtain the FP_growth association rule mining results. The experimental results show that, compared with the single-machine case, parallelized FP_growth association rule mining improves efficiency by about 60%. After load balancing, the mining efficiency is higher still, increasing by about 14%, which indicates that the traversal tasks are allocated more evenly across nodes and the degree of parallelism is higher.
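
The two stages described in the abstract can be illustrated with a small single-machine sketch in plain Python. It is not the authors' implementation: the function names `frequent_items` and `balanced_groups` are hypothetical, Spark is not used, and each item's support count stands in for the time-series load prediction, which the abstract does not detail. The balancing step is a simple greedy heuristic (heaviest item to the currently lightest worker), swapped in only to show the idea of sharing mining tasks evenly.

```python
from collections import Counter
import heapq


def frequent_items(transactions, min_support):
    """First scan: count each item once per transaction, keep those meeting min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def balanced_groups(load_by_item, num_workers):
    """Greedy assignment: the heaviest remaining item goes to the lightest worker.

    load_by_item is an ASSUMED per-item load estimate (here the support count);
    the paper predicts load with a time series instead.
    """
    heap = [(0, w) for w in range(num_workers)]  # (total load so far, worker id)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_workers)]
    # Sort by descending load, then by item name so the result is deterministic.
    for item, load in sorted(load_by_item.items(), key=lambda kv: (-kv[1], kv[0])):
        total, w = heapq.heappop(heap)
        groups[w].append(item)
        heapq.heappush(heap, (total + load, w))
    return groups


# Toy transaction database, invented for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c", "d"],
    ["b", "c"],
    ["a", "b", "c", "e"],
]
freq = frequent_items(transactions, min_support=2)
groups = balanced_groups(freq, num_workers=2)
print(freq)
print(groups)
```

In a Spark setting the counting step would typically run as a distributed aggregation over an RDD, and each group of items would be mined on a separate node from its own conditional FP_Tree; the sketch only shows how the grouping decision affects how evenly the work is split.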

Key words: Association rules mining, FP_growth algorithm, Frequent sets, Load balancing, Spark platform

CLC Number: 

  • TP311.13