计算机科学 ›› 2020, Vol. 47 ›› Issue (12): 139-143.doi: 10.11896/jsjkx.191000110
朱岸青1, 李帅2, 唐晓东3
ZHU An-qing1, LI Shuai2, TANG Xiao-dong3
摘要: 为了提高关联规则挖掘效率文中提出了一种适用于Spark平台的并行化FP_growth关联规则挖掘方法.首先利用Spark平台在分布式系统中的所有节点的内存RDD中完成遍历扫描运算得到频繁集以便生成FP_Table并更新FP_Tree.然后引入时间序列来预测待挖掘的项目集以便实现分布式系统中的所有节点能够均衡分担挖掘任务从而充分利用各节点的FP_Tree遍历功能获取FP_growth关联规则挖掘结果.实验结果显示相比单机情况并行化FP_growth关联规则挖掘在效率方面提高了约60%.经过负载均衡处理后的FP_growth关联规则挖掘的效率更高提高了约14%这说明各节点遍历任务的分配更均衡并行化程度更高.
中图分类号:
[1] BELALEM G,ABBACHE A,BELKREDIM F Z,et al.Arabic Query Expansion Using WordNet and Association Rules[J].International Journal of Intelligent Information Technologies,2016,12(3):51-64. [2] MAI T,VO B,NGUYEN L T T.A lattice-based approach for mining high utility association rules[J].Information Sciences,2017,399:81-97. [3] YU B,LIU S Q.An improved association rule mining algorithm based on FP-growth algorithm[J].Computer and Network,2017,43(14):68-71. [4] LIN W T,CHU C P.Determining the appropriate number of nodes for fast mining of frequent patterns in distributed computing environments[J].Parallel Algorithms &Applications,2014,30(5):1-13. [5] SHAO X Y,ZHANG L.An improved parallel algorithm for FP-Growth association rules based on Hadoop[J].Computer Applied Research,2018,35 (1). [6] DIVYAVARMA K,REMYA M,DEEPA G.An Enhanced Bug Mining for Identifying Frequent Bug Pattern Using Word Tokenizer and FP-Growth[M]//Proceedings of the 5th International Conference on Frontiers in Intelligent Computing:Theory and Applications.2017:525-532. [7] MOHAMED Y S,NAJIB M,ABDELAZIZ E,et al.APRICOIN:An adaptive approach for prioritizing high-risk containers inspections[J].IEEE Access,2017,5(99):18238-18249. [8] XU F,LU H.The Application of FP-Growth Algorithm Based on Distributed Intelligence in Wisdom Medical Treatment[J].International Journal of Pattern Recognition &Artificial Intelligence,2017,31(4):232-237. [9] CHEN J G,LI K L,MEMBER S,et al.A Parallel Random Fo-rest Algorithm for Big Data in a Spark Cloud Computing Environment[J].IEEE Transactions on Parallel &Distributed Systems,2017,28(4):919-933. [10] SHI W,ZHU Y,YU P S,et al.Effective Prediction of Missing Data on Apache Spark over Multivariable Time Series[J].IEEE Transactions on Big Data,2018,4(4):473-486. [11] ZHANG L Z,CUI Y,LUO G C,et al.Dynamic Load Balancing Algorithms for Large Data Distributed Storage[J].Computer Science,2017,44(5):178-183. [12] LI Z Y,YU J,BIAN C,et al.Data flow dynamic load balancing strategy based on load perception[J].Computer Applications,2017,37(10):2760-2766. [13] LUO J,LEI R,XUE L.Spatio-Temporal Load Balancing for Ener-gy Cost Optimization in Distributed Internet Data Centers[J].IEEE Transactions on Cloud Computing,2017,3(3):387-397. [14] LI X,LI T.Design and implementation of recommendation system based on Spark[J].Computer Technology and development,2018,28(10):201-205. [15] MESTRE D G,PIRES C E S,NASCIMENTOD C,et al.An efficient spark-based adaptive windowing for entity matching[J].Journal of Systems &Software,2017,128:1-10. [16] CHEN J G,LI K L,MEMBER S,et al.A Parallel Random Fo-rest Algorithm for Big Data in a Spark Cloud Computing Environment[J].IEEE Transactions on Parallel &Distributed Systems,2017,28(4):919-933. [17] YAN Y L,CHEN M,SADIQ S,et al.Efficient Imbalanced Multimedia Concept Retrieval by Deep Learning on Spark Clusters[J].International Journal of Multimedia Data Engineering &Management,2017,8(1):1-20. [18] CAO N,WANG C,LI M,et al.Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data[J].IEEE Transactions on Parallel &Distributed Systems,2014,25(1):222-233. [19] HU J,PAN H A.Improved incremental updating algorithm of association rules[J].Journal of Chongqing University of Posts and Telecommunications(Natural Science Edition),2018,30(4):558-563. |
[1] | 田真真, 蒋维, 郑炳旭, 孟利民. 基于服务器集群的负载均衡优化调度算法 Load Balancing Optimization Scheduling Algorithm Based on Server Cluster 计算机科学, 2022, 49(6A): 639-644. https://doi.org/10.11896/jsjkx.210800071 |
[2] | 高捷, 刘沙, 黄则强, 郑天宇, 刘鑫, 漆锋滨. 基于国产众核处理器的深度神经网络算子加速库优化 Deep Neural Network Operator Acceleration Library Optimization Based on Domestic Many-core Processor 计算机科学, 2022, 49(5): 355-362. https://doi.org/10.11896/jsjkx.210500226 |
[3] | 谭双杰, 林宝军, 刘迎春, 赵帅. 基于机器学习的分布式星载RTs系统负载调度算法 Load Scheduling Algorithm for Distributed On-board RTs System Based on Machine Learning 计算机科学, 2022, 49(2): 336-341. https://doi.org/10.11896/jsjkx.201200126 |
[4] | 夏中, 向敏, 黄春梅. 基于CHBL的P2P视频监控网络分层管理机制 Hierarchical Management Mechanism of P2P Video Surveillance Network Based on CHBL 计算机科学, 2021, 48(9): 278-285. https://doi.org/10.11896/jsjkx.201200056 |
[5] | 白勇, 张占龙, 熊隽迪. 基于FP-Growth算法和GRNN的电力知识文本挖掘 Power Knowledge Text Mining Based on FP-Growth Algorithm and GRNN 计算机科学, 2021, 48(8): 86-90. https://doi.org/10.11896/jsjkx.210600031 |
[6] | 宋海宁, 焦健, 刘永. 高速公路中的移动边缘计算研究 Research on Mobile Edge Computing in Expressway 计算机科学, 2021, 48(6A): 383-386. https://doi.org/10.11896/jsjkx.200900212 |
[7] | 王政, 姜春茂. 一种基于三支决策的云任务调度优化算法 Cloud Task Scheduling Algorithm Based on Three-way Decisions 计算机科学, 2021, 48(6A): 420-426. https://doi.org/10.11896/jsjkx.201000023 |
[8] | 郑增乾, 王锟, 赵涛, 蒋维, 孟利民. 带宽和时延受限的流媒体服务器集群负载均衡机制 Load Balancing Mechanism for Bandwidth and Time-delay Constrained Streaming Media Server Cluster 计算机科学, 2021, 48(6): 261-267. https://doi.org/10.11896/jsjkx.200400131 |
[9] | 姚泽玮, 林嘉雯, 胡俊钦, 陈星. 基于PSO-GA的多边缘负载均衡方法 PSO-GA Based Approach to Multi-edge Load Balancing 计算机科学, 2021, 48(11A): 456-463. https://doi.org/10.11896/jsjkx.210100191 |
[10] | 杨紫淇, 蔡英, 张皓晨, 范艳芳. 基于负载均衡的VEC服务器联合计算任务卸载方案 Computational Task Offloading Scheme Based on Load Balance for Cooperative VEC Servers 计算机科学, 2021, 48(1): 81-88. https://doi.org/10.11896/jsjkx.200800220 |
[11] | 郭飞雁, 唐兵. 基于用户延迟感知的移动边缘服务器放置方法 Mobile Edge Server Placement Method Based on User Latency-aware 计算机科学, 2021, 48(1): 103-110. https://doi.org/10.11896/jsjkx.200900146 |
[12] | 王国澎, 杨剑新, 尹飞, 蒋生健. 负载均衡的处理器运算资源分配方法 Computing Resources Allocation with Load Balance in Modern Processor 计算机科学, 2020, 47(8): 41-48. https://doi.org/10.11896/jsjkx.191000148 |
[13] | 金琪, 王俊昌, 付雄. 基于智能放置策略的Cuckoo哈希表 Cuckoo Hash Table Based on Smart Placement Strategy 计算机科学, 2020, 47(8): 80-86. https://doi.org/10.11896/jsjkx.191200109 |
[14] | 高子妍, 王勇. 面向云服务的分布式消息系统负载均衡策略 Load Balancing Strategy of Distributed Messaging System for Cloud Services 计算机科学, 2020, 47(6A): 318-324. https://doi.org/10.11896/JsJkx.191100012 |
[15] | 黄梅根, 汪涛, 刘亮, 庞瑞琴, 杜欢. 基于软件定义网络资源优化的虚拟网络功能部署策略 Virtual Network Function Deployment Strategy Based on Software Defined Network Resource Optimization 计算机科学, 2020, 47(6A): 404-408. https://doi.org/10.11896/JsJkx.191000116 |
|