计算机科学 ›› 2023, Vol. 50 ›› Issue (1): 34-40.doi: 10.11896/jsjkx.211100049

• 数据库&大数据&数据科学 • 上一篇    下一篇

一种存储介质优化的大规模图遍历方法研究

矫天哲, 何虹燕, 张泽鑫, 宋杰   

  1. 东北大学软件学院 沈阳 110189
  • 收稿日期:2021-11-04 修回日期:2022-06-23 出版日期:2023-01-15 发布日期:2023-01-09
  • 通讯作者: 宋杰(songjie@mail.neu.edu.cn)
  • 作者简介:jtzash@163.com
  • 基金资助:
    国家自然科学基金(61672143)

Study on Big Graph Traversals for Storage Medium Optimization

JIAO Tianzhe, HE Hongyan, ZHANG Zexin, SONG Jie   

  1. School of Software College,Northeastern University,Shenyang 110819,China
  • Received:2021-11-04 Revised:2022-06-23 Online:2023-01-15 Published:2023-01-09
  • About author:JIAO Tianzhe,born in 1997,Ph.D.Her main research interests include big data management and machine learning.
    SONG Jie,born in 1980,Ph.D,professor.His main research interests include big data management,green computing and machine learning.
  • Supported by:
    National Natural Science Foundation of China(61672143).

摘要: 大图数据的BFS算法作为一种基础算法,受到工业界和学术界的广泛重视。不同平台涌现出众多大图BFS算法的研究工作,其中多使用固态硬盘来提高算法效率。在BFS算法遍历过程中,存储设备需要连续重复装载会数据以满足遍历需求,而数据重复装载造成大量数据擦写操作,严重影响了固态硬盘的使用寿命。由此可见,减少BFS算法数据擦写操作可以有效延长固态硬盘的使用寿命。结合图结构的特点,提出数据重用模型,用于描述图遍历过程中的数据重用程度;提出了基于图顶点度的启发式优先访问方法,该方法判断图顶点之间的独立性,并根据判断结果选择优先访问的图顶点,增加数据重用的可能性,提高缓存的命中率,减少闪存颗粒磨损。所提优化方法不修改BFS算法和大图数据,适用于各种BFS算法和数据集。最后,实验验证了所提数据重用模型的正确性,以及启发式优先访问方法的有效性。该优化方法应用于BFS-4K,B40C和Gunrock这3种常见的BFS算法上,能有效减少图遍历过程中的数据写入操作,固态硬盘的使用寿命可分别提高12%,15%,22%。

关键词: 图遍历, 固态硬盘, 缓存命中率, 启发式, 使用寿命

Abstract: As the basis of many algorithms,the breadth-first search(BFS) algorithm for big graph data has attracted more and more interests industrially and academically.Among the researches on BFS for the big graph,the solid state disk(SSD) is utilized to improve algorithm performance.In the graph traversal,the storage devices are required to continuously and repeatedly load data to fulfill the demand of graph traversal.However,many operations of data wipe caused by continuously and repeatedly loading data severely degrade the lifetime of SSD.For this reason,the lifetime of SSD can be effectively extended by reducing the data for write operations.Firstly,combined with the characteristics of graph structure,the data reuse model is constructed for describing the levels of data reuse in graph traversal.Then,a heuristic priority access algorithm based on vertex degree is proposed.By judging the independence of vertexes,the proposed algorithm selects graph vertexes that are priority accessed for improving the probability of data reusing,increasing hit ratio,and reducing the wear of flash memory.This method is available to any BFS algorithms and datasets without modifying BFS algorithm and big graph data.In the end,simulation results demonstrate that the proposed model and algorithm are effective.By applying the proposed algorithm on three common BFS algorithms:BFS-4K,B40C,and Gunrock,it can effectively reduce the operations of data written in graph traversal and improve the lifetime of SSD 12%,15%,22%,respectively.

Key words: Graph traversal, Solid state disk, Cache hit rate, Heuristic, Lifetime

中图分类号: 

  • TP311
[1]PAN Y,LI Y,ZHANG H,et al.GFTL:Group-level mapping in flash translation layer to provide efficient address translation for NAND flash-based SSDs[J].IEEE Transaction on Consumer Electronics,2020,66(3):242-250.
[2]SHI H F,WANG C G,CAI R Y,et al.Node Static Processing and Caching Strategy in Space-Ground Integrated Intelligent Network[J].Computer Engineering,2021,47(1):30-36.
[3]SABET A H,ZHAO Z J,GUPTA R.Subway:Minimizing Data Transfer during out-of-GPU-Memory Graph Processing[C]//European Conference on Computer Systems.Association for Computing Machinery,2020:1-16.
[4]LIU H,HUANG H.Enterprise:Breadth-first graph traversal on GPUs[C]//High Performance Computing,Networking,Storage and Analysis.ACM,2015:1-12.
[5]HU J,JIANG H,TIAN L,et al.GC-ARM:garbage collection-aware RAM management for flash based solid state drives[C]//International Conference on Networking,Architecture and Sto-rage(NAS).IEEE,2012:134-143.
[6]PAN Y,LIN M,WU Z,et al.Caching-Aware Garbage Collection to Improve Performance and Lifetime for NAND Flash SSDs[J].IEEE Transactions on Consumer Electronics,2021,67(2):141-148.
[7]PAIK J,CHO E,JIN R,et al.Selective-Delay Garbage Collection Mechanism for Read Operations in Multichannel FlashBased Storage Devices[J].IEEE Transactions on Consumer Electronics,2018,64(1):118-126.
[8]YANG Y,MISRA V,RUBENSTEIN D.On the optimality ofgreedy garbage collection for SSDs[J].ACM SIGMETRICS Performance Evaluation Review,2015,43(2):63-65.
[9]HOUDT B V.Performance of garbage collection algorithms for flash-based solid state drives with hot/cold data[J].Perfor-mance Evaluation,2013,70(10):692-703.
[10]PELEATO B,TABRIZI H,AGARWAL R,et al.BER-basedwear leveling and bad block management for NAND flash[C]//International Conference on Communications(ICC).IEEE,2015:295-300.
[11]CHEN Y,CHI H,YANG C,et al.Enabling the Duo-phase Data Management to Realize Longevity Bit-alterable Flash Memory[J].IEEE Transactions on Computers,2021,(16):1982-1997.
[12]YANG P,XUE N,ZHANG Y,et al.Reducing garbage collection overhead in SSD based on workload prediction[C]//Hot Topics in Storage and File Systems(HotStorage19).ACM Workshop,2019:1-8.
[13]CHENG Z J,WANG L,CHENG Y D,et al.File Access Popularity Prediction for Hierarchical Storage for High-Energy Phy-sics[J].Computer Engineering,2021,47(2):126-132.
[14]KANG D H,MIN C,EOM Y I.An Efficient Buffer Replacement Algorithm for NAND Flash Storage Devices[C]//IEEE International Symposium on Modelling,Analysis,and Simulation of Computer and Telecommunication Systems.IEEE,2014:239-248.
[15]JUNG M,CHOI W,SRIKANTAIAH S,et al.HIOS:A HostInterface I/O Scheduler for Solid State Disks[J].ACM SIGARCH Computer Architecture News,2014,42(3):289-300.
[16]JUNG M,KANDEMIR M.Sprinkler:Maximizing resource utilization in many-chip solid state disks[C]//International Symposium on High Performance Computer Architecture(HPCA'14).IEEE,2014:524-535.
[17]KANG J U,HYUN J,MAENG H,et al.The Multistreamed Solid-State Drive[C]//Hot Topics in Storage and File Systems(HotStorage).ACM Workshop,2014:13-17.
[18]YAN H,HUANG Y,ZHOU X,et al.An efficient and non-timesensitive file-aware garbage collection algorithm for NAND flash-based consumer electronics[J].IEEE Transactions on Consumer Electronics,2019,65(1):73-79.
[19]SKOURTIS D,ACHLIOPTAS D,WATKINS N,et al.Flash on Rails:Consistent Flash Performance through Redundancy[C]//Annual Technical Conference(ATC).USENIX,2014:463-474.
[20]MA R,WU F,LU Z H,et al.BlockHammer:Improving Flash Reliability by Exploiting Process Variation Aware Proactive Failure Prediction[J].IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,2020,39(12):4563-4574.
[21]SUN Q,ZHUGE Q F,CHEN X Z,et al.In-page Wear-leveling Memory Management Based on Non-volatile Memory[J].Computer Science,2018,45(S2):505-510.
[22]JOHN A,CHRISTOPHER R,NADY O,et al.Parboil:A revisedbenchmark suite for scientific and commercial throughput computing[EB/OL].(2012-03-19)[2021-10-20].http://impact.crhc.illino-is.edu/Shared/Docs/impact-12-01.parboil.pdf.
[23]SCOTT B,AYDIN B,KRSTE A,et al.Distributed memorybreadth-first search revisited:Enabling bottom-up search[C]//Parallel and Distributed Processing Symposium Workshops & PhD Forum(IPDPSW).IEEE,2013:1618-1627.
[24]LIU H,HUANG H.Enterprise:Breadth-first graph traversal on GPUs[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.ACM,2013:1-12.
[25]LI Q,LI H,ZHANG K,et al.A Survey of SSD Lifecycle Prediction[C]//International Conference on Software Engineering and Service Science(ICSESS).IEEE,2019:195-198.
[26]SALLINEN S,GHARAIBEH A,RIPEANU M.AcceleratingDirection-Optimized Breadth First Search on Hybrid Architectures[C]//European Conference on Parallel Processing.Sprin-ger International Publishing,2015:233-245.
[27]BUSATO F,BOMBIERI N.BFS-4K:an efficient implementation of BFS for Kepler GPU architectures[J].IEEE Transactions on Parallel and Distributed Systems,2015,26(7):1826-1838.
[28]MERRILL D,GARLAND M,GRIMSHAW A.Scalable GPU graph traversal[C]//ACM SIGPLAN Notices.ACM,2012:117-128.
[29]WANG Y,DAVIDSON A,PAN Y,et al.Gunrock:a high-performance graph processing library on the GPU[C]//Symposium on Principles and Practice of Parallel Programming.ACM SIGPLAN,2016:265-266.
[1] 张露萍, 徐飞.
具有突触规则的脉冲神经膜系统综述
Survey on Spiking Neural P Systems with Rules on Synapses
计算机科学, 2022, 49(8): 217-224. https://doi.org/10.11896/jsjkx.220300078
[2] 耿海军, 王威, 尹霞.
基于混合软件定义网络的单节点故障保护方法
Single Node Failure Routing Protection Algorithm Based on Hybrid Software Defined Networks
计算机科学, 2022, 49(2): 329-335. https://doi.org/10.11896/jsjkx.210100051
[3] 程秋云, 刘京菊, 杨国正, 罗智昊.
一种启发式的互联网多层网络模型构建方法
Heuristic Method for Building Internet Multilayer Network Model
计算机科学, 2022, 49(11A): 210800249-6. https://doi.org/10.11896/jsjkx.210800249
[4] 俞赛赛, 王小娟, 章倩倩.
基于启发式搜索特征选择的加密流量恶意行为检测技术
Detection of Malicious Behavior in Encrypted Traffic Based on Heuristic Search Feature Selection
计算机科学, 2022, 49(11A): 210800237-6. https://doi.org/10.11896/jsjkx.210800237
[5] 王加昌, 郑代威, 唐雷, 郑丹晨, 刘梦娟.
基于机器学习的剩余使用寿命预测实证研究
Empirical Research on Remaining Useful Life Prediction Based on Machine Learning
计算机科学, 2022, 49(11A): 211100285-9. https://doi.org/10.11896/jsjkx.211100285
[6] 郭彪, 唐麒, 文智敏, 傅娟, 王玲, 魏急波.
一种面向动态部分可重构片上系统的列表式软硬件划分算法
List-based Software and Hardware Partitioning Algorithm for Dynamic Partial Reconfigurable System-on-Chip
计算机科学, 2021, 48(6): 19-25. https://doi.org/10.11896/jsjkx.200700198
[7] 刘忠慧, 赵琦, 邹璐, 闵帆.
三元概念的启发式构建及其在社会化推荐中的应用
Heuristic Construction of Triadic Concept and Its Application in Social Recommendation
计算机科学, 2021, 48(6): 234-240. https://doi.org/10.11896/jsjkx.200500136
[8] 张晓, 张思蒙, 石佳, 董聪, 李战怀.
Ceph分布式存储系统性能优化技术研究综述
Review on Performance Optimization of Ceph Distributed Storage System
计算机科学, 2021, 48(2): 1-12. https://doi.org/10.11896/jsjkx.201000149
[9] 郭启程, 杜晓玉, 张延宇, 周毅.
基于改进鲸鱼算法的无人机三维路径规划
Three-dimensional Path Planning of UAV Based on Improved Whale Optimization Algorithm
计算机科学, 2021, 48(12): 304-311. https://doi.org/10.11896/jsjkx.201000021
[10] 唐文君, 刘岳, 陈荣.
移动边缘计算中的动态用户分配方法
User Allocation Approach in Dynamic Mobile Edge Computing
计算机科学, 2021, 48(1): 58-64. https://doi.org/10.11896/jsjkx.200900079
[11] 郭飞雁, 唐兵.
基于用户延迟感知的移动边缘服务器放置方法
Mobile Edge Server Placement Method Based on User Latency-aware
计算机科学, 2021, 48(1): 103-110. https://doi.org/10.11896/jsjkx.200900146
[12] 冯炳超, 吴璟莉.
求解自行车共享系统静态再平衡问题的单亲遗传算法
Partheno-genetic Algorithm for Solving Static Rebalance Problem of Bicycle Sharing System
计算机科学, 2020, 47(6A): 114-118. https://doi.org/10.11896/JsJkx.190700120
[13] 张旭, 王莉莉, 杨博韬.
带有一刀切约束的二维非规则装箱算法
Heuristic Algorithms for Two-dimensional Irregular Bin Packing Problem with GuillotineConstraints
计算机科学, 2020, 47(5): 212-216. https://doi.org/10.11896/jsjkx.190400078
[14] 潘恒, 李景峰, 马君虎.
可抵御内部威胁的角色动态调整算法
Role Dynamic Adjustment Algorithm for Resisting Insider Threat
计算机科学, 2020, 47(5): 313-318. https://doi.org/10.11896/jsjkx.190800051
[15] 杨婷, 罗飞, 丁炜超, 卢海峰.
一种自适应优化松弛量的装箱算法
Bin Packing Algorithm Based on Adaptive Optimization of Slack
计算机科学, 2020, 47(4): 211-216. https://doi.org/10.11896/jsjkx.190500132
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!