计算机科学 ›› 2023, Vol. 50 ›› Issue (6): 29-35.doi: 10.11896/jsjkx.220800050

• 高性能计算 • 上一篇    下一篇

基于多核CPU的无锁并行Semi-naive算法

喻婷, 王立松, 秦小麟   

  1. 南京航空航天大学计算机科学与技术学院 南京 211106
  • 收稿日期:2022-08-05 修回日期:2022-11-10 出版日期:2023-06-15 发布日期:2023-06-06
  • 通讯作者: 秦小麟(qinxcs@nuaa.edu.cn)
  • 作者简介:(sx2016004yt@nuaa.edu.cn)
  • 基金资助:
    国家自然科学基金(61802182)

Lock-free Parallel Semi-naive Algorithm Based on Multi-core CPU

YU Ting, WANG Lisong, QIN Xiaolin   

  1. College of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
  • Received:2022-08-05 Revised:2022-11-10 Online:2023-06-15 Published:2023-06-06
  • About author:YU Ting,born in 1998,postgraduate.Her main research interests include data management and parallel algorithm.QIN Xiaolin,born in 1953,Ph.D,is a senior member of China Computer Fe-deration.His main research interests include data management and security in distributed environment and so on.
  • Supported by:
    National Natural Science Foundation of China(61802182).

摘要: Datalog系统被广泛应用于很多领域,如图数据库、网络和静态程序分析等。在处理海量数据时,基于串行的Datalog求解策略无法充分发挥现有多核处理器的计算性能。针对上述问题,提出一种基于多核CPU的无锁并行Semi-naive算法(Parallel Semi-naive,PSN)用于支持Datalog的高效求解。PSN使用B+树索引进行数据划分,将数据分配给不同的线程执行计算,每个分区产生的中间结果元组互不相同,有利于实现计算时无锁的并行。同时提出一种双层哈希表结构来索引中间结果以提高查重的效率,每个线程只在特定的区域执行相关的计算,无需使用锁来防止写冲突。PSN使用任务队列-线程池为空闲线程分配任务,来实现多线程的负载均衡。在KONECT(Koblenz Network Collection)及斯坦福SNAP(Stanford Network Analysis Platform)的公开数据集上的实验结果表明,PSN算法可以优化Datalog规则的查询性能。

关键词: Datalog, Semi-naive算法, 并行, 递归规则, 负载均衡

Abstract: Datalog system has a wide range of applications in many fields,such as graph databases,networks,and static program analysis.When dealing with massive data,the serial-based Datalog evaluation strategy cannot fully utilize the computational resources of existing multicore processors.To address these problems,a lock-free parallelized semi-naive algorithm,parallel semi-naive(PSN) based on multi-core CPU is proposed to support efficient Datalog evaluation.PSN uses the B+ tree index to partition data and allocates data to different threads to perform calculations.The intermediate result tuples generated from each partition are different from each other,which is conducive to the realization of lock-free parallelism during calculation.PSN uses a double-layer hash table structure to index intermediate results to improve the efficiency of duplicate checking.Each thread only performs related calculations in a specific area,without using locks to prevent write conflicts.PSN adopts task queue and thread pool to allocate tasks to idle threads to achieve multi-thread load balancing.Experimental results on the public datasets of Koblenz network collection(KONECT) and Stanford network analysis platform( SNAP) show that the PSN algorithm can optimize the query performance of datalog rules.

Key words: Datalog, Semi-naive algorithm, Parallel, Recursive rule, Load balance

中图分类号: 

  • TP311
[1]CERI S,GOTTLOB G,TANCA L.What you always wanted to know about Datalog(and never dared to ask)[J].IEEE Transactions on Knowledge and Data Engineering,1989,1(1):146-166.
[2]ULLMAN J D.Principles of Database and Knowledge-Base Systems[M].Rockville,MD:Computer Science Press,1988.
[3]SHKAPSKY A,ZENG K,ZANIOLO C.Graph queries in anext-generationdatalog system[J].Proceedings of the VLDB Endowment,2013,6(12):1258-1261.
[4]FOGEL A,FUNG S,PEDROSA L,et al.A general approach to network configuration analysis[C]//12th USENIX Symposium on Networked Systems Design and Implementation(NSDI 15).2015:469-483.
[5]ALLEN N,KRISHNAN P,SCHOLZ B.Combining type-analysis with points-to analysis for analyzing Java library source-code[C]//Proceedings of the 4th ACM SIGPLAN International Workshop on State of the Art in Program Analysis.2015:13-18.
[6]HASSANSHAHI B,RAMESH R K,KRISHNAN P,et al.An efficient tunable selective points-to analysis for large codebases[C]//Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis.2017:13-18.
[7]SCHOLZ B,VOROBYOV K,KRISHNAN P,et al.Adatalogsource-to-source translator for static program analysis:An experience report[C]//2015 24th Australasian Software Engineering Conference.IEEE,2015:28-37.
[8]LIVSHITS V B,LAM M S.Finding Security Vulnerabilities in Java Applications with Static Analysis[C]//USENIX Security Symposium.2005:18-18.
[9]GREEN T J,HUANG SS,LOO B T,et al.Datalog and recursive query processing[J].Foundations and Trends © in Databases,2013,5(2):105-195.
[10]SCHOLZ B,JORDAN H,SUBOTIC' P,et al.On fast large-scaleprogram analysis indatalog[C]//Proceedings of the 25th International Conference on Compiler Construction.2016:196-206.
[11]MEI H,QIN G,XU M,et al.NeuralDatalog through time:Informed temporal modeling via logical specification[C]//International Conference on Machine Learning.PMLR,2020:6808-6819.
[12]JORDAN H,SUBOTIC' P,ZHAO D,et al.A specialized B-tree for concurrentdatalog evaluation[C]//Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming.2019:327-339.
[13]MOTIK B,NENOV Y,PIRO R,et al.Maintenance ofdatalog materialisations revisited[J].Artificial Intelligence,2019,269:76-136.
[14]MARTÍNEZ-ANGELES C A,DUTRA I,COSTA V S,et al.Adatalog engine for gpus[M]//Declarative Programming and Knowledge Management.Cham:Springer,2013:152-168.
[15]ABITEBOUL S,HULL R,VIANU V.Foundations of Databa-ses[J].SIGMOD Record,2000,29(3):76-87.
[16]BRAVENBOER M,SMARAGDAKIS Y.Exception analysis and points-to analysis:Better together[C]//Proceedings of the Eighteenth International Symposium on Software Testing and Analysis.2009:1-12.
[17]HODER K,BJØRNER N,MOURA L.μZ-an efficient engine for fixed points with constraints[C]//International Conference on Computer Aided Verification.Berlin:Springer,2011:457-462.
[18]MOTIK B,NENOV Y,PIRO R,et al.Parallelmaterialisation of datalog programs in centralised,main-memory RDF systems[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2014.
[19]WHALEY J,AVOTS D,CARBIN M,et al.UsingDatalog with binary decision diagrams for program analysis[C]//Asian Symposium on Programming Languages and Systems.Berlin:Sprin-ger,2005:97-118.
[20]DONG G.On distributedprocessibility of datalog queries by decomposing databases[C]//Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data.1989:26-35.
[21]SHAO J,BELL D A,HULL M E C.Combining rule decomposition and data partitioning in paralleldatalog program processing[C]//Proceedings of the First International Conference on Pa-rallel and Distributed Information Systems.IEEE Computer So-ciety,1991:106-107.
[22]GILRAY T,KUMAR S,MICINSKI K.Compiling data-paralleldatalog[C]//Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction.2021:23-35.
[23]GANGULY S,SILBERSCHATZ A,TSUR S.Parallel bottom-up processing ofdatalog queries[J].The Journal of Logic Programming,1992,14(1/2):101-126.
[24]SEO J,PARK J,SHIN J,et al.Distributed socialite:Adatalog-based language for large-scale graph analysis[J].Proceedings of the VLDB Endowment,2013,6(14):1906-1917.
[25]WOLFSON O,SILBERSCHATZ A.Distributed processing of logic programs[C]//Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data.1988:329-336.
[26]YANG M,SHKAPSKY A,ZANIOLO C.Scaling up the per-formance of more powerfuldatalog systems on multicore machines[J].The VLDB Journal,2017,26(2):229-248.
[27]FAN Z,ZHU J,ZHANG Z,et al.Scaling-Up In-Memory Datalog Processing:Observations and Techniques[C]//Proceedings of the VLDB Endowment.2019.
[28]MCSHERRY F,MURRAY D G,ISAACS R,et al.Differential Dataflow[C]//Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research.2013.
[29]RYZHYK L,BUDIU M.Differential Datalog[C]//Proceedings of the Datalog 2.0-3rd International Workshop on the Resur-gence of Datalog in Academia and Industry.CEUR-WS,2019:56-67.
[30]LEONE N,PFEIFER G,FABER W,et al.The DLV system for knowledge representation and reasoning[J].ACM Transactions on Computational Logic(TOCL),2006,7(3):499-562.
[31]KUNEGIS J.The KONECT Project[EB/OL].http://konect.cc.
[32]LESKOVEC J,KREVL A.Stanford Large Network Dataset Collection[EB/OL].https://snap.stanford.edu/data.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!