计算机科学 ›› 2023, Vol. 50 ›› Issue (6): 29-35.doi: 10.11896/jsjkx.220800050
喻婷, 王立松, 秦小麟
YU Ting, WANG Lisong, QIN Xiaolin
摘要: Datalog系统被广泛应用于很多领域,如图数据库、网络和静态程序分析等。在处理海量数据时,基于串行的Datalog求解策略无法充分发挥现有多核处理器的计算性能。针对上述问题,提出一种基于多核CPU的无锁并行Semi-naive算法(Parallel Semi-naive,PSN)用于支持Datalog的高效求解。PSN使用B+树索引进行数据划分,将数据分配给不同的线程执行计算,每个分区产生的中间结果元组互不相同,有利于实现计算时无锁的并行。同时提出一种双层哈希表结构来索引中间结果以提高查重的效率,每个线程只在特定的区域执行相关的计算,无需使用锁来防止写冲突。PSN使用任务队列-线程池为空闲线程分配任务,来实现多线程的负载均衡。在KONECT(Koblenz Network Collection)及斯坦福SNAP(Stanford Network Analysis Platform)的公开数据集上的实验结果表明,PSN算法可以优化Datalog规则的查询性能。
中图分类号:
[1]CERI S,GOTTLOB G,TANCA L.What you always wanted to know about Datalog(and never dared to ask)[J].IEEE Transactions on Knowledge and Data Engineering,1989,1(1):146-166. [2]ULLMAN J D.Principles of Database and Knowledge-Base Systems[M].Rockville,MD:Computer Science Press,1988. [3]SHKAPSKY A,ZENG K,ZANIOLO C.Graph queries in anext-generationdatalog system[J].Proceedings of the VLDB Endowment,2013,6(12):1258-1261. [4]FOGEL A,FUNG S,PEDROSA L,et al.A general approach to network configuration analysis[C]//12th USENIX Symposium on Networked Systems Design and Implementation(NSDI 15).2015:469-483. [5]ALLEN N,KRISHNAN P,SCHOLZ B.Combining type-analysis with points-to analysis for analyzing Java library source-code[C]//Proceedings of the 4th ACM SIGPLAN International Workshop on State of the Art in Program Analysis.2015:13-18. [6]HASSANSHAHI B,RAMESH R K,KRISHNAN P,et al.An efficient tunable selective points-to analysis for large codebases[C]//Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis.2017:13-18. [7]SCHOLZ B,VOROBYOV K,KRISHNAN P,et al.Adatalogsource-to-source translator for static program analysis:An experience report[C]//2015 24th Australasian Software Engineering Conference.IEEE,2015:28-37. [8]LIVSHITS V B,LAM M S.Finding Security Vulnerabilities in Java Applications with Static Analysis[C]//USENIX Security Symposium.2005:18-18. [9]GREEN T J,HUANG SS,LOO B T,et al.Datalog and recursive query processing[J].Foundations and Trends © in Databases,2013,5(2):105-195. [10]SCHOLZ B,JORDAN H,SUBOTIC' P,et al.On fast large-scaleprogram analysis indatalog[C]//Proceedings of the 25th International Conference on Compiler Construction.2016:196-206. [11]MEI H,QIN G,XU M,et al.NeuralDatalog through time:Informed temporal modeling via logical specification[C]//International Conference on Machine Learning.PMLR,2020:6808-6819. [12]JORDAN H,SUBOTIC' P,ZHAO D,et al.A specialized B-tree for concurrentdatalog evaluation[C]//Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming.2019:327-339. [13]MOTIK B,NENOV Y,PIRO R,et al.Maintenance ofdatalog materialisations revisited[J].Artificial Intelligence,2019,269:76-136. [14]MARTÍNEZ-ANGELES C A,DUTRA I,COSTA V S,et al.Adatalog engine for gpus[M]//Declarative Programming and Knowledge Management.Cham:Springer,2013:152-168. [15]ABITEBOUL S,HULL R,VIANU V.Foundations of Databa-ses[J].SIGMOD Record,2000,29(3):76-87. [16]BRAVENBOER M,SMARAGDAKIS Y.Exception analysis and points-to analysis:Better together[C]//Proceedings of the Eighteenth International Symposium on Software Testing and Analysis.2009:1-12. [17]HODER K,BJØRNER N,MOURA L.μZ-an efficient engine for fixed points with constraints[C]//International Conference on Computer Aided Verification.Berlin:Springer,2011:457-462. [18]MOTIK B,NENOV Y,PIRO R,et al.Parallelmaterialisation of datalog programs in centralised,main-memory RDF systems[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2014. [19]WHALEY J,AVOTS D,CARBIN M,et al.UsingDatalog with binary decision diagrams for program analysis[C]//Asian Symposium on Programming Languages and Systems.Berlin:Sprin-ger,2005:97-118. [20]DONG G.On distributedprocessibility of datalog queries by decomposing databases[C]//Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data.1989:26-35. [21]SHAO J,BELL D A,HULL M E C.Combining rule decomposition and data partitioning in paralleldatalog program processing[C]//Proceedings of the First International Conference on Pa-rallel and Distributed Information Systems.IEEE Computer So-ciety,1991:106-107. [22]GILRAY T,KUMAR S,MICINSKI K.Compiling data-paralleldatalog[C]//Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction.2021:23-35. [23]GANGULY S,SILBERSCHATZ A,TSUR S.Parallel bottom-up processing ofdatalog queries[J].The Journal of Logic Programming,1992,14(1/2):101-126. [24]SEO J,PARK J,SHIN J,et al.Distributed socialite:Adatalog-based language for large-scale graph analysis[J].Proceedings of the VLDB Endowment,2013,6(14):1906-1917. [25]WOLFSON O,SILBERSCHATZ A.Distributed processing of logic programs[C]//Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data.1988:329-336. [26]YANG M,SHKAPSKY A,ZANIOLO C.Scaling up the per-formance of more powerfuldatalog systems on multicore machines[J].The VLDB Journal,2017,26(2):229-248. [27]FAN Z,ZHU J,ZHANG Z,et al.Scaling-Up In-Memory Datalog Processing:Observations and Techniques[C]//Proceedings of the VLDB Endowment.2019. [28]MCSHERRY F,MURRAY D G,ISAACS R,et al.Differential Dataflow[C]//Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research.2013. [29]RYZHYK L,BUDIU M.Differential Datalog[C]//Proceedings of the Datalog 2.0-3rd International Workshop on the Resur-gence of Datalog in Academia and Industry.CEUR-WS,2019:56-67. [30]LEONE N,PFEIFER G,FABER W,et al.The DLV system for knowledge representation and reasoning[J].ACM Transactions on Computational Logic(TOCL),2006,7(3):499-562. [31]KUNEGIS J.The KONECT Project[EB/OL].http://konect.cc. [32]LESKOVEC J,KREVL A.Stanford Large Network Dataset Collection[EB/OL].https://snap.stanford.edu/data. |
|