Computer Science, 2014, Vol. 41, Issue (11): 233-238. doi: 10.11896/j.issn.1002-137X.2014.11.045
汪璟玢,方知立
WANG Jing-bin and FANG Zhi-li
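Abstract: Using index files to assist queries on the Hadoop platform is a new approach to querying massive RDF (Resource Description Framework) data. Existing RDF query implementations on Hadoop rarely use index files. They also mainly target static RDF data and cope poorly with dynamic data updates. To overcome these two shortcomings, we propose the IMSQ (using Index in MapReduce to Segment and Query) algorithm for distributed querying of RDF files. The algorithm has two parts, segmentation and querying. First, the RDF data undergo a single star-shaped partitioning that yields a number of partition files, and index files are built over them. Second, at query time, a join plan is generated level by level and a filter-and-select strategy is applied: the index files are consulted first to narrow the set of candidate files, and only the corresponding partition files are then queried. Finally, the partial results are merged once and output. Experiments on the LUBM dataset show that IMSQ has a clear advantage in query efficiency when the data volume is large.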
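The segment-index-filter-merge flow described in the abstract can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the authors' IMSQ code and not a Hadoop/MapReduce program. It assumes that a "star" is a subject together with all of its outgoing (predicate, object) edges, that whole stars are placed into partition files by a CRC32 hash of the subject, and that the index file maps each predicate to the partitions containing it. The names `star_partition`, `build_index`, `match_star`, `query_star` and `join` are hypothetical, and `join` is a plain nested-loop merge standing in for the paper's level-by-level join plan.

```python
# Hypothetical in-memory sketch of star partitioning + index-filtered queries.
# Not the IMSQ implementation; partition placement and index layout are assumptions.
from collections import defaultdict
from zlib import crc32


def star_partition(triples, num_files):
    """Group (s, p, o) triples into subject-centred stars and place each
    whole star in one of `num_files` partitions (hash placement is an
    assumption, not necessarily IMSQ's rule)."""
    stars = defaultdict(list)
    for s, p, o in triples:
        stars[s].append((p, o))
    partitions = [{} for _ in range(num_files)]
    for s, edges in stars.items():
        # crc32 is stable across runs, unlike Python's salted str hash().
        partitions[crc32(s.encode("utf-8")) % num_files][s] = edges
    return partitions


def build_index(partitions):
    """Assumed index-file stand-in: predicate -> ids of partitions containing it."""
    index = defaultdict(set)
    for pid, part in enumerate(partitions):
        for edges in part.values():
            for p, _ in edges:
                index[p].add(pid)
    return index


def is_var(term):
    return term.startswith("?")


def match_star(subject, edges, subj_var, pattern):
    """Bind one star against a star-shaped pattern [(pred, obj_term), ...]."""
    bindings = [{subj_var: subject}]
    for p, o in pattern:
        extended = []
        for b in bindings:
            for ep, eo in edges:
                if ep != p:
                    continue
                if is_var(o):
                    if o in b and b[o] != eo:
                        continue  # conflicts with an earlier binding
                    extended.append({**b, o: eo})
                elif o == eo:
                    extended.append(b)
        bindings = extended
    return bindings


def query_star(partitions, index, subj_var, pattern):
    """Filter step: intersect the index entries of every queried predicate,
    then scan only the surviving partitions."""
    candidates = set(range(len(partitions)))
    for p, _ in pattern:
        candidates &= index.get(p, set())
    results = []
    for pid in sorted(candidates):
        for s, edges in partitions[pid].items():
            results.extend(match_star(s, edges, subj_var, pattern))
    return results, candidates


def join(left, right):
    """Merge step: nested-loop join of two binding lists on shared variables."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]


if __name__ == "__main__":
    triples = [
        ("s1", "type", "Student"), ("s1", "takes", "c1"),
        ("s2", "type", "Student"), ("s2", "takes", "c2"),
        ("c1", "name", "DB"), ("c2", "name", "AI"),
        ("p1", "type", "Professor"), ("p1", "teaches", "c1"),
    ]
    parts = star_partition(triples, 3)
    idx = build_index(parts)
    students, _ = query_star(parts, idx, "?x", [("type", "Student"), ("takes", "?c")])
    courses, _ = query_star(parts, idx, "?c", [("name", "?n")])
    print(sorted((b["?x"], b["?n"]) for b in join(students, courses)))
    # prints [('s1', 'DB'), ('s2', 'AI')]
```

The filter step is safe in this sketch because a star that matches the pattern must carry every queried predicate, so its partition appears in every intersected index entry; partitions missing any queried predicate can be skipped without losing answers.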