Computer Science, 2014, Vol. 41, Issue (11): 233-238. doi: 10.11896/j.issn.1002-137X.2014.11.045

• Software and Database Technology •

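Distributed Optimized Query Algorithm Based on Index

WANG Jing-bin and FANG Zhi-li

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by the Science and Technology Development Fund of Fuzhou University (2013-XQ-32), the Open Research Fund of the Key Laboratory of Spatial Data Mining and Information Sharing of the Ministry of Education (201006), the 2011 Fujian Province Science and Technology Fund for Supporting the Army (JG2011005), and the Natural Science Foundation of Fujian Province (2012J01168).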

Abstract: Using index files to assist queries on the Hadoop platform is a new approach to querying massive RDF (Resource Description Framework) data. Existing Hadoop-based RDF query methods rarely exploit index files and mainly target static data, so they handle dynamic data updates poorly. To overcome these two drawbacks, this paper proposes the IMSQ (using Index in MapReduce to Segment and Query) algorithm for distributed RDF querying. The algorithm has two parts, segmentation and query. First, it performs a single star-shaped segmentation of the RDF data, producing a number of segment files together with corresponding index files. Second, at query time it generates a layered join plan and applies a filter-and-select strategy: it consults the index files first to narrow the set of candidate files, then queries only the corresponding segment files. Finally, it merges the intermediate results once and outputs the final result. Experiments on the LUBM data set show that IMSQ has a clear advantage in query efficiency when the volume of RDF data is large.

Key words: Hadoop, RDF, Index, MapReduce
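
The segment-then-filter idea summarized in the abstract can be illustrated with a small in-memory sketch. This is not the IMSQ implementation: the abstract does not give the segment layout, the index format, or the layered join plan, so everything below is an assumption made for illustration. The sketch assumes that a "star" is the set of triples sharing one subject, that the index maps each predicate to the segments containing it, and that stars are assigned to segments round-robin. The function names (`star_segment`, `build_index`, `query_star`) and the toy triples are hypothetical, and no Hadoop or MapReduce machinery is involved.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all names are hypothetical.
TRIPLES = [
    ("stu1", "type", "Student"),
    ("stu1", "memberOf", "dept1"),
    ("stu2", "type", "Student"),
    ("stu2", "memberOf", "dept2"),
    ("dept1", "type", "Department"),
    ("dept1", "subOrganizationOf", "univ1"),
    ("dept2", "type", "Department"),
    ("dept2", "subOrganizationOf", "univ2"),
]


def star_segment(triples, num_segments):
    """Group triples into subject-centred stars and assign stars to segments.

    Assumption: round-robin assignment over sorted subjects, chosen only
    to keep the example deterministic.
    """
    stars = defaultdict(list)
    for s, p, o in triples:
        stars[s].append((s, p, o))
    segments = defaultdict(list)
    for i, subject in enumerate(sorted(stars)):
        segments[i % num_segments].extend(stars[subject])
    return dict(segments)


def build_index(segments):
    """Map each predicate to the set of segment ids that contain it."""
    index = defaultdict(set)
    for seg_id, triples in segments.items():
        for _, p, _ in triples:
            index[p].add(seg_id)
    return dict(index)


def query_star(segments, index, patterns):
    """Find subjects matching every (predicate, object) pattern.

    An object of None acts as a wildcard. Returns the matching subjects
    and the ids of the segments that were actually scanned.
    """
    # Filter step: keep only segments that contain every queried predicate.
    candidates = set(segments)
    for p, _ in patterns:
        candidates &= index.get(p, set())

    results = set()
    for seg_id in sorted(candidates):
        by_subject = defaultdict(set)
        for s, p, o in segments[seg_id]:
            by_subject[s].add((p, o))
        # A whole star lives in one segment, so no cross-segment join is needed.
        for s, pairs in by_subject.items():
            if all(
                any(p == qp and (qo is None or o == qo) for p, o in pairs)
                for qp, qo in patterns
            ):
                results.add(s)
    return sorted(results), sorted(candidates)


segments = star_segment(TRIPLES, 4)
index = build_index(segments)
subjects, scanned = query_star(
    segments, index, [("type", "Student"), ("memberOf", "dept1")]
)
print(subjects, scanned)
```

In this toy run the index rules out the two segments that hold only department stars, so only the two student segments are scanned. Queries that span several stars would additionally need a join across per-star results, which this sketch leaves out.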

