Computer Science ›› 2014, Vol. 41 ›› Issue (11): 233-238. doi: 10.11896/j.issn.1002-137X.2014.11.045


Distributed Optimized Query Algorithm Based on Index

WANG Jing-bin and FANG Zhi-li   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: Index files offer a new way to address the problem of querying large volumes of RDF (Resource Description Framework) data, and they can greatly aid query optimization. At present, most Hadoop-based RDF query optimization methods do not use index files, and most of them target static data, so they perform poorly when data is updated dynamically. To overcome these two drawbacks, this paper proposes the IMSQ (using Index in MapReduce to Segment and Query) algorithm for distributed RDF queries. The algorithm has two parts: segmentation and query execution. First, it performs a star-like segmentation of the RDF data, producing several segment files and their corresponding index files. Second, it generates a layered join plan, uses a filter method to search the index files and narrow the result set, and then runs the query on the corresponding segment files. Finally, it merges and outputs the intermediate results. Experiments on the LUBM test data set show that IMSQ achieves higher query efficiency when the amount of RDF data is large.

Key words: Hadoop, RDF, Index, MapReduce
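
The segment-then-filter idea in the abstract can be illustrated with a small single-machine sketch. This is not the IMSQ implementation: the abstract does not specify the segment layout, the index format, or the layered join plan, and no MapReduce job is modeled here. The sketch assumes that a "star" is the set of triples sharing one subject, that each star is placed in a segment by hashing its subject, and that the index maps each predicate to the segments containing it. The names `star_segment`, `query_star`, and `segment_id`, and the sample predicates in the usage note, are hypothetical.

```python
import zlib
from collections import defaultdict


def segment_id(subject, num_segments):
    # Assumption: stars are assigned to segments by a stable hash of the subject.
    return zlib.crc32(subject.encode("utf-8")) % num_segments


def star_segment(triples, num_segments):
    """Split (subject, predicate, object) triples into star-shaped segments.

    Returns (segments, index), where segments maps a segment id to its triples
    and index maps each predicate to the set of segment ids that contain it.
    """
    segments = defaultdict(list)
    index = defaultdict(set)
    for s, p, o in triples:
        seg = segment_id(s, num_segments)
        segments[seg].append((s, p, o))  # all triples of one subject stay together
        index[p].add(seg)
    return dict(segments), dict(index)


def query_star(segments, index, pattern):
    """Answer a star query given as {predicate: required object, or None for any}.

    The index is consulted first so that only segments containing every
    predicate of the pattern are scanned.
    """
    candidates = None
    for p in pattern:
        segs = index.get(p, set())
        candidates = set(segs) if candidates is None else candidates & segs

    matches = []
    for seg in sorted(candidates or ()):
        # Rebuild each subject's star inside the segment.
        stars = defaultdict(lambda: defaultdict(set))
        for s, p, o in segments[seg]:
            stars[s][p].add(o)
        for s, props in stars.items():
            if all(p in props and (o is None or o in props[p])
                   for p, o in pattern.items()):
                matches.append(s)
    return sorted(matches)
```

For example, after `segments, index = star_segment(triples, 4)`, a call such as `query_star(segments, index, {"type": "Student", "takes": None})` would scan only the segments that the index lists for both `type` and `takes`. Keeping each subject's triples in one segment means a star pattern can be answered without joining across segments, which is one plausible reason for star-like segmentation; joins between different stars would still need a separate plan.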

