计算机科学 ›› 2019, Vol. 46 ›› Issue (6A): 402-406.

• 大数据与数据挖掘 • 上一篇    下一篇

带关系属性的空间关键词并行查询处理算法

徐哲, 刘亮, 秦小麟, 秦伟萌   

  1. 南京航空航天大学计算机科学与技术学院 南京210016
  • 出版日期:2019-06-14 发布日期:2019-07-02
  • 通讯作者: 刘 亮(1985-),男,博士,讲师,主要研究领域为传感器网络数据库、时空数据管理等,E-mail:LiangLiu@nuaa.edu.cn
  • 作者简介:徐 哲(1993-),男,硕士生,CCF学生会员,主要研究领域为大数据、空间查询,E-mail:xuzhe@nuaa.edu.cn;秦小麟(1953-),男,教授,博士生导师,主要研究领域为空间与时空数据库、分布式数据管理与安全等;秦伟萌(1996-),女,硕士,主要研究领域为空间查询。
  • 基金资助:
    本文受国家自然科学基金(61402225,61373015,41301407),江苏省自然科学基金(BK20140832),中国博士后基金(2013M540447),智能电网保护和运行控制国家重点实验室项目资助。

Distributed Spatial Keyword Query Processing Algorithm with Relational Attributes

XU Zhe, LIU Liang, QIN Xiao-lin, QIN Wei-meng   

  1. College of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 210016,China
  • Online:2019-06-14 Published:2019-07-02

摘要: 移动互联网、物联网的快速发展产生了大量带关系属性的空间文本对象数据。面向网页文本数据的搜索引擎仅支持文本关键词查询,无法处理包含地理位置信息、文本信息、关系属性的混合数据。现有面向空间关键字的查询处理技术未将关系属性作为过滤条件,且是基于单机实现的,无法满足查询性能的要求。为解决上述问题,提出了一种新颖的将关系属性、空间和关键字3种属性映射成文本数据的Baseline算法(Baseline Algorithm of Distributed Keywords and Location-aware with Relational Attributes Query,BADKLRQ),利用分布式倒排文本索引对转换后的文本数据进行并行索引。针对带关系属性、空间和关键字的查询请求,将查询请求转换成映射空间中的多个文本关键字,对转换后的文本数据进行查询,并提出基于Baseline算法的改进算法MGDKLRQ,以改进空间属性转换成文本关键字的算法。实验结果表明,在索引时间和查询时间上,BADKLRQ算法比现有算法提升了10%~15%,MGDKLRQ算法比现有算法提升了20%~30%。

关键词: 范围查询, 分布式索引, 关系属性, 空间关键字

Abstract: The rapid growth of the mobile internet and the internet of things generates a large amount data of spatial text object with relational attributes.Search engines for webpage text data can efficiently store and index textual data,but only support textual keyword queries.However mixed data including geographic location information,textual information,and relational attributes cannot be processed.Existing query-processing techniques for space-oriented keywords do not consider relation attributes as filter conditions.And those techniques are based on stand-alone implementation and cannot meet query performance requirements.In order to solve the above problems,this paper proposed a novel Baseline algorithm named BADKLRQ (Baseline Algorithm of Distributed Keywords and Location-aware with Relational Attributes Query) that maps attributes of relation attributes,space,and keywords into text data.The row text index indexes the converted text data.For query requests with relation attributes,space,and keywords,the query request is also converted into a plurality of text keywords in the mapping space,and the converted text data is queried.And an improved algorithm based on Baseline algorithm MGDKLRQ is proposed to improve the algorithm of converting spatial attributes into text keywords.Experiments show that the BADKLRQ algorithm improves by 10% to 15% and MGDKLRQ algorithm improves by 20% to 30% over the existing algorithm in terms of index time and query time.

Key words: Distributed index, Range query, Relation attributes, Spatial keywords

中图分类号: 

  • TP311
[1]WU D,CONG G,JENSEN C S.A framework for efficient spatial web object retrieval[M].Springer-Verlag New York,Inc.2012.
[2]CONG G,JENSEN C S,WU D.Efficient retrieval of the top-k most relevant spatial web objects[C]∥Proceedings of the VLDB Endowment.2009:337-348.
[3]AHN J,JO B,JUNG S.Multiple Domain-Based Spatial Keyword Query Processing Method Using Collaboration of Multiple IR-Trees[C]∥Proceedings of the 7th International Conference on Emerging Databases.Springer,Singapore,2018:183-192.
[4]ZHENG K,SU H,ZHENG B,et al.Interactive top-k spatial keyword queries[C]∥2015 IEEE 31st International Conference on Data Engineering (ICDE).IEEE,2015:423-434.
[5]ZHANG D,CHAN C Y,TAN K L.Processing spatial keyword query as a top-k aggregation query[C]∥Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval.ACM,2014:355-364.
[6]LIU X,CHEN L,WAN C.LINQ:A framework for location-aware indexing and query processing[J].IEEE Transactions on Knowledge and Data Engineering,2015,27(5):1288-1300.
[7]GUTTMAN A.R-trees:a dynamic index structure for spatial searching[J].ACM SIGMOD Record,2016,14(2):47-57.
[8]ZANG X,HAO P,GAO X,et al.QDR-Tree:An Efcient Index Scheme for Complex Spatial Keyword Query[J].arXiv preprint arXiv:1804.10726,2018.
[9]JUNG H R,YONG S K,CHUNG Y D.QR-tree:An efficient and scalable method for evaluation of continuous range queries[J].Information Sciences,2014,274(8):156-176.
[10]NAIR S H,SINHA A,VACHHANI L.Hilbert’s space-filling curve for regions with holes[C]∥2017 IEEE 56th Annual Conference on Decision and Control (CDC).IEEE,2017:313-319.
[11]CHEN Y Y,SUEL T,MARKOWETZ A.Efficient query processing in geographic web search engines[C]∥Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data.ACM,2006:277-288.
[12]KHODAEI A,SHAHABI C,LI C.Hybrid indexing and seamless ranking of spatial and textual features of web documents[C]∥International Conference on Database and Expert Systems Applications.Springer-Verlag,2010:450-466.
[13]SANTOKI K.Indexing and Searching on a Hadoop Distributed File System[EB/OL].http://www.drdobbs.com/parallel/indexing-and_searching-on-a-hadoop-distr/226300241?pgno=3.
[14]CHRISTOFORAKI M,HE J,DIMOPOULOS C,et al.Text vs.space:efficient geo-search query processing[C]∥Proceedings of the 20th ACM International Conference on Information and Knowledge Management.ACM,2011:423-432.
[15]GÖBEL R,HENRICH A,NIEMANN R,et al.A hybrid index structure for geo-textual searches[C]∥Proceedings of the 18th ACM conference on Information and Knowledge Management.ACM,2009:1625-1628.
[16]WU D,MAN L Y,JENSEN C S,et al.Efficient continuously moving top-k spatial keyword query processing[C]∥IEEE,International Conference on Data Engineering.IEEE,2011:541-552.
[1] 郭帅,刘亮,秦小麟.
用户偏好约束的空间关键词范围查询处理方法
Spatial Keyword Range Query with User Preferences Constraint
计算机科学, 2018, 45(4): 182-189. https://doi.org/10.11896/j.issn.1002-137X.2018.04.031
[2] 董天阳, 尚跃辉, 程强.
方向感知的路网移动对象范围查询算法
Direction-aware Moving Object Range Query Algorithm in Road Network
计算机科学, 2018, 45(11): 210-219. https://doi.org/10.11896/j.issn.1002-137X.2018.11.033
[3] 袁鑫攀,汪灿飞,龙军,彭成.
CS-Chord:基于聚类分离的分布式高维向量索引
CS-Chord:Distributed High Dimensional Vector Index Based on Clustering Separation
计算机科学, 2017, 44(Z11): 494-497. https://doi.org/10.11896/j.issn.1002-137X.2017.11A.105
[4] 刘怀进,陈永红,田晖,王田,蔡奕侨.
两层无线传感网中节能的安全范围查询方法
Privacy and Integrity Protection Range Query Processing in Two-tiered Wireless Sensor Networks
计算机科学, 2016, 43(Z11): 393-397. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.090
[5] 李艳红,黄群,蒋宏,李国徽.
路网中空间关键字连续范围查询算法研究
Research on Processing Continuous Spatial Keyword Range Queries in Road Networks
计算机科学, 2014, 41(7): 232-235. https://doi.org/10.11896/j.issn.1002-137X.2014.07.048
[6] 唐李洋,倪志伟,李应.
基于Cassandra的可扩展分布式反向索引的构建
Scalable Distributed Inverted Index Built on Cassandra
计算机科学, 2011, 38(6): 187-190.
[7] 吴炜,苏永红,李瑞轩,卢正鼎.
基于DHT的分布式索引技术研究与实现
Research and Implementation of Distributed Index Based on DHT
计算机科学, 2010, 37(2): 65-70.
[8] 吴凌坤 汤庸 王鹏 舒然.
SA:一种有利于多属性范围查询的多维聚簇方法

计算机科学, 2009, 36(6): 133-137.
[9] 傅向华 彭小刚 王志强 明仲.
基于分布式范围树的结构化P2P多维范围查询

计算机科学, 2007, 34(8): 69-71.
[10] 师智斌 黄厚宽.
数据立方体聚集范围查询分块方法研究

计算机科学, 2007, 34(12): 93-96.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!