计算机科学 ›› 2018, Vol. 45 ›› Issue (7): 42-52.doi: 10.11896/j.issn.1002-137X.2018.07.007

• 第五届CCF 大数据学术会议 • 上一篇    下一篇

基于HBase的支持频繁更新与多用户并发的R树

王波涛,梁伟,赵凯利,钟汉辉,张玉圻   

  1. 东北大学计算机科学与工程学院 沈阳110169
  • 收稿日期:2017-07-15 出版日期:2018-07-30 发布日期:2018-07-30
  • 作者简介:王波涛(1968-),男,博士,教授,CCF会员,主要研究领域为隐私保护、大数据、云计算、基于位置服务,E-mail:wangbotao@ise.neu.edu.cn(通信作者);梁 伟(1993-),女,硕士生,主要研究领域为机器学习、大数据,E-mail:1713900300@qq.com;赵凯利(1992-),女,硕士生,主要研究领域为移动大数据、时空索引,E-mail:1335888327@qq.com;钟汉辉(1993-),男,硕士生,主要研究领域为大数据、云计算,E-mail:1596694976@qq.com;张玉圻(1994-),女,硕士生,主要研究领域为大数据、云计算,E-mail:zhangyuqi1994@163.com。
  • 基金资助:
    本文受国家自然科学基金:面向动态位置服务的移动查询处理与优化技术(61173030)资助。

R-tree for Frequent Updates and Multi-user Concurrent Accesses Based on HBase

WANG Bo-tao,LIANG Wei,ZHAO Kai-li,ZHONG Han-hui,ZHANG Yu-qi   

  1. School of Computer Science and Engineering,Northeastern University,Shenyang 110169,China
  • Received:2017-07-15 Online:2018-07-30 Published:2018-07-30

摘要: 基于位置服务的应用已经进入大数据时代,传统基于位置服务的技术面临系统扩展性、性能等方面的挑战。云计算技术是大数据处理的基础,索引是优化查询的重要手段。尽管目前已存在大量的研究成果,但尚未有HBase上的支持频繁更新与多用户并发的R树索引。针对移动对象索引的频繁更新与多用户并发的需求,文中提出了基于HBase的支持频繁更新与多用户并发的R树索引,它只索引包含移动对象的网格,避免了频繁更新问题;进一步基于HBase的数据行与数据分区的组织与读写特性,对R树的节点进行重组,并对网格Z-order编码,从而减少了对HBase的读写操作,提高了查询效率;最后提出了基于ZooKeeper分布式读写锁的优化策略,提高了索引的吞吐量。实验结果表明,与网格索引相比,在数据非均匀的情况下,所提策略的查询吞吐量提高了25%~50%,更新吞吐量约在同一数量级;与分布式共享锁索引相比,分布式读写锁索引的吞吐量提高了近40%。

关键词: HBase, R树, 基于位置服务, 移动对象索引

Abstract: Application based on location based service (LBS) has entered the era of big data.Traditional location based service techniques face new challenges such as scalability,performance,etc.Cloud computing technology is the basis of big data processing and index is an important way to optimize query.Although there exist a large number of research results,as far as we know,there is no R-tree index which supports frequent updates and multi-user concurrent accesses based on HBase.According to the above characteristics of moving objects,this paper proposed a new R-tree index which supports frequent updates and multi-user concurrent accesses based on HBase.In this new index,the R-tree only indexes the grid which contains the moving object to avoide the problem of frequent updating effectively.Furthermore,based on the organization of HBase data rows and I/O characteristics of data partitions,this paper reorganized the nodes and encoded the grid cells with z-order,which reduce the read and write operations of HBase and improve the query efficiency.Finally,it proposed an optimization strategy for distributed read and write locks based on Zookeeper,which improves the throughput of new indexes.The experimental results show that the query throughput of the proposed strategy is improved by 25%~50% and the update throughput is about the same level in the case of uneven data compared with the grid index.Compared with the index using distri-buted shared locks,the query throughput of the index using distributed read and write locksis increased by nearly 40%.

Key words: HBase, Location based service, Moving object index, R-tree

中图分类号: 

  • TP315
[1]KHAN A U R,OTHMAN M,MADANI S A,et al.A Survey of Mobile Cloud Computing Application Models.IEEE Communications Surveys & Tutorials,2014,16(1):393-413.
[2]WOLFSON O.Moving Objects Information Management:TheDatabase Challenge[M]∥Next Generation Information Technologies and Systems.Springer Berlin Heidelberg,2002:75-89.
[3]WANG S,WANG H J,QIN X P,et al.Architecture of Big Data:Challenges, Status and Prospects.Chinese Journal of Computers,2011,34(10):1741-1752.(in Chinese)
王珊,王会举,覃雄派,等.架构大数据:挑战、现状与展望.计算机学报,2011,34(10):1741-1752.
[4]CARSTOIU D,LEPADATU E,GASPAR M.Hbase-non SQL Database,Performances Evaluation.International Journal of Advancements in Computing Technology,2010,2(5):42-52.
[5]ZHOU X,ZHANG X,WANG Y,et al.Efficient DistributedMulti-dimensional Index for Big Data Management[M]∥Web-Age Information Management.Springer Berlin Heidelberg,2013:130-141.
[6]TAKASU A.An Efficient Distributed Index for Geospatial Databases∥Database and Expert Systems Applications.Springer International Publishing,2015:28-42.
[7]ZHOU X,LI H,ZHANG X,et al.ABR-Tree:An Efficient Distributed Multidimensional Indexing Approach for Massive Data∥International Conference on Algorithms and Architectures for Parallel Processing.Springer,Cham,2015:781-790.
[8]HUANG S,WANG B,ZHU J,et al.R-HBase:A Multi-dimensional Indexing Framework for Cloud Computing Environment[C]∥IEEE International Conference on Data Mining Workshop.IEEE,2015:569-574.
[9]HUANG S,WANG B,DENG S,et al.HMVR-tree:A Multi-version R-tree Based on HBase for Concurrent Access[M]∥Big Data Computing and Communications.Springer International Publishing,2016.
[10]XIA Y,HUANG Z,ZHANG X,et al.Parallel Indexing forPast,Current and Future Locations of Moving Objects∥International Conference on Service Sicence,Technology and Engineering.DEStech Publications,2016:20-27.
[11]DU N,ZHAN J,ZHAO M,et al.Spatio-temporal data index model of moving objects on fixed networks using hbase∥2015 IEEE International Conference on Computational Intelligence & Communication Technology (CICT).IEEE,2015:247-251.
[12]GEORGE L.HBase:The Definitive Guide.The People’s Posts and Telecommunications Press,2013.(in Chinese)
GEORGE L.Hbase权威指南.北京:人民邮电出版社,2013.
[13]Zookeeper.https://zookeeper.apache.org.
[14]WANG B T,ZHAO K L,CHANG L D,et al. OptimizationTechnique fot Continuous Range Query Based on Storm.Computer Science and Engineering,2017,39(1):1-14.(in Chinese)
王波涛,赵凯利,常立东,等.基于Storm的连续范围查询优化技术.计算机工程与科学,2017,39(1):1-14.
[15]Apache Storm.http://storm.apache.org.
[1] 邵子灏, 杨世宇, 马国杰.
室内信息服务的基础——低成本定位技术研究综述
Foundation of Indoor Information Services:A Survey of Low-cost Localization Techniques
计算机科学, 2022, 49(9): 228-235. https://doi.org/10.11896/jsjkx.210900260
[2] 徐江峰谭玉龙.
基于机器学习的HBase配置参数优化研究
Research on HBase Configuration Parameter Optimization Based on Machine Learning
计算机科学, 2020, 47(6A): 474-479. https://doi.org/10.11896/JsJkx.190900046
[3] 类兴邦,房俊.
基于融合数据库的海量传感器信息存储架构
Mass Sensor Information Storage Infrastructure Based on Fusion Database
计算机科学, 2016, 43(6): 68-71. https://doi.org/10.11896/j.issn.1002-137X.2016.06.014
[4] 宋华珠,段文军,刘翔.
基于HBase的本体存储模型
Ontology Storage Model Based on HBase
计算机科学, 2016, 43(6): 39-43. https://doi.org/10.11896/j.issn.1002-137X.2016.06.008
[5] 胡波,谭良.
HBase架构中RPC客户端的通信性能优化
Optimization of Communication Performance of RPC Client in HBase Architecture
计算机科学, 2016, 43(4): 97-101. https://doi.org/10.11896/j.issn.1002-137X.2016.04.019
[6] 刘树波,李艳敏,刘梦君.
基于密文检索的位置服务用户隐私保护方案
Privacy-preserving for Location-based Service over Encrypted Data Search
计算机科学, 2015, 42(4): 101-105. https://doi.org/10.11896/j.issn.1002-137X.2015.04.019
[7] 李松,崔环宇,张丽平,经海东.
基于CURE聚类算法的静态R树构建方法
Static R-tree Building Method Based on Cure Clustering Algorithm
计算机科学, 2015, 42(10): 193-197.
[8] 胡文领,王永利.
QR-TCM:具有质量保证的位置服务隐私保护模型
QR-TCM:A Privacy Protection Model with Quality Guarantee for Location-based Services
计算机科学, 2014, 41(4): 95-98.
[9] 汪璟玢,胡 烜.
DRR:一种多维案例检索优化算法研究
DDR:A Multidimensional Case Retrieval Optimization Algorithm
计算机科学, 2013, 40(3): 86-88.
[10] 强彦,卢军佐,刘涛,裴博.
基于HBase的并行BFS方法
HBase Based Parallel BFS Method
计算机科学, 2013, 40(3): 228-231.
[11] 胡 磊,王佳俊,倪巍伟.
一种基于坐标和的保护位置隐私近邻查询方法
Location Privacy Preserving Nearest Neighbor Querying Based on Coordinates Accumulation
计算机科学, 2012, 39(8): 173-177.
[12] 甘早斌,袁永光,赵贻竹,鲁宏伟.
基于DR-tree的室内移动对象索引研究
Indoor Moving Objects Index Research Based on DR-tree
计算机科学, 2012, 39(10): 177-181.
[13] 郑莹 王建新 陈建二.
Steiner Tree问题的研究进展
Survey of Steiner Tree Problem
计算机科学, 2011, 38(10): 16-22.
[14] 刘耿耿,王小溪,陈国龙,郭文忠,王少铃.
求解VLSI布线问题的离散粒子群优化算法
Discrete Particle Swarm Optimization Algorithm for the Routing of VLSI Circuit
计算机科学, 2010, 37(10): 197-201.
[15] .
基于密度的不确定性数据概率聚类

计算机科学, 2009, 36(5): 68-71.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!