Computer Science, 2019, Vol. 46, Issue (4): 44-49. doi: 10.11896/j.issn.1002-137X.2019.04.007

• Big Data and Data Science •

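Massive Data Parallel Query Platform Based on Distributed Shared-nothing Architecture

QIN Dong-ming1, YU Jian1, ZHANG Bo2, ZHAO Qin1,2

  1. Key Laboratory of Embedded System and Service Computing of Ministry of Education, Tongji University, Shanghai 200092, China
  2. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
  • Received: 2018-03-17 Online: 2019-04-15 Published: 2019-04-23
  • Corresponding author: ZHAO Qin (born 1982), male, Ph.D., lecturer, CCF member. His main research interests include social network analysis, data mining, and machine learning. E-mail: q_zhao@shnu.edu.cn
  • About the authors: QIN Dong-ming (born 1981), male, Ph.D. candidate, engineer. His main research interests include big data processing and service computing. YU Jian (born 1975), male, Ph.D., engineer. His main research interests include data mining and service computing. ZHANG Bo (born 1978), male, Ph.D., professor. His main research interests include machine learning, social network analysis, and big data analysis.
  • Funding:
    This work was supported by the High Performance Computing special project of the National Key Research and Development Program of China (2016YFB0200300), the National Natural Science Foundation of China (61572326, 61702333), the Open Project of the Key Laboratory of Embedded System and Service Computing of Ministry of Education, Tongji University (ESSCKF 2016-01), and the Local Universities Capacity Building Project of the Science and Technology Commission of Shanghai Municipality (17070502800).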

Abstract: To address the challenges of data loading and parallel query control in massive data query systems, this paper proposes a massive data parallel query platform based on a distributed shared-nothing architecture. The platform uses the shared-nothing architecture to process structured and unstructured data in a unified way and to perform aggregate computation over the data held in the platform. Its core techniques are as follows. First, the platform provides cross-platform storage and unified loading for multiple types of data. Second, a load-balancing-based technique distributes the query task flow across multiple nodes and generates a global query execution strategy. Finally, the platform uses Hash and Range methods to implement concurrency control for the query task flow. In performance tests, the platform reduced query time by nearly 40% compared with a non-parallel approach. The experimental results show that the platform performs well in terms of correctness, reliability, and concurrency for massive data queries.

Key words: Data loading, Massive data query, Parallel query, Shared-nothing architecture
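
The abstract names Hash and Range methods and load-balanced task distribution but does not give their details. The following is a minimal Python sketch of what these generic techniques can look like, not the authors' implementation. The function names (`hash_partition`, `range_partition`, `assign_tasks`), the use of CRC32 as the hash, and the greedy least-loaded policy are all assumptions made for illustration.

```python
import bisect
import heapq
import zlib
from typing import Dict, List, Sequence, Tuple


def hash_partition(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def range_partition(value: int, upper_bounds: Sequence[int]) -> int:
    """Map a value to the first range whose upper bound is >= value.

    upper_bounds must be sorted ascending. Values above the last bound fall
    into an overflow partition with index len(upper_bounds).
    """
    return bisect.bisect_left(upper_bounds, value)


def assign_tasks(task_costs: Dict[str, float], num_nodes: int) -> Dict[str, int]:
    """Greedy load balancing (hypothetical policy): hand each task, most
    expensive first, to the node with the smallest accumulated cost."""
    heap: List[Tuple[float, int]] = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    # Sort by descending cost, then by name so ties are resolved deterministically.
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        assignment[task] = node
        heapq.heappush(heap, (load + cost, node))
    return assignment


if __name__ == "__main__":
    bounds = [100, 200, 300]  # three ranges plus one overflow partition
    print(range_partition(150, bounds))
    print(hash_partition("order-42", 4))
    print(assign_tasks({"q1": 5, "q2": 3, "q3": 2, "q4": 2}, 2))
```

In a shared-nothing setting, each node owns its partition exclusively, so a point lookup on a hash-partitioned key touches one node, while a range-partitioned key lets a range query skip nodes whose bounds do not overlap the predicate.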

CLC Number:

  • TP391