Computer Science, 2019, Vol. 46, Issue (4): 44-49. doi: 10.11896/j.issn.1002-137X.2019.04.007

• Big Data & Data Science •

Massive Data Parallel Query Platform Based on Distributed Shared-nothing Architecture

QIN Dong-ming¹, YU Jian¹, ZHANG Bo², ZHAO Qin¹,²

  1. Key Laboratory of Embedded System and Service Computing of Ministry of Education, Tongji University, Shanghai 200092, China
  2. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
  • Received: 2018-03-17 Online: 2019-04-15 Published: 2019-04-23

Abstract: To address the challenges of data loading and parallel query control in massive data query systems, this paper proposes a parallel query platform for massive data based on a distributed shared-nothing architecture. The platform uses this architecture to support unified processing of structured and unstructured data in massive data queries, and thereby performs aggregate computation over the data in the platform. The key technologies of the proposed platform are as follows. First, the platform provides cross-platform storage and unified loading of multiple types of data. Second, a load-balancing-based technique for distributing the query task flow across multiple nodes is proposed, and a global query execution strategy is generated. Finally, the platform uses Hash and Range methods to control the parallel execution of the query task flow. In performance verification, the platform reduces query time by 40% compared with a non-parallel method. The experimental results show that the platform performs well in the accuracy, reliability and concurrency of massive data queries.
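
The Hash and Range methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`hash_partition`, `range_partition`, `partition_rows`, `parallel_sum`), the use of CRC32 as the hash, and the boundary values are all assumptions made for illustration. The sketch routes rows to nodes by key, lets each node compute a partial aggregate over its own rows only (the shared-nothing property), and merges the partial results at a coordinator.

```python
import bisect
import zlib
from collections import defaultdict


def hash_partition(key, num_nodes):
    """Hash method (illustrative): a stable CRC32 hash sends equal keys to the same node."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def range_partition(key, boundaries):
    """Range method (illustrative): `boundaries` is sorted; node i holds
    keys with boundaries[i-1] <= key < boundaries[i]."""
    return bisect.bisect_right(boundaries, key)


def partition_rows(rows, key_fn, route):
    """Assign every row to a node id using the given routing function."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[route(key_fn(row))].append(row)
    return nodes


def parallel_sum(nodes, value_fn):
    """Each node sums only its local rows; the coordinator adds the partial sums."""
    partials = {node: sum(value_fn(r) for r in rows) for node, rows in nodes.items()}
    return sum(partials.values()), partials


# Hypothetical (key, value) rows and partition boundaries.
rows = [(50, 1.0), (100, 2.0), (150, 3.0), (250, 4.0)]
by_range = partition_rows(rows, lambda r: r[0], lambda k: range_partition(k, [100, 200]))
by_hash = partition_rows(rows, lambda r: r[0], lambda k: hash_partition(k, 3))
range_total, range_partials = parallel_sum(by_range, lambda r: r[1])
hash_total, _ = parallel_sum(by_hash, lambda r: r[1])
```

Both routings yield the same global total, because every row belongs to exactly one node. Range routing keeps neighbouring keys together, which suits range predicates, while hash routing tends to spread keys more evenly across nodes.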

Key words: Data loading, Massive data query, Parallel query, Shared-nothing architecture

CLC Number: 

  • TP391