Computer Science ›› 2019, Vol. 46 ›› Issue (4): 44-49. doi: 10.11896/j.issn.1002-137X.2019.04.007
秦东明1, 喻剑1, 张波2, 赵勤1,2
QIN Dong-ming1, YU Jian1, ZHANG Bo2, ZHAO Qin1,2
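Abstract: To address the difficulties of data loading and parallel query control in massive-data querying, this paper proposes a massive-data parallel query platform based on a distributed shared-nothing architecture. The platform uses the shared-nothing architecture to provide unified processing of structured and unstructured data for massive-data queries, and to perform aggregate computation over the data held within the platform. Its core techniques are as follows. First, it provides cross-platform storage and unified data loading for multiple data types. Second, it presents a load-balancing-based technique for distributing query task flows across multiple nodes, which generates a global query execution strategy. Finally, it uses two schemes, Hash and Range, to implement concurrency control of the query task flows. In tests, the technique reduced query time by nearly 40% compared with a non-parallel approach. The experimental results show that the technique performs well in terms of the correctness, reliability, and concurrency of massive-data queries.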
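The abstract names three mechanisms without giving detail: Hash and Range schemes, and load-balanced distribution of query tasks across nodes. The sketch below is one plausible reading of those ideas, not the authors' implementation. It assumes Hash and Range refer to the usual partitioning schemes of shared-nothing systems, and it swaps in a simple greedy heuristic for the load balancing. All function names, the CRC32 hash, and the task-cost model are hypothetical choices made for illustration.

```python
import bisect
import zlib
from collections import defaultdict


def hash_partition(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (CRC32), so the
    same key always lands on the same node across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def range_partition(value: int, boundaries: list[int]) -> int:
    """Map a value to a node index using sorted split points.
    Node 0 holds values below boundaries[0]; the last node holds
    values at or above boundaries[-1]."""
    return bisect.bisect_right(boundaries, value)


def assign_tasks(task_costs: dict[str, int], num_nodes: int) -> dict[int, list[str]]:
    """Greedy load balancing (an assumed heuristic): hand each task,
    most expensive first, to the node with the smallest current load."""
    loads = [0] * num_nodes
    plan = defaultdict(list)
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        node = loads.index(min(loads))
        plan[node].append(task)
        loads[node] += cost
    return dict(plan)


if __name__ == "__main__":
    # Range scheme with two split points, i.e. three nodes.
    print([range_partition(v, [100, 200]) for v in (50, 100, 150, 250)])
    # Four hypothetical query tasks with estimated costs, spread over two nodes.
    print(assign_tasks({"q1": 5, "q2": 3, "q3": 2, "q4": 4}, 2))
```

Hash placement spreads keys evenly but scatters neighbouring values, while range placement keeps neighbouring values together and so suits range queries. That trade-off may be why a platform would offer both schemes.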