Computer Science ›› 2017, Vol. 44 ›› Issue (3): 42-47. doi: 10.11896/j.issn.1002-137X.2017.03.011

• 2015 National Annual Conference on High Performance Computing •

Parallel Algorithm Design and Optimization of Range Query for Meteorological Data Retrieval

XU Jing, REN Kai-jun and LI Xiao-yong

  1. College of Computer, National University of Defense Technology, Changsha 410073; Academy of Ocean Science and Engineering, National University of Defense Technology, Changsha 410073
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Special Scientific Research Fund for Public Welfare Industry (Meteorology) (GYHY201306003) and the National Natural Science Foundation of China

Abstract: As the skill and resolution of numerical weather prediction continue to improve, the volume of meteorological data is growing massively, and the Meteorological Archival and Retrieval System (MARS) handles large data service requests inefficiently. To address this, we study how to optimize range queries in MARS retrieval. Combining the mathematical idea of the set complement with the principle of multiway array aggregation, we propose an efficient complement transform range query (CTRQ) method, which converts the "big data" computation of a wide-range query into a "small data" computation. The basic idea is to compare the size of each aggregation dimension of the hypercube with the size of the attribute value set in the range query request. When the request covers more than half of the dimension, the index computation is performed on the complement instead, and a second complement then yields the physical storage information of the requested meteorological field data. Experiments show that, compared with the original index computation method, CTRQ effectively reduces the system overhead of metadata index computation during data retrieval. On this basis, we design and implement a parallel CTRQ algorithm, which achieves a speedup of up to 1.9 over the improved serial algorithm and further improves the retrieval efficiency of MARS.

Key words: MARS, Hypercube, Range query, Metadata index computation, Parallel processing
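The "take the complement when more than half is requested" idea in the abstract can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the authors' implementation: one hypercube dimension is modelled as a list of attribute values, `lookup` stands in for the per-value metadata index computation, and the names `ctrq_positions`, `steps`, and `counted_lookup` are hypothetical. The request is assumed to be a subset of the dimension's values.

```python
from typing import Callable, Hashable, Iterable, List, Sequence


def ctrq_positions(
    dim_values: Sequence[Hashable],
    requested: Iterable[Hashable],
    lookup: Callable[[Hashable], int],
) -> List[int]:
    """Return the sorted positions of the requested values in one dimension.

    `lookup` models the (assumed costly) index computation for one value.
    The request is assumed to be a subset of `dim_values`.
    """
    wanted = set(requested)
    if 2 * len(wanted) > len(dim_values):
        # First complement: index only the values that were NOT requested.
        complement = [v for v in dim_values if v not in wanted]
        skipped = {lookup(v) for v in complement}
        # Second complement: every position except the skipped ones.
        return [p for p in range(len(dim_values)) if p not in skipped]
    # Small request: index the requested values directly.
    return sorted(lookup(v) for v in wanted)


# Hypothetical dimension: forecast steps 0, 6, ..., 54 (10 values).
steps = list(range(0, 60, 6))
position_of = {v: i for i, v in enumerate(steps)}
calls = []


def counted_lookup(value):
    # Record each index computation so the saving can be observed.
    calls.append(value)
    return position_of[value]


# Request 8 of the 10 values: only the 2 excluded values are looked up.
result = ctrq_positions(steps, steps[:8], counted_lookup)
print(result, len(calls))
```

In this toy setting a request for 8 of 10 values triggers 2 index computations rather than 8. The saving in a real system depends on how costly each index computation is relative to the final complement step.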

