Computer Science (计算机科学), 2015, Vol. 42, Issue (9): 183-190. doi: 10.11896/j.issn.1002-137X.2015.09.035

• Software and Database Technology •

Research on Big Data Retrieve Filter Model for Batch Processing

LI Zhao-xing, MA Zi-tang

  1. School of Cipher Engineering, PLA Information Engineering University, Zhengzhou 450000
  • Online: 2018-11-14  Published: 2018-11-14
  • Funding:
    This work was supported by a National Defense Pre-research Project fund.

Abstract: As a new strategic resource, big data plays an important role in the information field. The scale of big data retrieval often reaches the billions or even tens of billions, so the low efficiency of traditional query mechanisms has become the norm. Improving query efficiency and reducing query load have therefore become important aspects of big data research. To this end, we propose IMFM, a batch-processing-oriented retrieval filtering model for big data. We introduce its core idea and working principle, demonstrate its support for multi-dimensional queries, and give a deployment strategy. The model is deployed at appropriate positions in the big data index structure. When a retrieval request passes through a node, the model filters it quickly, so that irrelevant requests do not operate on the index structure below that node, which reduces the performance cost of retrieval. Experiments show that, in a big data batch-processing environment, the model effectively shortens the path length of both one-dimensional and multi-dimensional queries, improves retrieval efficiency, and substantially reduces the load on big data storage and processing platforms.

Key words: Big data, Retrieval, Filtering, Index structure, Multi-dimensional query
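
The abstract describes filtering retrieval requests at an index node so that irrelevant requests never reach the index structure below it. The abstract does not give the internals of IMFM, so the following is only a minimal sketch of that general idea, not the authors' model: it substitutes a plain Bloom filter as the membership test, and the names `BloomFilter`, `IndexNode`, `batch_lookup`, and `subtree_lookups` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class IndexNode:
    """Hypothetical index node whose subtree (a dict here) is guarded by a filter."""

    def __init__(self, records):
        self.records = dict(records)
        self.filter = BloomFilter()
        for key in self.records:
            self.filter.add(key)
        self.subtree_lookups = 0  # number of requests that actually descend

    def batch_lookup(self, keys):
        """Answer a batch of requests, skipping the subtree for filtered-out keys."""
        results = {}
        for key in keys:
            if not self.filter.might_contain(key):
                continue  # rejected at the node: the subtree is never touched
            self.subtree_lookups += 1
            if key in self.records:
                results[key] = self.records[key]
        return results


node = IndexNode({"a": 1, "b": 2})
found = node.batch_lookup(["a", "x", "b", "y", "z"])
```

Because a Bloom filter never produces false negatives, `found` contains exactly the stored keys that were requested. `subtree_lookups` counts how many requests passed the filter, which can exceed the number of hits when a false positive occurs. How IMFM itself encodes node contents and handles multi-dimensional queries is described in the paper, not here.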

