Computer Science ›› 2015, Vol. 42 ›› Issue (9): 183-190. doi: 10.11896/j.issn.1002-137X.2015.09.035


Research on Big Data Retrieve Filter Model for Batch Processing

LI Zhao-xing and MA Zi-tang   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: As a new strategic resource, big data plays an important role in the information field. The scale of big data retrieval often reaches the billions or even tens of billions, so traditional query mechanisms are routinely inefficient. Improving query efficiency and reducing query load have therefore become important topics in big data research. To speed up big data queries and reduce this load, we propose IMFM, a big data retrieval filtering model for batch-oriented processing, demonstrate its support for multi-dimensional queries, and give its deployment strategy. Deployed at appropriate positions in the index structure, IMFM quickly filters the search requests that pass through a node so that lower-level nodes need not be searched, which reduces the cost of retrieval. Experiments show that, in a batch-oriented big data environment, IMFM effectively shortens the query path for both single-dimensional and multi-dimensional queries, improves retrieval efficiency, and significantly reduces the workload of the big data storage and processing platform.
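The filtering idea in the abstract, rejecting a request at an index node before any lower node is searched, can be illustrated with a minimal sketch. The abstract does not describe IMFM's internals, so the Bloom-filter summary used here is an assumption chosen for illustration, and the names `BloomFilter`, `IndexNode`, and `search` are hypothetical rather than taken from the paper.

```python
import hashlib


class BloomFilter:
    """Compact set summary: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class IndexNode:
    """Hypothetical index node whose filter summarises every key below it."""

    def __init__(self, keys=(), children=()):
        self.keys = set(keys)
        self.children = list(children)
        self.filter = BloomFilter()
        for key in self.all_keys():
            self.filter.add(key)

    def all_keys(self):
        found = set(self.keys)
        for child in self.children:
            found |= child.all_keys()
        return found


def search(node, key, visited):
    """Return True if key is stored under node; record the nodes searched."""
    if not node.filter.might_contain(key):
        return False  # filtered out: this subtree is never searched
    visited.append(node)
    if key in node.keys:
        return True
    return any(search(child, key, visited) for child in node.children)


# Assumed toy index: one root with two leaves.
leaf_a = IndexNode(keys={"user:1", "user:2"})
leaf_b = IndexNode(keys={"user:3", "user:4"})
root = IndexNode(children=[leaf_a, leaf_b])

visited = []
found = search(root, "user:3", visited)
print(found, len(visited))
```

The length of `visited` stands in for the query path length that the abstract reports as reduced. Because a Bloom filter can return false positives, a filtered search may still descend into a subtree that lacks the key, but it never skips a subtree that holds it.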

Key words: Big data, Retrieve, Filter, Index architecture, Multi-dimensional query

