Research on Big Data Retrieve Filter Model for Batch Processing

doi:10.11896/j.issn.1002-137X.2015.09.035

Abstract

As a new strategic resource, big data plays an important role in the information field. The scale of big data retrieval often reaches billions or even tens of billions, so traditional query mechanisms are routinely inefficient. Improving query efficiency and reducing the query burden have therefore become an important aspect of big data research. To speed up big data queries and reduce that burden, we propose IMFM, a big data retrieval filtering model for batch-oriented processing, demonstrate its support for multi-dimensional queries, and give its deployment strategy. Deployed at appropriate positions in the index structure, IMFM quickly filters the search requests that pass through a node so that lower-level nodes need not be searched, which reduces retrieval cost. Experiments show that, in a batch-oriented big data processing environment, IMFM effectively reduces the path length of both single-dimensional and multi-dimensional queries, improves retrieval efficiency, and significantly reduces the workload of the big data storage and processing platform.

Key words: Big data, Retrieve, Filter, Index architecture, Multi-dimensional query

LI Zhao-xing and MA Zi-tang. Research on Big Data Retrieve Filter Model for Batch Processing[J]. Computer Science, 2015, 42(9): 183-190.
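
The abstract does not describe IMFM's internal structure, so the following is only a minimal sketch of the general idea it states: a filter placed at an index node rejects requests in a batch before the lower node is searched. The Bloom filter, the tree layout, and every name here (`BloomFilter`, `IndexNode`, `batch_lookup`) are assumptions made for illustration and are not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; a stand-in for a node-level filter, not IMFM itself."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


class IndexNode:
    """Index node whose filter summarizes every key stored in its subtree."""

    def __init__(self, name, keys=(), children=()):
        self.name = name
        self.keys = set(keys)
        self.children = list(children)
        self.filter = BloomFilter()
        for key in self.all_keys():
            self.filter.add(key)

    def all_keys(self):
        yield from self.keys
        for child in self.children:
            yield from child.all_keys()


def batch_lookup(node, queries, visited):
    """Return the queries found under node; visited records searched nodes."""
    visited.append(node.name)
    found = {q for q in queries if q in node.keys}
    remaining = [q for q in queries if q not in found]
    for child in node.children:
        # Test the batch against the child's filter before descending,
        # so a child that cannot hold any remaining query is never searched.
        passed = [q for q in remaining if child.filter.might_contain(q)]
        if passed:
            found |= batch_lookup(child, passed, visited)
    return found


if __name__ == "__main__":
    left = IndexNode("left", keys={"a", "b"})
    right = IndexNode("right", keys={"x", "y"})
    root = IndexNode("root", keys={"m"}, children=[left, right])
    visited = []
    print(batch_lookup(root, ["a", "m", "zzz"], visited), visited)
```

In this sketch a rejected request costs a few hash probes instead of a subtree search. Because a Bloom filter has no false negatives, no stored key is ever missed. A false positive only causes an unnecessary descent.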

