mDHT: A Massive-Data Retrieval Algorithm Based on the Open-Source HDFS Architecture and Multi-level Index Tables

Computer Science, 2013, Vol. 40, Issue (2): 195-199.


Tang Yu (汤羽), Wang Yingjie (王英杰), Fan Aihua (范爱华), Yao Yuanzhe (姚远哲)

(University of Electronic Science and Technology of China, Chengdu 611731) (Xi'an Polytechnic University, Xi'an 710048)

Publication date: 2018-11-16 Release date: 2018-11-16

mDHT: A Search Algorithm to Extra-large Volume of Data Based on Open HDFS Platform and Multi-level Indexing

Online: 2018-11-16 Published: 2018-11-16

Abstract

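Abstract: To meet the storage and fast-retrieval requirements of large-scale energy data systems, this paper proposes a cloud storage architecture and a multi-level index directory scheme built on the open-source HDFS/Hadoop platform, together with mDHT, an algorithm based on multi-level index tables under this architecture, and implements the algorithm in MapReduce. A simulation experiment on 48 million records shows that, at data volumes of 12 million to 48 million records, the mDHT algorithm with multi-level index tables achieves a qualitative leap in retrieval performance over both a conventional MS SQL Server implementation and the HDFS/Hive approach; compared with retrieval using a single-level index table, it also reduces data lookup time significantly, by 24.5% to 57.8%. The DHT algorithm based on multi-level index tables presented here provides a key technique for building fast search engines for massive data on cloud storage architectures.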

Keywords: large-scale data processing, cloud storage, multi-level index table, search algorithm, MapReduce

Abstract: To meet the storage and fast-search needs of extra-large-scale energy monitoring and statistics data, we propose a Multi-indexed Distributed Hash Table (mDHT) algorithm based on the open HDFS/Hadoop platform and a multi-level indexing design, and we implement the algorithm in MapReduce. A simulation experiment at a scale of up to 48 million data records indicates that, when the data volume reaches 12 million to 48 million records, the proposed mDHT algorithm shows outstanding performance in data adding operation compared with a traditional MS SQL Server implementation. Even compared with a single-index search application, the mDHT approach reduces data searching time by 24.5% to 57.8%. The multi-level indexed DHT algorithm presented in this paper provides a key technique for developing a fast search engine for extra-large-scale data on a cloud storage architecture.

Key words: Extra large scale data processing, Cloud storage, Multi-index, Search algorithm, MapReduce

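Tang Yu, Wang Yingjie, Fan Aihua, Yao Yuanzhe. mDHT: A Massive-Data Retrieval Algorithm Based on the Open-Source HDFS Architecture and Multi-level Index Tables[J]. Computer Science, 2013, 40(2): 195-199. https://doi.org/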
