Computer Science ›› 2013, Vol. 40 ›› Issue (2): 195-199.

• Software and Database Technology •

mDHT: A Massive-Data Retrieval Algorithm Based on the HDFS Open-Source Architecture and Multi-Level Index Tables

Tang Yu (汤羽), Wang Yingjie (王英杰), Fan Aihua (范爱华), Yao Yuanzhe (姚远哲)

  1. (University of Electronic Science and Technology of China, Chengdu 611731) (Xi'an Polytechnic University, Xi'an 710048)
  • Publication date: 2018-11-16 Release date: 2018-11-16

mDHT: A Search Algorithm to Extra-large Volume of Data Based on Open HDFS Platform and Multi-level Indexing

  • Online:2018-11-16 Published:2018-11-16

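Abstract (translated from the Chinese): To meet the storage and fast-retrieval needs of large-scale energy data systems, this paper proposes a cloud storage architecture and a multi-level index directory system based on the HDFS/Hadoop open-source platform, together with the mDHT algorithm, which uses multi-level index tables under this architecture, and completes a MapReduce implementation of the algorithm. Simulation experiments on 48 million records show that, when the data volume reaches 12 million to 48 million records, the mDHT algorithm with multi-level index tables delivers a qualitative leap in retrieval performance over a conventional MS SQL Server implementation and the HDFS/Hive approach. Compared with a single-level index table retrieval method, it also reduces data lookup time markedly, by 24.5% to 57.8%. The multi-level index table based DHT algorithm proposed in this paper provides a key technique for building a fast search engine for massive data on a cloud storage architecture.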

Keywords (translated from the Chinese): Large-scale data processing, cloud storage, multi-level index table, search algorithm, MapReduce

Abstract: To meet the storage and fast-search needs of extra-large-scale energy monitoring and statistics data, we propose a Multi-indexed Distributed Hash Table (mDHT) algorithm based on the HDFS/Hadoop open platform and a multi-level indexing design, and complete a MapReduce implementation of the algorithm. A simulation experiment at a scale of up to 48 million data records indicates that, when the data volume reaches 12 million to 48 million records, the proposed mDHT algorithm performs markedly better in data-adding operations than a traditional MS SQL Server implementation. Even compared with a single-index search application, the mDHT approach reduces data searching time by 24.5% to 57.8%. The multi-level indexed DHT algorithm presented in this paper provides a key technique for developing a fast search engine for extra-large-scale data on a cloud storage architecture.

Key words: Extra large scale data processing, Cloud storage, Multi-index, Search algorithm, MapReduce
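
The abstract does not describe the layout of the multi-level index tables, so the following is only a minimal in-memory sketch of the general idea of a two-level hash index, not the authors' mDHT implementation. The names `NUM_BUCKETS`, `bucket_of`, and `TwoLevelIndex` are hypothetical, and the sketch runs in a single process rather than on HDFS or MapReduce. The first level hashes a record key to a bucket; the second level maps the key to the record's position within that bucket's table, so a lookup touches only one small table instead of one flat index.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 16  # hypothetical number of first-level partitions


def bucket_of(key: str) -> int:
    """First-level index: map a record key to a bucket with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


class TwoLevelIndex:
    """Illustrative two-level index (an assumption, not the paper's design)."""

    def __init__(self):
        # Level 1: bucket id -> level-2 table.
        # Level 2: record key -> position in self.records.
        self.buckets = defaultdict(dict)
        self.records = []

    def add(self, key: str, value) -> None:
        """Append the record and register its position under its bucket."""
        self.records.append(value)
        self.buckets[bucket_of(key)][key] = len(self.records) - 1

    def find(self, key: str):
        """Return the record for key, or None if the key is not indexed."""
        table = self.buckets.get(bucket_of(key))
        if table is None or key not in table:
            return None
        return self.records[table[key]]
```

In a distributed setting, the first-level bucket would plausibly select a file or block and the second-level table an offset within it, but that mapping is an assumption here and is not stated in the abstract.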
