Computer Science, 2016, Vol. 43, Issue (Z6): 480-484. doi: 10.11896/j.issn.1002-137X.2016.6A.113

• Software Engineering and Database Technology •

Redistribution Algorithm for Massive Video Data Based on HDFS

GUO Jian-hua, YANG Hong-bin, CHEN Sheng-bo

  School of Computer Engineering and Science, Shanghai University, Shanghai 200444 (all authors)
  • Publication date: 2018-11-14  Release date: 2018-11-14

Weight Distribution Algorithm for Massive Video Data Based on HDFS



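Abstract: Distributed computing on video data differs greatly from distributed computing on text data. Video data are unstructured, and two videos of the same size but with different content can take different amounts of time to process. For simple structured data, the default HDFS balancer can solve the load-balancing problem, but video files suffer from hotspot access and inconsistent complexity, so the default HDFS data distribution mechanism cannot balance the computational load well. This paper therefore proposes a redistribution algorithm for massive video data based on HDFS. First, the number of accesses to each video file and the time that historical video analyses spent on that file are recorded. Next, these data are quantified and combined by weighting into the file's load degree. Finally, file swapping exchanges high-load videos with low-load videos until the load on every node is balanced. Experimental results show that the proposed data redistribution algorithm reduces the processing time of massive video data.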

Keywords: HDFS, data redistribution, video complexity, video popularity
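The recording and load-degree steps described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' formula: the abstract says only that access counts and historical analysis times are quantified and weighted. Here each log event `(file, analysis_seconds)` is assumed to be one access, the mean analysis time per access stands in for video complexity, min-max normalization is a swapped-in choice for the quantification step, and the weights `w_access` and `w_time` are hypothetical.

```python
from collections import defaultdict


def record_stats(events):
    """Aggregate (file, analysis_seconds) events.

    Assumption: every event is one access to the file, and analysis_seconds
    is the time that analysis job spent on it.
    Returns per-file access counts and mean analysis time per access.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for name, seconds in events:
        counts[name] += 1
        totals[name] += seconds
    mean_time = {name: totals[name] / counts[name] for name in counts}
    return dict(counts), mean_time


def min_max(values):
    """Scale values into [0, 1]; if all values are equal, return zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def load_degrees(events, w_access=0.5, w_time=0.5):
    """Hypothetical load degree: weighted sum of normalized popularity
    (access count) and normalized complexity (mean analysis time)."""
    counts, mean_time = record_stats(events)
    names = sorted(counts)
    acc = min_max([counts[n] for n in names])
    tim = min_max([mean_time[n] for n in names])
    return {n: w_access * a + w_time * t for n, a, t in zip(names, acc, tim)}


if __name__ == "__main__":
    log = [("a.mp4", 30.0), ("a.mp4", 34.0), ("a.mp4", 32.0),
           ("b.mp4", 120.0), ("c.mp4", 10.0), ("c.mp4", 14.0)]
    print(load_degrees(log))
```

With this log, a.mp4 has three accesses averaging 32 s, b.mp4 one access of 120 s, and c.mp4 two accesses averaging 12 s, so b.mp4 and c.mp4 receive load degrees of 0.5 and 0.25. A frequently accessed but cheap video and a rarely accessed but expensive one can therefore end up with comparable load degrees.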

Abstract: Distributed computing on video data differs greatly from distributed computing on text data. Video data are unstructured, and videos of the same size but with different content lead to different execution times. For simple structured data, the default HDFS balancer can solve the load-balancing problem. Video files, however, differ in access frequency and in complexity, so the default data distribution mechanism of HDFS does not solve the load-balancing problem well. In this paper, a new algorithm for massive video data redistribution based on HDFS is proposed. First, the access count of each video file and the time historical analyses spent on it are recorded. Second, these data are quantified and weighted to give the load of the video file. Finally, file swapping is used to exchange high-load and low-load videos until every node's load is balanced. Experimental results show that the proposed data redistribution algorithm reduces the processing time of massive video data.

Key words: HDFS, data redistribution, video complexity, video popularity
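The file-swapping step can be sketched as a greedy loop that produces a swap plan. This is a minimal sketch, not the authors' procedure. It assumes that a node's load is the sum of its files' load degrees and that file names are unique. It ignores HDFS replication and block layout, uses a hypothetical absolute `tolerance` as the balance criterion, and only computes a plan without calling any HDFS API. In each round it takes the most- and least-loaded nodes and picks the file pair whose swap brings their loads closest together.

```python
def node_load(files):
    """Node load = sum of the load degrees of the files it stores (assumption)."""
    return sum(files.values())


def plan_swaps(nodes, tolerance=0.05, max_rounds=1000):
    """Greedy swap plan.

    nodes: {node_name: {file_name: load_degree}}; file names are assumed unique.
    Returns (new_placement, swaps). Each swap is
    (hot_node, file_sent_to_cold, cold_node, file_sent_to_hot).
    The input mapping is not mutated.
    """
    nodes = {n: dict(files) for n, files in nodes.items()}
    swaps = []
    for _ in range(max_rounds):
        loads = {n: node_load(files) for n, files in nodes.items()}
        hot = max(loads, key=loads.get)
        cold = min(loads, key=loads.get)
        gap = loads[hot] - loads[cold]
        if gap <= tolerance:
            break  # balanced under the hypothetical tolerance
        best = None
        for x, lx in nodes[hot].items():
            for y, ly in nodes[cold].items():
                d = lx - ly
                # Swapping x and y changes this pair's gap from gap to |gap - 2d|,
                # which is an improvement only when 0 < d < gap.
                if 0 < d < gap:
                    new_gap = abs(gap - 2 * d)
                    if best is None or new_gap < best[0]:
                        best = (new_gap, x, y)
        if best is None:
            break  # no swap between these two nodes narrows the gap
        _, x, y = best
        nodes[cold][x] = nodes[hot].pop(x)
        nodes[hot][y] = nodes[cold].pop(y)
        swaps.append((hot, x, cold, y))
    return nodes, swaps
```

Suppose dn1 holds files with load degrees 9 and 8, and dn2 holds files with load degrees 1 and 3. One swap (file a for file d) leaves the nodes at 11 and 10. After that, no pair narrows the gap further, so the plan stops. Swapping rather than moving keeps the number of files on each node unchanged, which is one plausible reason to prefer it. A consequence is that a node holding no files never takes part in a swap in this sketch. Each accepted swap with 0 < d < gap changes the sum of squared node loads by 2d(d - gap) < 0, so the loop terminates even without `max_rounds`.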
