Computer Science ›› 2016, Vol. 43 ›› Issue (Z6): 480-484. doi: 10.11896/j.issn.1002-137X.2016.6A.113


Weight Distribution Algorithm for Massive Video Data Based on HDFS

GUO Jian-hua, YANG Hong-bin and CHEN Sheng-bo   

  • Online:2018-11-14 Published:2018-11-14

Abstract: Distributed computing on video data differs substantially from distributed computing on text data. Video data are unstructured, and videos of the same size but with different content can take different amounts of time to process. For simple structured data, the default HDFS load balancer handles load balancing adequately. Video files, however, differ in how often they are accessed and in how complex they are to process, so the default HDFS data distribution mechanism does not balance the load well. This paper proposes a new HDFS-based redistribution algorithm for massive video data. First, the access count and the historical analysis time of each video file are recorded. Second, these records are quantified and weighted to give the load of each video file. Finally, high-load and low-load videos are exchanged between nodes by file replacement until every node is load balanced. Experimental results show that the proposed data redistribution algorithm reduces the processing time of massive video data.

Key words: HDFS, Data redistribution, Video complexity, Video popularity
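
The three steps in the abstract (record, weight, exchange) can be sketched roughly as follows. This is only an illustrative sketch and not the paper's implementation: the abstract gives no formulas, so the max-normalisation, the equal weights `w_access` and `w_time`, the `tolerance` stopping threshold, and the rule for choosing which pair of videos to swap are all assumptions, and the names `Video`, `assign_loads`, and `rebalance` are hypothetical. It also works on in-memory lists and does not call any HDFS API.

```python
from dataclasses import dataclass


@dataclass
class Video:
    name: str
    access_count: int        # recorded number of accesses (popularity)
    analysis_seconds: float  # recorded historical analysis time (complexity)
    load: float = 0.0        # weighted load, filled in by assign_loads


def assign_loads(videos, w_access=0.5, w_time=0.5):
    """Scale both records to [0, 1] and combine them into one load value.

    Assumption: max-normalisation and 0.5/0.5 weights; the abstract does
    not state the actual quantification or weights.
    """
    max_access = max((v.access_count for v in videos), default=0) or 1
    max_time = max((v.analysis_seconds for v in videos), default=0.0) or 1.0
    for v in videos:
        v.load = (w_access * v.access_count / max_access
                  + w_time * v.analysis_seconds / max_time)


def node_load(files):
    """Total load of the videos stored on one node."""
    return sum(v.load for v in files)


def load_gap(nodes):
    """Difference between the most and least loaded node."""
    loads = [node_load(files) for files in nodes.values()]
    return max(loads) - min(loads)


def rebalance(nodes, tolerance=0.05, max_rounds=1000):
    """Exchange one high-load and one low-load video per round.

    `nodes` maps a node name to its list of videos. Swapping files (rather
    than moving them) keeps the number of files per node unchanged.
    """
    swaps = []
    for _ in range(max_rounds):
        heavy = max(nodes, key=lambda n: node_load(nodes[n]))
        light = min(nodes, key=lambda n: node_load(nodes[n]))
        gap = node_load(nodes[heavy]) - node_load(nodes[light])
        if gap <= tolerance:
            break
        # A swap shifts (h.load - l.load) from heavy to light; it narrows
        # the gap only if that difference lies strictly between 0 and gap.
        # Assumed heuristic: prefer the pair whose difference is nearest gap/2.
        candidates = [
            (abs((h.load - l.load) - gap / 2), i, j)
            for i, h in enumerate(nodes[heavy])
            for j, l in enumerate(nodes[light])
            if 0 < h.load - l.load < gap
        ]
        if not candidates:
            break  # no single swap improves the balance any further
        _, i, j = min(candidates)
        nodes[heavy][i], nodes[light][j] = nodes[light][j], nodes[heavy][i]
        swaps.append((heavy, light, nodes[light][j].name, nodes[heavy][i].name))
    return swaps
```

To try it, call `assign_loads` on the full list of videos, then pass a dict of node name to video list to `rebalance`; the returned list names each exchanged pair. Every accepted swap strictly narrows the gap between the two nodes involved, so the loop ends either when the gap falls within `tolerance` or when no improving swap remains.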

