%A DU Hong-guang, LEI Zhou and CHEN Sheng-bo
%T Data Block Density Scheduling Strategy Based on HDFS in Shared Cluster
%0 Journal Article
%D 2017
%J Computer Science
%R 10.11896/j.issn.1002-137X.2017.11A.108
%P 510-515
%V 44
%N Z11
%U {https://www.jsjkx.com/CN/abstract/article_16502.shtml}
%8 2018-12-01
%X With the development of cloud computing and massive data processing technologies, shared clusters use HDFS as their distributed file system and manage computing resources through virtualization, providing runtime resources to computing frameworks and applications. Data locality is a key factor in the performance of massive data processing applications. Current research on schedulers for shared cluster management frameworks focuses mainly on improving system throughput and resource utilization by increasing scheduling parallelism, while scheduling quality, such as data locality, remains deficient. This paper proposes a scheduling strategy based on data block density to improve the data locality of applications. The strategy improves application performance by reducing cross-host I/O during execution. Experiments show that the proposed strategy effectively reduces the running time of data-intensive jobs. In WordCount and TeraSort test cases with 2.5 GB of data, the method achieved 90% data locality and reduced running time by 20%.