%A JIANG Wen-bin
%A FU Zhi
%A PENG Jing
%A ZHU Jian
%T 4Bit-based Gradient Compression Method for Distributed Deep Learning System
%0 Journal Article
%D 2020
%J Computer Science
%R 10.11896/jsjkx.200300097
%P 220-226
%V 47
%N 7
%U https://www.jsjkx.com/CN/abstract/article_19265.shtml
%8 2020-07-15
%X In order to reduce the communication overhead of distributed deep learning systems, compressing gradient data before transmission is an effective approach, exemplified by the 2Bit method in MXNet. However, such methods suffer from a problem: an overly high compression ratio leads to a decline in accuracy and convergence speed, especially for larger network models. To address this problem, a new gradient compression strategy called 4Bit is proposed, in which four bits are used to represent a gradient value. Compared with 2Bit, this method approximates the gradient more finely, thus improving the accuracy of training results and the convergence speed. Furthermore, different approximation thresholds are selected according to the gradient characteristics of each layer of the network model, which makes the compressed values more reasonable and further improves the convergence speed and final accuracy of the model. The experimental results show that, although 4Bit is slightly inferior to the 2Bit method in terms of acceleration, it achieves higher accuracy and better practicability by using more bits and multiple thresholds. Reducing the communication overhead of a distributed deep learning system while maintaining high accuracy makes 4Bit very meaningful.
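
The abstract only outlines the 4Bit idea (four bits per gradient value, a per-layer approximation threshold, and an error-feedback scheme in the tradition of MXNet's 2Bit compression), so the NumPy sketch below is one plausible realization under stated assumptions, not the authors' implementation. The level mapping, the `layer_threshold` heuristic, and the `pack_nibbles` helper are all hypothetical names introduced here for illustration.

```python
# Illustrative sketch only: the paper's exact 4Bit encoding and threshold
# selection are not given in the abstract, so the mapping below is assumed.
import numpy as np

def layer_threshold(grad):
    # Assumed per-layer heuristic: tie the quantization step to the layer's
    # gradient magnitude so most values land inside the 4-bit level range.
    return np.abs(grad).max() / 8.0 + 1e-12

def quantize_4bit(grad, residual, step):
    # Error feedback (as in MXNet's 2Bit compression): fold the previous
    # step's quantization error back into the current gradient.
    g = grad + residual
    levels = np.clip(np.rint(g / step), -8, 7).astype(np.int8)  # 16 levels
    codes = (levels + 8).astype(np.uint8)                       # codes 0..15
    residual = g - levels * step       # carry the new quantization error
    return codes, residual

def dequantize_4bit(codes, step):
    # Recover the coarse gradient approximation on the receiving side.
    return (codes.astype(np.int8) - 8).astype(np.float32) * step

def pack_nibbles(codes):
    # Two 4-bit codes per byte: an 8x size reduction over float32 gradients
    # (assumes an even number of elements, as in this example).
    return (codes[0::2] << 4) | codes[1::2]

# Usage: compress one layer's gradient before it is transmitted.
rng = np.random.default_rng(0)
grad = rng.normal(scale=0.01, size=1024).astype(np.float32)
residual = np.zeros_like(grad)

step = layer_threshold(grad)
codes, residual = quantize_4bit(grad, residual, step)
packed = pack_nibbles(codes)          # bytes actually sent over the network
approx = dequantize_4bit(codes, step)
print("payload bytes:", packed.nbytes,
      "mean abs error:", np.abs(grad - approx).mean())
```

In this sketch the residual buffer keeps the quantization error from biasing later updates, and computing the step size per layer adapts the 16 levels to each layer's gradient scale, which mirrors the abstract's claim that per-layer thresholds make the compressed values more reasonable.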