Computer Science, 2020, Vol. 47, Issue (7): 220-226. doi: 10.11896/jsjkx.200300097
Topic: Network Communication
蒋文斌, 符智, 彭晶, 祝简
JIANG Wen-bin, FU Zhi, PENG Jing, ZHU Jian
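Abstract: Compressing gradient data is an effective way to reduce communication overhead between machines, as in the 2Bit method of the MXNet system. Such methods have a prominent problem, however: an excessively high compression ratio degrades accuracy and convergence speed, especially for large deep neural network models. To address this, a new 4Bit gradient compression strategy is proposed. It uses 4 bits to represent each gradient value (normally a 32-bit floating-point number). Compared with 2Bit, it approximates gradient values at a finer granularity, which improves the accuracy and convergence of training. Furthermore, because gradient characteristics differ from layer to layer, a different approximation threshold is selected for each layer so that the compressed values are more reasonable, which further speeds up convergence and raises the final accuracy. Specifically, balancing ease of operation against a reasonable distribution, three groups of thresholds are set according to each layer's gradient characteristics, to meet the differing needs of different layers. Experimental results show that although the 4Bit strategy with multiple threshold groups is slightly inferior to the 2Bit method in speedup, it achieves higher accuracy and is more practical: it reduces the communication overhead of distributed deep learning systems while preserving higher model accuracy, which is valuable for building better-performing deep learning models in resource-constrained environments.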
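The abstract does not give the exact encoding, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a uniform quantizer: each gradient is clipped to [-threshold, threshold], mapped to one of 16 levels (4 bits), and two codes are packed per byte. The function names, the `CANDIDATE_THRESHOLDS` values, and the rule in `pick_threshold` are all hypothetical and chosen for illustration. With 4 bits per value the payload is 32/4 = 8 times smaller than float32, compared with 32/2 = 16 times for a 2-bit code.

```python
from typing import List, Sequence

LEVELS = 16  # 4 bits give 16 representable codes

# Hypothetical threshold groups; the paper uses three groups, but these
# particular values are illustrative assumptions, not taken from it.
CANDIDATE_THRESHOLDS = (0.01, 0.1, 1.0)


def pick_threshold(grad: Sequence[float]) -> float:
    """Assumed rule: smallest candidate that covers the layer's largest |g|."""
    peak = max((abs(g) for g in grad), default=0.0)
    for t in CANDIDATE_THRESHOLDS:
        if peak <= t:
            return t
    return CANDIDATE_THRESHOLDS[-1]


def quantize_4bit(grad: Sequence[float], threshold: float) -> bytes:
    """Clip to [-threshold, threshold], map to codes 0..15, pack two per byte."""
    step = 2.0 * threshold / (LEVELS - 1)
    codes = []
    for g in grad:
        clipped = max(-threshold, min(threshold, g))
        codes.append(int(round((clipped + threshold) / step)))
    if len(codes) % 2:
        codes.append(0)  # padding nibble, dropped again on decode
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def dequantize_4bit(packed: bytes, n: int, threshold: float) -> List[float]:
    """Unpack two codes per byte and map them back to approximate gradients."""
    step = 2.0 * threshold / (LEVELS - 1)
    codes = []
    for b in packed:
        codes.append(b >> 4)
        codes.append(b & 0x0F)
    return [c * step - threshold for c in codes[:n]]


if __name__ == "__main__":
    layer_grad = [0.05, -0.08, 0.0, 0.02, -0.01]
    t = pick_threshold(layer_grad)
    packed = quantize_4bit(layer_grad, t)
    print(len(packed), "bytes instead of", 4 * len(layer_grad))
    print(dequantize_4bit(packed, len(layer_grad), t))
```

In this sketch the reconstruction error of any value inside the threshold is at most half a quantization step, so a tighter per-layer threshold gives a finer approximation for layers with small gradients. Values outside the threshold are clipped; a real system would likely carry that lost residual into the next iteration, which is omitted here.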