Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250100106-6. doi: 10.11896/jsjkx.250100106

• Network & Communication •

Adaptive Gradient Sparsification Approach to Training Deep Neural Networks

HUANG Xinli, GAO Guoju   

  1. School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Online: 2025-11-15  Published: 2025-11-10

Abstract: Top-k sparsification with error compensation is one of the state-of-the-art techniques for training distributed deep neural networks (DNNs). It reduces communication by transmitting only a subset of the gradient entries in each iteration, with the amount transmitted determined by the value of k. Although a smaller k shortens training time, it may degrade test accuracy even with error compensation, a trade-off known as the speed-accuracy dilemma. Based on the observation that the growth rates of training accuracy and test accuracy are dynamically correlated over time, this paper presents AdaTopK, an adaptive Top-k compressor with convergence guarantees. AdaTopK dynamically adjusts the value of k to accelerate training while maintaining or improving test accuracy. Extensive experiments in both static and dynamic network scenarios show that AdaTopK reduces training time by 29% compared with the no-compression baseline and by 15% compared with DC2.
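To make the mechanism concrete, below is a minimal sketch of Top-k sparsification with error compensation, together with a simple adaptive-k rule in the spirit of AdaTopK. The function names, the accuracy-gain heuristic, and the scaling constants are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def topk_with_error_feedback(grad: np.ndarray, residual: np.ndarray, k: int):
    """Top-k sparsification with error compensation.

    The residual of previously unsent gradient mass is added back to the
    fresh gradient before selecting the k largest-magnitude entries; the
    entries that are not transmitted are kept locally for the next step.
    """
    corrected = grad + residual                        # compensate past error
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # indices of top-k magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                       # entries actually transmitted
    return sparse, corrected - sparse                  # (sent part, new residual)

def adapt_k(k: int, train_gain: float, test_gain: float,
            k_min: int, k_max: int) -> int:
    """Illustrative adaptive-k rule (an assumed policy, not AdaTopK's exact one):
    while test accuracy keeps pace with training accuracy, shrink k to save
    communication; once it lags, grow k to transmit more gradient mass."""
    if test_gain >= 0.5 * train_gain:
        return max(k_min, int(0.9 * k))  # accuracy keeps up: compress harder
    return min(k_max, int(1.1 * k))      # accuracy lags: relax compression

# Hypothetical usage for one worker and one iteration:
rng = np.random.default_rng(0)
grad = rng.normal(size=10_000)
residual = np.zeros_like(grad)
sent, residual = topk_with_error_feedback(grad, residual, k=100)
```

In a real data-parallel run, each worker would all-reduce (or push to a parameter server) only the k selected entries per iteration, and adapt_k would be evaluated periodically from the observed training and test accuracy curves.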

Key words: Distributed training, Network compression, Sparsification, Deep neural networks, Error compensation

CLC Number: TP319
[1] ZHU Y W. Research on Key Technologies of Text Classification Based on Deep Learning and Attention Mechanism [D]. Nanjing University of Information Science and Technology, 2024.
[2] CHENG Z T, HUANG H R, XUE H, et al. Event Causality Identification Model Based on Prompt Learning and Hypergraph [J]. Computer Science, 2025, 52(9): 303-312.
[3] LYU Y F, ZHANG X L, GAO W N, et al. The Application of Deep Learning in Customs Image Recognition Technology [J]. China Port Science and Technology, 2024, 6(Z2): 4-12.
[4] JING Y. Research on Recognition Methods for Partially Occluded Face Images Based on Deep Learning [J]. China Internet Weekly, 2024(21): 56-58.
[5] VOGELS T, KARIMIREDDY S P, JAGGI M. PowerSGD: Practical low-rank gradient compression for distributed optimization [C]//NeurIPS. 2019: 14236-14245.
[6] LI S Q. Improvement of Gradient Sparsification Methods in Distributed Deep Learning Model Training [D]. Beijing: Beijing University of Posts and Telecommunications, 2021.
[7] SAPIO A, CANINI M, HO C Y, et al. Scaling distributed machine learning with in-network aggregation [J/OL]. https://www.usenix.org/system/files/nsdi21-sapio.pdf.
[8] SEIDE F, FU H, DROPPO J, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs [C]//INTERSPEECH. 2014: 1058-1062.
[9] OUYANG S. Research on Communication Optimization Techniques for Distributed Deep Learning Based on Gradient Compression [D]. Changsha: National University of Defense Technology, 2021.
[10] ESSER S K, MCKINSTRY J L, BABLANI D, et al. Learned step size quantization [C]//ICLR. 2020.
[11] LI R, WANG Y, LIANG F, et al. Fully quantized network for object detection [C]//CVPR. 2019: 2810-2819.
[12] LIU S W. Research and Implementation of High-performance Binary Convolutional Neural Networks [D]. Hangzhou: Zhejiang University, 2021.
[13] AJI A F, HEAFIELD K. Sparse communication for distributed gradient descent [C]//EMNLP. 2017: 440-445.
[14] HORVÁTH S, RICHTÁRIK P. A better alternative to error feedback for communication-efficient distributed learning [C]//ICLR. 2021.
[15] LIN Y, HAN S, MAO H, et al. Deep gradient compression: Reducing the communication bandwidth for distributed training [C]//ICLR. 2018.
[16] NIU L L. Implementation and Application of the Top-K Algorithm Based on a Deep Learning Processor [D]. Beijing: University of Chinese Academy of Sciences (School of Artificial Intelligence), 2020.
[17] ABDELMONIEM A M, CANINI M. DC2: Delay-aware compression control for distributed machine learning [C]//INFOCOM. 2021.
[18] ALISTARH D, GRUBIC D, LI J, et al. QSGD: Communication-efficient SGD via gradient quantization and encoding [C]//NIPS. 2017: 1709-1720.
[19] DRYDEN N, MOON T, JACOBS S A, et al. Communication quantization for data-parallel training of deep neural networks [C]//MLHPC@SC. 2016: 1-8.
[20] SHI S, TANG Z, WANG Q, et al. Layer-wise adaptive gradient sparsification for distributed deep learning with convergence guarantees [C]//ECAI. 2020: 1467-1474.
[21] ALISTARH D, HOEFLER T, JOHANSSON M, et al. The convergence of sparsified gradient methods [C]//NeurIPS. 2018: 5977-5987.
[22] CHEN C Y, NI J, LU S, et al. ScaleCom: Scalable sparsified gradient compression for communication-efficient distributed training [C]//NeurIPS. 2020.
[23] BRADLEY J K, KYROLA A, BICKSON D, et al. Parallel coordinate descent for L1-regularized loss minimization [C]//ICML. 2011: 321-328.