Computer Science (计算机科学) ›› 2024, Vol. 51 ›› Issue (6A): 230600137-11. DOI: 10.11896/jsjkx.230600137

• Artificial Intelligence •

Lightweighting Methods for Neural Network Models: A Review

GAO Yang, CAO Yangjie, DUAN Pengsong

  1. School of Cyberspace Security, Zhengzhou University, Zhengzhou 450000, China
  • Published: 2024-06-06
  • Corresponding author: DUAN Pengsong (duanps@163.com)
  • About author: (510910342@qq.com)
  • Supported by:
    Collaborative Innovation Major Project of Zhengzhou (20XTZX06013), Research Foundation Plan in Higher Education Institutions of Henan Province (21A520043), Strategic Research and Consulting Project of Chinese Academy of Engineering (2022HENYB03) and Science and Technology Project of Henan Province (232102210050).

Lightweighting Methods for Neural Network Models: A Review

GAO Yang, CAO Yangjie, DUAN Pengsong   

  1. School of Cyberspace Security, Zhengzhou University, Zhengzhou 450000, China
  • Published: 2024-06-06
  • About author: GAO Yang, born in 2000, master candidate. His main research interests include deep learning and model compression.
    DUAN Pengsong, born in 1983, Ph.D. His main research interests include edge computing and intelligent perception.
  • Supported by:
    Collaborative Innovation Major Project of Zhengzhou (20XTZX06013), Research Foundation Plan in Higher Education Institutions of Henan Province (21A520043), Strategic Research and Consulting Project of Chinese Academy of Engineering (2022HENYB03) and Science and Technology Project of Henan Province (232102210050).

Abstract: In recent years, neural network models, by virtue of their strong feature extraction capability, have been applied more and more widely across industries and have achieved good results. However, with the continual growth of data volume and the pursuit of ever higher accuracy, the parameter scale of neural network models has increased sharply and their network complexity has kept rising, expanding the overhead of computation, storage and other resources and making deployment in resource-constrained scenarios extremely challenging. Therefore, how to achieve model lightweighting without affecting model performance, and thereby reduce the cost of model training and deployment, has become one of the current research hotspots. To this end, this paper summarizes and analyzes typical model lightweighting methods from two perspectives, complex model compression and lightweight model design, in order to clarify the development of model compression technology. Complex model compression techniques are categorized into five groups: model pruning, model quantization, low-rank decomposition, knowledge distillation and hybrid approaches, while lightweight model design is organized into three directions: spatial convolution design, shift convolution design and neural architecture search (NAS).

Key words: Neural networks, Model compression, Model pruning, Model quantization, Model lightweighting
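
As a brief illustration of the model pruning category summarized above, the following Python sketch applies the classic magnitude criterion: the smallest-magnitude weights of a layer are zeroed out and only the surviving connections are kept. The function name magnitude_prune, the layer shape and the sparsity level are illustrative assumptions and do not reproduce any specific method covered by the survey.

import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value; prune everything at or below it.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Toy example: a 256x512 fully connected layer pruned to 90% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 512)).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.9)
print("non-zero weights:", np.count_nonzero(w_pruned), "of", w.size)

In practice, unstructured pruning of this kind is usually interleaved with fine-tuning to recover accuracy, and structured variants remove whole channels or filters so that the saving translates directly into faster inference.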

Abstract: In recent years, owing to their strong feature extraction capability, neural network models have been used more and more widely across industries and have achieved good results. However, with the increasing amount of data and the pursuit of higher accuracy, the parameter size and network complexity of neural network models have increased dramatically, expanding computation, storage and other resource overheads and making their deployment in resource-constrained scenarios extremely challenging. Therefore, how to achieve model lightweighting without affecting model performance, and thus reduce model training and deployment costs, has become one of the current research hotspots. This paper summarizes and analyzes typical model lightweighting methods from two aspects, complex model compression and lightweight model design, so as to clarify the development of model compression technology. Complex model compression techniques are summarized in five aspects: model pruning, model quantization, low-rank decomposition, knowledge distillation and hybrid approaches, while lightweight model design is sorted out in three aspects: spatial convolution design, shift convolution design and neural architecture search.

Key words: Neural networks, Model compression, Model pruning, Model quantization, Model lightweighting
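
Similarly, as a minimal sketch of the model quantization category, the snippet below maps float32 weights to signed 8-bit integers with a single per-tensor scale and then measures the round-trip error. The helper names and the symmetric per-tensor scheme are assumptions chosen for illustration; the surveyed methods go further with per-channel scales, mixed precision or learned quantizers.

import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization of float32 values to int8."""
    scale = np.max(np.abs(x)) / 127.0          # largest magnitude maps to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(64, 128)).astype(np.float32)
q, s = quantize_int8(w)
print("max abs quantization error:", np.max(np.abs(w - dequantize(q, s))))
print("storage: float32 =", w.nbytes, "bytes, int8 =", q.nbytes, "bytes")

The 4x reduction in storage, together with cheaper integer arithmetic on supporting hardware, is what makes quantization attractive for edge deployment, at the cost of the rounding error printed above.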

CLC Number: TP183
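
On the lightweight model design side, many spatial convolution designs (e.g., the MobileNet family) replace a standard convolution with a depthwise convolution followed by a 1x1 pointwise convolution. The hedged PyTorch sketch below only compares the parameter counts of the two forms; the channel sizes and module composition are assumptions for illustration rather than a reproduction of any surveyed architecture.

import torch.nn as nn

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

in_ch, out_ch, k = 64, 128, 3

# Standard 3x3 convolution: roughly in_ch * out_ch * k * k weights.
standard = nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=1)

# Depthwise-separable alternative: a per-channel 3x3 depthwise convolution
# followed by a 1x1 pointwise convolution that mixes channels.
separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=k, padding=1, groups=in_ch),
    nn.Conv2d(in_ch, out_ch, kernel_size=1),
)

print("standard conv parameters: ", count_params(standard))
print("separable conv parameters:", count_params(separable))

For these illustrative sizes the factorization cuts the parameter count by roughly a factor of eight, which is the main source of the parameter and FLOP savings in MobileNet/ShuffleNet-style lightweight networks.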