Computer Science, 2024, Vol. 51, Issue 6A: 230600137-11. DOI: 10.11896/jsjkx.230600137

• Artificial Intelligence •

Lightweighting Methods for Neural Network Models: A Review

GAO Yang, CAO Yangjie, DUAN Pengsong   

  1. School of Cyberspace Security, Zhengzhou University, Zhengzhou 450000, China
  • Published: 2024-06-06
  • About author: GAO Yang, born in 2000, master candidate. His main research interests include deep learning and model compression.
    DUAN Pengsong, born in 1983, Ph.D. His main research interests include edge computing and intelligent perception.
  • Supported by:
    Collaborative Innovation Major Project of Zhengzhou (20XTZX06013), Research Foundation Plan in Higher Education Institutions of Henan Province (21A520043), Strategic Research and Consulting Project of Chinese Academy of Engineering (2022HENYB03) and Science and Technology Project of Henan Province (232102210050).

Abstract: In recent years, thanks to their strong feature extraction capability, neural network models have been applied ever more widely across industries and have achieved good results. However, with growing data volumes and the pursuit of high accuracy, the parameter counts and network complexity of these models have increased dramatically, inflating computation, storage and other resource overheads and making deployment in resource-constrained scenarios extremely challenging. How to lightweight a model without degrading its performance, and thereby reduce training and deployment costs, has therefore become one of the current research hotspots. This paper summarizes and analyzes typical model lightweighting methods from two perspectives, complex model compression and lightweight model design, in order to clarify the development of model compression technology. Complex model compression techniques are surveyed under five headings: model pruning, model quantization, low-rank decomposition, knowledge distillation and hybrid approaches. Lightweight model design is organized under three headings: spatial convolution design, shift convolution design and neural architecture search.
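
To make two of the compression techniques named above concrete, the following is a minimal illustrative sketch, assuming PyTorch is available; the toy model, the 50% sparsity ratio and the int8 setting are arbitrary example choices, not the specific methods reviewed in this paper. It applies magnitude-based weight pruning and then post-training dynamic quantization.

    # Minimal sketch (assumes PyTorch); model, sparsity ratio and dtype are
    # illustrative choices only.
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Linear(256, 10),
    )

    # Model pruning: zero the 50% of weights with the smallest L1 magnitude
    # in every Linear layer (unstructured, magnitude-based pruning).
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.5)
            prune.remove(module, "weight")  # bake the sparsity into the weights

    # Model quantization: post-training dynamic quantization stores Linear
    # weights as 8-bit integers and dequantizes them on the fly at inference.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    print(quantized(torch.randn(1, 784)).shape)  # torch.Size([1, 10])

The other surveyed directions, such as low-rank decomposition, knowledge distillation and lightweight architecture design, follow the same general workflow: a trained model passes through an additional compression or redesign stage before deployment.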

Key words: Neural networks, Model compression, Model pruning, Model quantization, Model lightweighting

CLC Number: TP183