Computer Science ›› 2020, Vol. 47 ›› Issue (8): 261-266. doi: 10.11896/jsjkx.190700062

• Artificial Intelligence •

Convolutional Neural Networks Compression Based on Pruning and Quantization

SUN Yan-li, YE Jiong-yao   

  1. College of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Online: 2020-08-15  Published: 2020-08-10
  • Corresponding author: YE Jiong-yao (yejy@ecust.edu.cn)
  • About author: SUN Yan-li, born in 1992, postgraduate (1072992052@qq.com). Her main research interests include neural network compression.
    YE Jiong-yao, born in 1978, professor and postgraduate supervisor. His main research interests include IC/SoC design, low-power design, and video, radio and television chip development.


Abstract: With the development of deep learning, Convolutional Neural Networks (CNNs), one of its core algorithms, are widely applied in fields such as target detection, natural language processing, speech recognition and image identification, and achieve better results than traditional algorithms. However, the number of parameters and the amount of computation grow with network depth, so many algorithms must be implemented on a GPU, which makes it difficult to apply CNN models to mobile terminals with limited resources and strict real-time requirements. To solve this problem, this paper presents a method that optimizes the network structure and its parameters simultaneously. First, the algorithm prunes weights according to their influence on the network's results, ensuring that redundant information is removed while the important connections of the model are retained. Then, the floating-point weights and activations of the CNN are fully quantized via quantization-aware training, converting floating-point operations to fixed-point operations; this reduces both the computational complexity and the size of the network model. To verify the algorithm, the TensorFlow deep learning framework is selected, and the Spyder IDE is used on the Ubuntu 16.04 operating system. The experimental results show that the method compresses the structurally simple LeNet model from 1.64 MB to 0.36 MB, a compression ratio of 78%, with an accuracy drop of only 0.016, and compresses the lightweight MobileNet model from 16.9 MB to 3.1 MB, a compression ratio of 81%, with an accuracy drop of only 0.03. These data show that combining weight pruning with parameter quantization can effectively compress a convolutional neural network within an acceptable accuracy loss, solving the difficulty of deploying CNNs on mobile terminals.
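
To make the pruning step concrete, the sketch below shows magnitude-based weight pruning in TensorFlow, the framework used in the paper. The paper ranks weights by their influence on the network's results; weight magnitude is used here as the usual proxy for that influence, and the prune_weights helper, the 50% sparsity level, and the Keras model API are illustrative assumptions rather than the paper's exact procedure.

    # Hypothetical sketch of magnitude-based weight pruning (a stand-in for
    # the paper's influence-based criterion): weights whose absolute value
    # falls below a per-layer quantile are zeroed, removing redundant
    # connections while the surviving (important) weights keep their values.
    import numpy as np
    import tensorflow as tf

    def prune_weights(model: tf.keras.Model, sparsity: float = 0.5) -> tf.keras.Model:
        """Zero out the smallest-magnitude fraction `sparsity` of each kernel."""
        for layer in model.layers:
            # Only convolutional and fully connected kernels are pruned;
            # biases and BatchNorm parameters are left intact.
            if not isinstance(layer, (tf.keras.layers.Conv2D, tf.keras.layers.Dense)):
                continue
            kernel = layer.trainable_weights[0]
            w = kernel.numpy()
            threshold = np.quantile(np.abs(w), sparsity)  # per-layer cutoff
            kernel.assign(w * (np.abs(w) >= threshold).astype(w.dtype))
        return model

    # Example: prune a randomly initialized MobileNet, one of the paper's test models.
    model = tf.keras.applications.MobileNet(weights=None)
    prune_weights(model, sparsity=0.5)

In practice pruning is followed by fine-tuning, so the remaining connections can compensate for the removed ones.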
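
The quantization-aware-training step can likewise be sketched. During training, floating-point weights and activations are passed through a "fake quantization" that rounds them onto a low-bit integer grid and immediately de-quantizes them, so the network learns to tolerate fixed-point rounding error. The uniform affine scheme below is the standard formulation; the fake_quantize helper and the 8-bit width are assumptions for illustration, as the paper's exact bit width is not reproduced here.

    # Hypothetical sketch of fake quantization for quantization-aware
    # training: x is mapped to an unsigned num_bits integer grid and back,
    # so the returned tensor carries the rounding error of fixed-point
    # arithmetic while gradients still flow through float values.
    import tensorflow as tf

    def fake_quantize(x: tf.Tensor, num_bits: int = 8) -> tf.Tensor:
        qmin, qmax = 0.0, 2.0 ** num_bits - 1.0
        # The representable range must contain zero so that zero-padding and
        # ReLU zeros quantize exactly (the integer-only inference convention).
        x_min = tf.minimum(tf.reduce_min(x), 0.0)
        x_max = tf.maximum(tf.reduce_max(x), 0.0)
        scale = tf.maximum((x_max - x_min) / (qmax - qmin), 1e-8)  # avoid /0
        zero_point = tf.round(qmin - x_min / scale)
        q = tf.clip_by_value(tf.round(x / scale) + zero_point, qmin, qmax)
        return scale * (q - zero_point)  # de-quantized value seen by training

At deployment only the integers and the (scale, zero_point) pair need to be stored, which is what shrinks an 8-bit model to roughly a quarter of its 32-bit size and lets inference run in fixed-point arithmetic on mobile hardware.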

Key words: Convolutional neural networks, Network compression, Parameter quantization, Quantization-aware training, Weight pruning

CLC number: TP183