Computer Science ›› 2019, Vol. 46 ›› Issue (11): 272-276. doi: 10.11896/jsjkx.180901630

• Graphics, Image & Pattern Recognition •



  • Corresponding author: WANG Qi-qi (born 1984), male, Ph.D, lecturer; his main research interests include deep learning and indoor positioning. E-mail: wangqiqi@tust.edu.cn
  • About the authors: HAN Jia-lin (born 1992), female, master's candidate; her main research interests include deep learning and network compression and acceleration. YANG Guo-wei (born 1988), male, Ph.D, lecturer; his main research interests include visual inspection, deep learning and artificial intelligence. CHEN Jun (born 1978), male, Ph.D, professor; his main research interests include information theory, digital communication, multimedia signal processing, distributed data compression and storage, machine learning and big data processing. WANG Yi-zhong (born 1963), male, Ph.D, professor; his main research interests include deep learning and artificial intelligence.

SSD Network Compression Fusing Weight and Filter Pruning

HAN Jia-lin1, WANG Qi-qi1, YANG Guo-wei1, CHEN Jun2, WANG Yi-zhong1   

  1. (School of Electronic Information and Automation,Tianjin University of Science and Technology,Tianjin 300000,China)1
    (Department of Electronic Engineering,McMaster University,Hamilton L8P3H9,Canada)2
  • Received:2018-09-03 Online:2019-11-15 Published:2019-11-14



Abstract: Object detection is an important research direction in the field of computer vision. In recent years, deep learning has achieved major breakthroughs in video-based object detection. Deep learning has a powerful capacity for feature learning and feature representation, which enables it to automatically learn, extract and utilize relevant features. However, the complex network structure gives deep learning models a large number of parameters, heavy computational demands and a large storage footprint. The Single Shot MultiBox Detector (SSD300) produces markedly superior detection accuracy and speed using a single deep neural network and can detect objects in video in real time, but it is difficult to deploy on embedded devices or mobile terminals with limited hardware resources. To address this limitation, a method fusing weight pruning and filter pruning was proposed to reduce the storage and inference time required by the network without affecting its accuracy. Firstly, to tackle the oversized model caused by excessive weight parameters, redundant weights in each convolutional layer are pruned and the weight sparsity of each layer is determined. Then, to reduce the heavy computation of the convolutional layers, redundant filters are pruned according to the weight sparsity of each layer, cutting both redundant parameters and computation. Finally, the pruned network is retrained to restore its detection accuracy. To verify the effectiveness of the method, the SSD network model was validated on Caffe, a convolutional neural network framework. After compression and acceleration, the SSD300 model occupies 12.5MB of storage and reaches a detection speed of up to 50FPS (frames per second). The experiments compress SSD300 by 8.4× and accelerate it by 2× with as little loss of detection accuracy as possible. The fusion of weight pruning and filter pruning offers a feasible scheme for embedding SSD300 in intelligent video detection and tracking systems.
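The two pruning steps the abstract describes can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the function names, the global magnitude threshold for weight pruning, and the L1-norm ranking used to pick which filters to drop are all assumptions for the sake of the example.

```python
import numpy as np

def prune_weights(weights, ratio):
    """Magnitude-based weight pruning: zero out the smallest-magnitude
    fraction `ratio` of a layer's weights and report the resulting
    sparsity of that layer."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * ratio)          # number of weights to remove
    if k == 0:
        return weights.copy(), 0.0
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # keep only weights above threshold
    pruned = weights * mask
    sparsity = 1.0 - mask.mean()        # fraction of weights zeroed
    return pruned, sparsity

def prune_filters(weights, sparsity):
    """Filter pruning guided by the layer's weight sparsity: rank the
    output filters (axis 0 of an [out, in, kH, kW] tensor) by L1 norm
    and drop the weakest fraction, matching the measured sparsity."""
    l1 = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_drop = int(weights.shape[0] * sparsity)
    keep = np.sort(np.argsort(l1)[n_drop:])  # surviving filter indices
    return weights[keep], keep

# Demo on a random conv layer of shape [out=64, in=32, kH=3, kW=3]
conv = np.random.randn(64, 32, 3, 3)
pruned_w, sparsity = prune_weights(conv, 0.5)
pruned_f, kept = prune_filters(pruned_w, sparsity)
```

In the full pipeline, the pruned tensors would be written back into the network and the model retrained to recover detection accuracy, as the abstract's final step describes.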

Key words: Deep neural networks, Single-shot multi-box detector (SSD), Network compression and acceleration, Weight pruning, Filter pruning

CLC number: TP183
[1] GIRSHICK R,DONAHUE J,DARRELL T,et al.Rich feature hierarchies for accurate object detection and semantic segmentation[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Columbus:IEEE,2014:580-587.
[2] GIRSHICK R.Fast R-CNN[C]∥Proceedings of the IEEE International Conference on Computer Vision.Santiago:IEEE,2015:1440-1448.
[3] REN S,HE K,GIRSHICK R,et al.Faster R-CNN:towards real-time object detection with region proposal networks [J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2017,39(6):1137-1149.
[4] REDMON J,DIVVALA S,GIRSHICK R,et al.You only look once:unified,real-time object detection[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Las Vegas:IEEE,2016:779-788.
[5] LIU W,ANGUELOV D,ERHAN D,et al.SSD:single shot multibox detector[C]∥Proceedings of European Conference on Computer Vision.Amsterdam:Springer International Publishing,2016:21-37.
[6] HAN S,POOL J,TRAN J,et al.Learning both weights and connections for efficient neural networks[M]∥Neural Information Processing Systems.Morgan Kaufmann Publishers Inc,2015:1135-1143.
[7] HAN S,POOL J,DALLY W J,et al.Deep compression:compressing deep neural networks with pruning,trained quantization and Huffman coding[C]∥Proceedings of International Conference on Learning Representations.San Juan:ICLR,2016:233-242.
[8] RASTEGARI M,ORDONEZ V,REDMON J,et al.XNOR-Net:ImageNet classification using binary convolutional neural networks[C]∥Proceedings of European Conference on Computer Vision.Amsterdam:Springer International Publishing,2016:525-542.
[9] COURBARIAUX M,HUBARA I,SOUDRY D,et al.Binarized neural networks:training neural networks with weights and activations constrained to +1 or -1[EB/OL].https://arxiv.org/abs/1602.02830.pdf.
[10] HINTON G,VINYALS O,DEAN J.Distilling the knowledge in a neural network[C]∥Proceedings of Conference on Advances in Neural Information Processing Systems.Montreal:IEEE,2014:2644-2652.
[11] SAU B B,BALASUBRAMANIAN V N.Deep model compression:distilling knowledge from noisy teachers[EB/OL].https://arxiv.org/abs/1610.09650.pdf.
[12] JADERBERG M,VEDALDI A,ZISSERMAN A.Speeding up convolutional neural networks with low rank expansions[J].Computer Science,2014,4(4):1-7.
[13] SINDHWANI V,SAINATH T N,KUMAR S.Structured transforms for small-footprint deep learning[EB/OL].https://arxiv.org/abs/1510.01722.pdf.
[14] WEN W,WU C,WANG Y,et al.Learning structured sparsity in deep neural networks[M]∥Advances in Neural Information Processing Systems.Berlin:Springer,2016:2074-2082.
[15] LIU Z,SHEN Z,HUANG G,et al.Learning efficient convolutional networks through network slimming[C]∥Proceedings of the IEEE International Conference on Computer Vision(ICCV).IEEE,2017:2755-2763.
[16] HE Y,ZHANG X,SUN J,et al.Channel pruning for accelerating very deep neural networks [EB/OL].https://arxiv.org/abs/1707.06168.pdf.
[17] IANDOLA F N,HAN S,MOSKEWICZ M W,et al.SqueezeNet:AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size[C]∥Proceedings of International Conference on Learning Representations.San Juan:ICLR,2016.
[18] HOWARD A G,ZHU M,CHEN B,et al.MobileNets:efficient convolutional neural networks for mobile vision applications [EB/OL].https://arxiv.org/abs/1704.04861.pdf.
[19] ZHANG X,ZHOU X,LIN M,et al.ShuffleNet:an extremely efficient convolutional neural network for mobile devices [EB/OL].https:// arxiv.org/abs/1707.01083.pdf.
[20] EVERINGHAM M,VAN GOOL L,WILLIAMS C K I,et al.The pascal visual object classes (VOC) challenge[J].International Journal of Computer Vision,2010,88(2):303-338.
[21] HANSON S J,PRATT L Y.Comparing biases for minimal network construction with back-propagation[M]∥Neural Information Processing Systems.Morgan Kaufmann Publishers Inc,1989:177-185.
[22] LECUN Y,DENKER J S,SOLLA S A.Optimal brain damage[C]∥Neural Information Processing Systems.Morgan Kaufmann Publishers Inc,1990:598-605.
[23] HASSIBI B,STORK D G.Second Order derivatives for network pruning:optimal brain surgeon[C]∥Neural Information Processing Systems.Morgan Kaufmann Publishers Inc,1992:164-171.
[24] HAN Y F,JIANG T H,MA Y P,et al.Compression of deep neural networks[J].Application Research of Computers,2018,35(10):2894-2897.(in Chinese)
[25] JIAO L C.Deep Learning,Optimization and Recognition[M].Beijing:Tsinghua University Press,2017:104.(in Chinese)