Study on Knowledge Distillation of Target Detection Algorithm Based on YOLOv4

doi:10.11896/jsjkx.210600204

Abstract

Abstract: Knowledge distillation is a training method based on the teacher-student network idea: a complex teacher network guides the training of a structurally simpler student network, so that the student network reaches accuracy comparable to that of the teacher. It has been widely studied in natural language processing and image classification, whereas research in object detection is comparatively scarce and its experimental results leave room for improvement. Distillation algorithms for object detection mainly operate on the feature extraction layers, and distilling from the feature extraction layers alone tends to prevent the student from fully learning the teacher network's knowledge, which results in poor model accuracy. To address this problem, this paper uses the teacher network's "knowledge" in feature extraction, target classification, and bounding-box prediction to guide the training of the student network, and proposes a distillation algorithm based on a multi-scale attention mechanism so that the teacher network's knowledge flows more effectively to the student network. Experimental results show that the proposed distillation algorithm, built on YOLOv4, effectively improves the detection accuracy of the student network.

Key words: YOLOv4, Model compression, Deep learning, Knowledge distillation, Attention mechanism
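The abstract does not give the loss formulation, so the following is only a minimal sketch of what a feature-level, multi-scale attention distillation term could look like. It assumes an activation-based attention transfer in the style of Zagoruyko and Komodakis (channel-wise sum of squared activations, L2-normalised), which may differ from the paper's actual method. The function names, the three feature-map sizes, the channel counts, and the equal per-scale weights are all hypothetical. The classification and bounding-box distillation terms mentioned in the abstract are not sketched here.

```python
import numpy as np

def attention_map(feature, p=2):
    """Collapse a (C, H, W) feature map into a unit-norm spatial attention vector."""
    # Sum |activation|^p over channels, then flatten the H x W grid.
    att = np.sum(np.abs(feature) ** p, axis=0).reshape(-1)
    # L2-normalise so maps from networks with different channel counts are comparable.
    return att / (np.linalg.norm(att) + 1e-12)

def multi_scale_attention_loss(student_feats, teacher_feats, weights=None):
    """Weighted sum of squared distances between attention maps at each scale.

    Assumption: one (C, H, W) array per detection scale, with matching H and W
    between student and teacher at every scale (channel counts may differ).
    """
    if weights is None:
        weights = [1.0] * len(student_feats)  # hypothetical equal weighting
    loss = 0.0
    for w, fs, ft in zip(weights, student_feats, teacher_feats):
        if fs.shape[1:] != ft.shape[1:]:
            raise ValueError("spatial sizes must match at each scale")
        diff = attention_map(fs) - attention_map(ft)
        loss += w * float(np.sum(diff ** 2))
    return loss

rng = np.random.default_rng(0)
# Illustrative sizes only: three scales, with a student thinner than the teacher.
teacher = [rng.standard_normal((256, s, s)) for s in (52, 26, 13)]
student = [rng.standard_normal((64, s, s)) for s in (52, 26, 13)]

loss_same = multi_scale_attention_loss(teacher, teacher)  # identical maps
loss_diff = multi_scale_attention_loss(student, teacher)  # mismatched maps
```

In a training loop, a term like this would be added to the ordinary detection loss with a tunable coefficient; because the attention maps are normalised, the term depends on where the activations concentrate spatially rather than on their overall magnitude.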

CLC number: TP391

CHU Yu-chun, GONG Hang, WANG Xue-fang, LIU Pei-shun. Study on Knowledge Distillation of Target Detection Algorithm Based on YOLOv4[J]. Computer Science, 2022, 49(6A): 337-344. https://doi.org/10.11896/jsjkx.210600204


