Computer Science ›› 2023, Vol. 50 ›› Issue (5): 170-176. doi: 10.11896/jsjkx.220400085

• Computer Graphics & Multimedia •

SSD Object Detection Algorithm with Residual Learning and Cyclic Attention

JIA Tianhao, PENG Li

  1. Engineering Research Center of Internet of Things Technology Applications (Ministry of Education), School of IoT Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2022-04-11 Revised: 2022-09-13 Online: 2023-05-15 Published: 2023-05-06
  • Corresponding author: PENG Li (penglimail2002@163.com)
  • Author contact: 1483794156@qq.com
  • About author: JIA Tianhao, born in 1996, postgraduate. His main research interests include computer vision and deep learning.
    PENG Li, born in 1967, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include visual Internet of Things, action recognition and deep learning.
  • Supported by:
    National Natural Science Foundation of China (61873112, 61802107) and Taizhou Development and Reform Commission Foundation Project (2106-331000-04-04-295510).

Abstract: The shallow feature maps in the feature pyramid of the Single Shot MultiBox Detector (SSD) carry insufficient semantic information, which leads to poor small object detection. To address this problem, an SSD object detection algorithm based on residual learning and cyclic attention is proposed. First, the backbone network uses ResNet-101, which has a stronger learning capacity, to extract effective feature information. Second, a lightweight one-way feature fusion block fuses the deep feature layers of the original feature pyramid with the shallow feature layers and generates a new feature pyramid, which enriches the semantic information of the effective feature layers used for prediction. Finally, a new spatial pooling strategy is proposed and combined with the skip connections of the residual network to form a cyclic attention module, which introduces global contextual information and establishes full-image dependencies for local features. To address the imbalance between the numbers of hard and easy samples, Focal Loss is used as the regression loss function. Experimental results show that the algorithm achieves a mean average precision (mAP) of 79.7% on the PASCAL VOC public dataset, 2.5 percentage points higher than SSD, and an mAP of 30.0% on the MS COCO public dataset, 4.9 percentage points higher than SSD.
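The abstract does not give the exact loss formulation, so the following is only a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), to illustrate how it down-weights easy samples relative to hard ones. As originally defined, focal loss is applied to classification scores, so the paper's use may differ. The function names and the defaults `alpha=0.25` and `gamma=2.0` are assumptions taken from common practice, not values stated here.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p = P(class 1), label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha and gamma are assumed defaults, not values from the paper.
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)**gamma shrinks the loss of well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


if __name__ == "__main__":
    # An easy positive (p_t = 0.95) versus a hard positive (p_t = 0.30).
    for name, p in (("easy", 0.95), ("hard", 0.30)):
        print(name, round(cross_entropy(p, 1), 4), round(focal_loss(p, 1), 6))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy. Larger `gamma` widens the gap between the loss of hard and easy samples.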

Key words: Object detection, Residual learning, Deep learning, Attention mechanism, Feature fusion
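The idea of combining a spatial pooling step with a skip connection to give each local feature a global context can be sketched as follows. This is not the authors' cyclic attention module, whose pooling strategy is not described in the abstract. It is a hypothetical, single-channel illustration that assumes strip-style pooling (row and column averages) and a sigmoid gate. The name `strip_pool_attention` is made up for this example.

```python
import math


def strip_pool_attention(x):
    """Toy attention on a 2-D feature map given as a list of equal-length rows.

    Assumed design: strip-style pooling for context, a sigmoid gate,
    and a residual (skip) connection around the gated path.
    """
    h, w = len(x), len(x[0])
    # Horizontal strips: average each row (a 1 x W pooling window).
    row_mean = [sum(row) / w for row in x]
    # Vertical strips: average each column (an H x 1 pooling window).
    col_mean = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    out = []
    for i in range(h):
        out_row = []
        for j in range(w):
            # Position (i, j) is gated by the context of its whole row and column.
            gate = 1.0 / (1.0 + math.exp(-(row_mean[i] + col_mean[j])))
            # Skip connection: identity path plus the gated path.
            out_row.append(x[i][j] + gate * x[i][j])
        out.append(out_row)
    return out


if __name__ == "__main__":
    feature_map = [[0.0, 1.0, 0.0],
                   [0.0, 2.0, 0.0]]
    for row in strip_pool_attention(feature_map):
        print([round(v, 4) for v in row])
```

Because of the skip connection, the output keeps the input's shape and the original signal always passes through. The gate only adds a context-dependent amplification on top of it.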

CLC Number: 

  • TP391.4