Computer Science, 2023, Vol. 50, Issue (5): 170-176. doi: 10.11896/jsjkx.220400085

• Computer Graphics & Multimedia •

SSD Object Detection Algorithm with Residual Learning and Cyclic Attention

JIA Tianhao, PENG Li

  1. Engineering Research Center of Internet of Things Technology Applications, School of IoT Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2022-04-11 Revised: 2022-09-13 Online: 2023-05-15 Published: 2023-05-06
  • Corresponding author: PENG Li (penglimail2002@163.com)
  • About author: (1483794156@qq.com)
    JIA Tianhao, born in 1996, postgraduate. His main research interests include computer vision and deep learning.
    PENG Li, born in 1967, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include visual Internet of Things, action recognition and deep learning.
  • Supported by:
    National Natural Science Foundation of China (61873112, 61802107) and Taizhou Development and Reform Commission Foundation Project (2106-331000-04-04-295510).

Abstract: The shallow features generated in the feature pyramid of the Single Shot MultiBox Detector (SSD) carry insufficient semantic information, which degrades small-object detection. To address this problem, an SSD object detection algorithm based on residual learning and cyclic attention is proposed. First, the backbone network uses ResNet101, which has stronger learning capacity, to extract effective feature information. Second, a lightweight one-way feature fusion block fuses the deep and shallow feature layers of the original feature pyramid and generates a new feature pyramid, enriching the semantic information of the feature layers used for prediction. Finally, a new spatial pooling strategy is proposed and combined with the skip connections of residual networks to form a cyclic attention module, which introduces global context and establishes full-image dependencies for local features. To address the imbalance between hard and easy samples, Focal Loss is used as the regression loss function. Experimental results show that the algorithm achieves a mean average precision (mAP) of 79.7% on the PASCAL VOC public dataset, 2.5 percentage points higher than SSD, and an mAP of 30.0% on the MS COCO public dataset, 4.9 percentage points higher than SSD.

Key words: Object detection, Residual learning, Deep learning, Attention mechanism, Feature fusion
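
The abstract describes an attention module that pairs a spatial pooling step with a residual-style skip connection, but it does not give the pooling strategy itself. The following is only a minimal sketch of the general idea, assuming strip-style pooling (row-wise and column-wise averaging) as a stand-in for the unspecified strategy, a single-channel feature map, and no learned convolutions. The names `sigmoid` and `strip_attention_with_skip` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(v):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def strip_attention_with_skip(x):
    """Gate a single-channel feature map with row/column context.

    x is a list of equal-length rows. This is an illustrative stand-in,
    not the pooling strategy proposed in the paper.
    """
    h, w = len(x), len(x[0])
    # Horizontal strips: one average per row (context along the width).
    row_mean = [sum(row) / w for row in x]
    # Vertical strips: one average per column (context along the height).
    col_mean = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    out = []
    for i in range(h):
        out_row = []
        for j in range(w):
            # Each position sees the context of its whole row and column.
            gate = sigmoid(row_mean[i] + col_mean[j])
            # Skip connection: the input is added back to the gated features.
            out_row.append(x[i][j] + x[i][j] * gate)
        out.append(out_row)
    return out


feature = [[0.0, 1.0], [2.0, 3.0]]
refined = strip_attention_with_skip(feature)
```

Because every output position depends on the averages of its entire row and column, a local feature is modulated by long-range context while the skip connection preserves the original signal. A practical module would add learned 1D convolutions and operate on multi-channel tensors.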

CLC number: 

  • TP391.4
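
The abstract names Focal Loss as the remedy for the imbalance between hard and easy samples. As a hedged illustration of why it helps, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults `alpha=0.25` and `gamma=2.0` are commonly used values and are an assumption here; the abstract does not state the settings used in the paper, nor how the loss is wired into the detector. The function name `focal_loss` is hypothetical.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (positive) or 0 (negative)
    alpha, gamma: assumed defaults, not values reported by the paper
    """
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0) for fully wrong predictions.
    p_t = min(max(p_t, eps), 1.0)
    # The factor (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.95, 1)  # confident, correct prediction
hard = focal_loss(0.10, 1)  # confident, wrong prediction
```

With `gamma = 0` the expression reduces to alpha-weighted cross-entropy. Raising `gamma` suppresses the contribution of easy samples, so the many easy background boxes no longer dominate training.
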