Computer Science, 2023, Vol. 50, Issue 3: 231-237. doi: 10.11896/jsjkx.211100281

• Computer Graphics & Multimedia •

SSD Object Detection Algorithm with Cross-layer Fusion and Receptive Field Amplification

ZHANG Weiliang, CHEN Xiuhong

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
    Jiangsu Key Laboratory of Media Design and Software Technology, Wuxi, Jiangsu 214122, China
  • Received: 2021-11-28 Revised: 2022-08-20 Online: 2023-03-15 Published: 2023-03-15
  • Corresponding author: CHEN Xiuhong (625325682@163.com)
  • About author: ZHANG Weiliang (17760867927@163.com), born in 1997, postgraduate. His main research interest is object detection.
    CHEN Xiuhong, born in 1964, professor. His main research interests include digital image processing, pattern recognition, artificial intelligence and moving target tracking.

Abstract: Because the different layers of the Single Shot MultiBox Detector (SSD) exchange little information with one another and the model's receptive field is limited, an improved SSD object detection algorithm, ESSD (Enhanced SSD), is proposed to raise detection accuracy. First, building on the multi-scale feature maps already present in SSD and on the idea of Feature Pyramid Networks (FPN), a cross-layer information interaction module is designed that strengthens the semantic information of each layer while narrowing the information gap between layers. Then, to enlarge the receptive field and improve multi-scale detection, a receptive field amplification module is designed. Finally, batch normalization layers are added to shorten training time and speed up convergence. To evaluate ESSD, experiments are conducted on the PASCAL VOC2007 and PASCAL VOC2012 test sets. On PASCAL VOC2007, ESSD reaches 82.1% mAP at 15.7 FPS, 2.3 percentage points higher than the original SSD512; on the PASCAL VOC2012 test set, it reaches 80.6% mAP, 2.1 percentage points higher than SSD512. The experiments show that ESSD attains high detection accuracy while still meeting real-time requirements.

Key words: Object detection, Information fusion, Receptive field, Multi-scale, SSD
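The abstract does not spell out the internals of the cross-layer information interaction module. For reference, the FPN idea it builds on can be sketched as a top-down pass: a deeper, coarser feature map is upsampled (nearest neighbour here) and added element-wise to the next shallower map, starting from the top of the pyramid. This is a minimal pure-Python sketch on single-channel maps stored as nested lists, not the authors' ESSD module; the function names are hypothetical, FPN's 1×1 lateral convolution for matching channel counts is omitted, and each level is assumed to be exactly half the size of the level below it.

```python
from typing import List

# One channel of a feature map, indexed as feat[row][col].
Map = List[List[float]]


def upsample_nearest(feat: Map, factor: int = 2) -> Map:
    """Nearest-neighbour upsampling: every value is repeated factor x factor times."""
    out: Map = []
    for row in feat:
        wide = [v for v in row for _ in range(factor)]
        # Repeat the widened row `factor` times (copies, so rows stay independent).
        out.extend(list(wide) for _ in range(factor))
    return out


def top_down_fuse(shallow: Map, deep: Map) -> Map:
    """Upsample the coarser, more semantic map and add it element-wise to the
    finer map. Assumes `deep` is exactly half the size of `shallow` and that both
    already have matching channels (FPN's 1x1 lateral conv is omitted)."""
    up = upsample_nearest(deep, 2)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("deep map must be half the spatial size of shallow map")
    return [[s + u for s, u in zip(srow, urow)] for srow, urow in zip(shallow, up)]


def fuse_pyramid(levels: List[Map]) -> List[Map]:
    """Top-down pass over a pyramid ordered from finest (levels[0]) to coarsest.
    Each level is fused with the already-fused level directly above it."""
    fused = [levels[-1]]
    for feat in reversed(levels[:-1]):
        fused.insert(0, top_down_fuse(feat, fused[0]))
    return fused
```

For a 4×4 / 2×2 / 1×1 pyramid whose only non-zero value sits at the top, `fuse_pyramid` propagates that value to every position of the finest level, which is the purpose of the top-down pass: shallow, high-resolution maps inherit semantics from deep layers. When neighbouring sizes are not exact halves, interpolating to the target size is the usual substitute.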

CLC number: TP391
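The abstract does not say how the receptive field amplification module is built. A common way to enlarge the receptive field without extra parameters or downsampling is dilated (atrous) convolution, as used in RFB Net; assuming a module of that kind, the arithmetic below shows how much input context a branch sees. A k×k kernel with dilation d spans k + (k − 1)(d − 1) pixels, and stacking layers accumulates those spans scaled by the cumulative stride. The function names are illustrative only.

```python
from typing import Iterable, Tuple


def effective_kernel(k: int, dilation: int) -> int:
    """Span of a k x k kernel with the given dilation: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (dilation - 1)


def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> int:
    """Receptive field (in input pixels) of a stack of conv layers.

    Each layer is (kernel, stride, dilation). Uses the standard recurrence
    r_l = r_{l-1} + (k_eff_l - 1) * j_{l-1}, where j is the product of all
    strides before layer l (the spacing of its inputs in input pixels).
    """
    rf, jump = 1, 1
    for k, stride, dilation in layers:
        rf += (effective_kernel(k, dilation) - 1) * jump
        jump *= stride
    return rf
```

For example, a single 3×3 convolution with dilation 1, 3 or 5 covers 3, 7 or 11 pixels with the same number of weights, and three stacked stride-1 3×3 layers with dilations 1, 2 and 4 reach 15 pixels, against 7 for the undilated stack. Parallel branches with different dilations therefore give one module several receptive field sizes at once, which is what multi-scale detection needs.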
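Batch normalization, as introduced by Ioffe and Szegedy, normalizes each channel with the mean and variance of the current mini-batch during training, then rescales it by a learned γ and shifts it by β; this typically permits higher learning rates and faster convergence. The abstract does not state where ESSD inserts these layers, so this sketch only shows the training-mode transform for one channel in pure Python (the function name is illustrative). At inference time, running averages of the statistics replace the batch statistics, which is not shown here.

```python
import math
from typing import List


def batch_norm(x: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Training-mode batch normalization of one channel over a mini-batch:
    y = gamma * (x - mean) / sqrt(var + eps) + beta, with the biased variance."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    # eps keeps the division finite when all inputs are equal (var == 0).
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (v - mean) * inv_std + beta for v in x]
```

Normalizing `[1.0, 2.0, 3.0, 4.0]` gives outputs with zero mean and, up to `eps`, unit variance; with `gamma=2.0, beta=3.0` the outputs are centred on 3 instead.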