Computer Science, 2022, Vol. 49, Issue (3): 211-217. doi: 10.11896/jsjkx.201200019

• Computer Graphics & Multimedia •

SSD Network Based on Improved Convolutional Attention Module and Residual Structure

ZHANG Lyu, ZHOU Bo-wen, WU Liang-hong

  1. School of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411100, China
  • Received: 2020-12-02 Revised: 2021-05-28 Online: 2022-03-15 Published: 2022-03-15
  • Corresponding author: WU Liang-hong (lhwu@hnust.edu.cn)
  • About author (contact: 1336039926@qq.com): ZHANG Lyu, born in 1996, postgraduate. His main research interests include computer vision, deep learning, image processing, etc.
    WU Liang-hong, born in 1977, Ph.D. His main research interests include intelligent computation, evolutionary computation, computer vision, etc.
  • Supported by:
    National Natural Science Foundation of China (61603132, 61672226), Natural Science Foundation of Hunan Province, China (2018JJ2137, 2020JJ5170), Hunan Province Science and Technology Innovation Plan Project (2017XK2302) and General Project of Hunan Education Department (18C0299).

Abstract: SSD (Single Shot MultiBox Detector) is a single-stage detection algorithm based on convolutional neural networks. Compared with two-stage detection algorithms, it significantly improves detection speed while maintaining reasonable accuracy, but it still falls short of the requirements of many practical applications, especially in small-target detection tasks, where its accuracy is insufficient. To address this shortcoming, this paper proposes a feature extraction network, Res-Am CNN (Residual with Attention Module Convolutional Neural Networks), based on an improved residual structure and a convolutional attention module, which greatly improves the feature extraction ability of the network. Additive fusion with upsample (AFU) is introduced into the original SSD pyramid structure for feature fusion, enhancing the representation ability of shallow features. Experimental results on the PASCAL VOC data set show that, compared with the original SSD network and mainstream detection networks, the Res-Am&AFU SSD (SSD with Res-Am CNN and AFU) network reaches a mean average precision (mAP) of 69.1% on the VOC test set; it leads single-stage networks in accuracy, approaches two-stage networks in accuracy, and is much faster than two-stage networks in detection speed. Experimental results on a small-target test set show that the mAP of the Res-Am&AFU SSD network is 67.2%, which is 9.4% higher than that of the original SSD. The method is also more flexible and requires no pre-training.

Key words: Attention mechanism, Convolutional neural network, Residual structure, SSD network, Target detection
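
The abstract names additive fusion with upsample (AFU) but gives no implementation details. The following is only a minimal sketch of the general idea, under stated assumptions: a deeper, lower-resolution feature map is upsampled to the size of a shallower one and the two are added element-wise. The function names `upsample_nearest` and `additive_fusion`, the nearest-neighbour interpolation, the factor of 2, and the single-channel list-of-rows representation are all illustrative choices, not taken from the paper.

```python
def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling of a 2-D feature map (list of rows).

    Assumption: the paper may use a different interpolation scheme.
    """
    out = []
    for row in fmap:
        # Repeat every value `factor` times along the width ...
        wide = [v for v in row for _ in range(factor)]
        # ... and repeat the widened row `factor` times along the height.
        out.extend(list(wide) for _ in range(factor))
    return out


def additive_fusion(shallow, deep, factor=2):
    """Upsample `deep` to the size of `shallow`, then add element-wise."""
    up = upsample_nearest(deep, factor)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("upsampled map does not match the shallow map")
    return [[s + d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(shallow, up)]


# Hypothetical 4x4 shallow map and 2x2 deep map (single channel).
shallow = [[1, 1, 1, 1],
           [1, 1, 1, 1],
           [1, 1, 1, 1],
           [1, 1, 1, 1]]
deep = [[10, 20],
        [30, 40]]
fused = additive_fusion(shallow, deep)
# fused == [[11, 11, 21, 21], [11, 11, 21, 21],
#           [31, 31, 41, 41], [31, 31, 41, 41]]
```

In a real detector the maps would be multi-channel tensors, and the channel counts would have to match before the addition (for example through a 1x1 convolution); that step is omitted here.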

CLC Number: TP183
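
The results in the abstract are reported as mean average precision (mAP), that is, the mean of the per-class average precision (AP) values. As a hedged illustration of how such a number is obtained, the sketch below computes AP as the area under the interpolated precision-recall curve and then averages over classes. The abstract does not state which PASCAL VOC protocol was used (11-point or all-point interpolation), so the all-point variant here is an assumption, as are the function names and the toy detections.

```python
def average_precision(scored_hits, num_gt):
    """AP for one class using all-point interpolation (assumed protocol).

    scored_hits: list of (confidence, is_true_positive), one per detection.
    num_gt: number of ground-truth objects of this class.
    """
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = 0
    precisions, recalls = [], []
    for rank, (_, hit) in enumerate(ranked, start=1):
        tp += 1 if hit else 0
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Interpolate: precision at each rank becomes the maximum precision
    # observed at that rank or any later (higher-recall) rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class maps a class name to (scored_hits, num_gt)."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical toy detections for two classes.
toy = {
    "cat": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "dog": ([(0.6, True)], 1),
}
m_ap = mean_average_precision(toy)
```

Deciding whether a detection counts as a true positive (typically by an intersection-over-union threshold against ground-truth boxes) happens before this step and is not shown.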