Computer Science ›› 2022, Vol. 49 ›› Issue (3): 211-217. doi: 10.11896/jsjkx.201200019

• Computer Graphics & Multimedia •

SSD Network Based on Improved Convolutional Attention Module and Residual Structure

ZHANG Lyu, ZHOU Bo-wen, WU Liang-hong   

  1. School of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411100, China
  • Received: 2020-12-02 Revised: 2021-05-28 Online: 2022-03-15 Published: 2022-03-15
  • About author: ZHANG Lyu, born in 1996, postgraduate. His main research interests include computer vision, deep learning, image processing, etc.
    WU Liang-hong, born in 1977, Ph.D. His main research interests include intelligent computation, evolutionary computation, computer vision, etc.
  • Supported by:
    National Natural Science Foundation of China (61603132, 61672226), Natural Science Foundation of Hunan Province, China (2018JJ2137, 2020JJ5170), Hunan Province Science and Technology Innovation Plan Project (2017XK2302) and General Project of Hunan Education Department (18C0299).

Abstract: SSD (single shot multibox detector) is a single-stage detection algorithm based on a convolutional neural network. Compared with two-stage detection algorithms, it cannot meet the requirements of many practical applications, especially in small-target detection tasks. To solve this problem, this paper proposes a feature extraction network, Res-Am CNN, based on an improved residual structure and an improved convolutional attention module. The feature extraction ability of the network is greatly improved, and additive fusion with upsample (AFU) is introduced into the original SSD pyramid structure for feature fusion to enhance the representation ability of shallow features. Experimental results on the PASCAL VOC data set, compared against the original SSD network and mainstream detection networks, show that the mean average precision (mAP) of the Res-Am & AFU SSD network (SSD with Res-Am CNN and AFU) on the VOC test set is 69.1%, which is ahead of one-stage networks in accuracy, close to two-stage networks in accuracy, and greatly ahead of two-stage networks in speed. Experimental results on a small-target test set show that the mAP of the Res-Am & AFU SSD network is 67.2%, which is 9.4% higher than that of the original SSD. The method is also more flexible and does not need pre-training.

Key words: Attention mechanism, Convolutional neural network, Residual structure, SSD network, Target detection

CLC Number: TP183
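
The abstract names additive fusion with upsample (AFU) but does not describe its implementation. The following is only a minimal sketch of the general idea, under loud assumptions that are not taken from the paper: a single-channel feature map, nearest-neighbour upsampling by a factor of 2, and plain element-wise addition. The names `upsample_nearest` and `additive_fusion` are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, rows x cols (assumption for brevity)


def upsample_nearest(fmap: FeatureMap, factor: int = 2) -> FeatureMap:
    """Nearest-neighbour upsampling: repeat each value `factor` times per axis."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def additive_fusion(shallow: FeatureMap, deep: FeatureMap, factor: int = 2) -> FeatureMap:
    """Upsample the deep map to the shallow map's size, then add element-wise."""
    up = upsample_nearest(deep, factor)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("upsampled deep map must match the shallow map's shape")
    return [[s + d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(shallow, up)]


# Toy data: a 4x4 shallow map fused with a 2x2 deep map.
shallow = [[1.0, 1.0, 1.0, 1.0] for _ in range(4)]
deep = [[10.0, 20.0], [30.0, 40.0]]
fused = additive_fusion(shallow, deep)
```

In this toy case each deep value is spread over a 2x2 block of the shallow map before the addition, which is one way deeper semantic information can be injected into shallow features. A real detector would operate on multi-channel tensors and would likely align channel counts first; those details are not given in the abstract.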