Computer Science (计算机科学), 2022, Vol. 49, Issue (10): 198-206. doi: 10.11896/jsjkx.210800214

• Computer Graphics & Multimedia •

Object Detection Algorithm Based on Improved Split-attention Network

PAN Yi, WANG Li-ping

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2021-08-24 Revised: 2022-03-04 Online: 2022-10-15 Published: 2022-10-13
  • Corresponding author: WANG Li-ping (wlp@zjut.edu.cn)
  • About author (735442196@qq.com): PAN Yi, born in 1996, postgraduate. His main research interests include object detection and multi-objective optimization.
    WANG Li-ping, born in 1964, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include computing intelligence and decision optimization.
  • Supported by:
    Key Technologies Research and Development Program of Zhejiang Province, China (2018C01080).

Abstract: Most current object detection algorithms based on convolutional neural networks make poor use of valuable contextual information and tend to miss hard targets. To address these problems, this paper proposes an object detection algorithm based on an improved split-attention network. First, a split-attention mechanism is introduced that combines a multi-path structure with feature-map attention to improve the feature representation. Then, poly-scale convolution replaces vanilla convolution in the convolutional layers to enhance the network's sensitivity to scale variation. Finally, the improved network is applied to Faster R-CNN, and experiments are carried out on the Pascal VOC and MS COCO datasets. Compared with the original algorithm, the proposed algorithm improves mAP by 1.6% and 2.4% on the two datasets, respectively, without introducing additional parameters or computational complexity, and it also achieves higher mAP than the other algorithms compared, which verifies its good performance.
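
The abstract names the split-attention mechanism without giving its details. As a rough illustration only, and not the authors' implementation, the sketch below shows the core idea of fusing several feature splits with attention weights that are normalized across splits for each channel. The function names `softmax` and `split_attention` are hypothetical, and the attention logits are passed in directly, whereas a real network would typically produce them with small learned layers applied to the summed splits.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def split_attention(splits: Sequence[Sequence[float]],
                    logits: Sequence[Sequence[float]]) -> List[float]:
    """Fuse R feature splits into one vector using per-channel attention.

    splits[r][c]: response of split r at channel c (assumed already pooled).
    logits[r][c]: attention score of split r at channel c (assumption: given
    as input here instead of being computed by learned layers).
    """
    num_splits = len(splits)
    num_channels = len(splits[0])
    fused = []
    for c in range(num_channels):
        # Normalize across splits (not across channels), one channel at a time.
        weights = softmax([logits[r][c] for r in range(num_splits)])
        fused.append(sum(weights[r] * splits[r][c] for r in range(num_splits)))
    return fused


if __name__ == "__main__":
    splits = [[1.0, 2.0], [3.0, 4.0]]
    equal_logits = [[0.0, 0.0], [0.0, 0.0]]
    # Equal logits give equal weights, so the result is the plain average.
    print(split_attention(splits, equal_logits))
```

With equal logits the fusion reduces to averaging the splits; when one split's logits dominate, the output approaches that split's features, which is how the mechanism can emphasize the most informative path per channel.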

Key words: Convolutional neural network, Contextual information, Object detection, Split-attention, Poly-scale convolution

CLC Number:

  • TP391