Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230600151-6. doi: 10.11896/jsjkx.230600151

• Image Processing & Multimedia Technology •

  • Corresponding author: LIU Zhiwei (zwliu1982@hotmail.com)
  • First author's email: qki120@163.com

Object Detection with Receptive Field Expansion and Multi-branch Aggregation

QUE Yue, GAN Menghan, LIU Zhiwei   

  1. School of Information Engineering,East China Jiaotong University,Nanchang 330013,China
  • Published:2024-06-06
  • About author: QUE Yue, born in 1991, Ph.D, lecturer, is a member of CCF (No.P2963M). His main research interests include computer vision and deep learning.
    LIU Zhiwei, born in 1982, Ph.D, professor. His main research interests include target-aware imaging and high-performance computing.
  • Supported by:
    National Natural Science Foundation of China (62362032) and Natural Science Foundation of Jiangxi Province, China (20232BAB212011).


Abstract: Object detection aims to accurately recognize and localize objects in images and is an important research area in computer vision. Deep learning-based object detection has made great progress, but shortcomings remain. The rich semantic information produced by large down-sampling factors benefits image classification, but the down-sampling process inevitably loses information, leaving the model's feature extraction insufficient and thus reducing detection accuracy. To address these problems, this paper proposes a receptive field expansion and multi-branch aggregation model for object detection. First, a receptive field enhancement module is designed to enlarge the receptive field of the backbone network. This module captures object context cues without changing the spatial resolution of the features, alleviating the loss of object information during down-sampling. Then, to fully exploit the locality of convolutional neural networks and the long-range dependency modeling of the self-attention mechanism, a receptive-field-expanding composite backbone network is constructed that preserves local features while improving the model's global feature perception. Finally, a multi-branch aggregation detection head network is proposed that forms an information flow among the three prediction branches and fuses feature information across them to improve detection capability. Validation experiments on the MS COCO dataset show that the average precision of the proposed model surpasses that of many mainstream object detection models.
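The abstract's key premise is that the receptive field can be enlarged without changing the spatial resolution of the features. This page does not give the module's implementation details; as a hedged illustration of the general principle only (the dilation rates and kernel sizes below are assumptions, not the paper's actual configuration), stride-1 dilated convolutions grow the receptive field with depth while the feature map keeps its resolution under appropriate padding:

```python
# Receptive-field growth of stacked stride-1 convolutions: a minimal
# sketch of the principle behind receptive-field expansion, NOT the
# paper's actual module. With stride 1 and suitable padding, each layer
# enlarges the receptive field by (kernel_size - 1) * dilation pixels
# while spatial resolution is preserved (no down-sampling loss).

def receptive_field(layers):
    """layers: list of (kernel_size, dilation) pairs for stride-1 convs.
    Returns the receptive field, in input pixels, of one output unit."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # contribution of one stride-1 layer
    return rf

plain = [(3, 1)] * 3                 # three ordinary 3x3 convs
dilated = [(3, 1), (3, 2), (3, 3)]   # same depth, growing dilation

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 13
```

At equal depth and parameter count, the dilated stack sees a 13-pixel context instead of 7, which is how context cues can be gathered without the information loss of further down-sampling.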

Key words: Object detection, Self-attention mechanism, Receptive field expansion, Feature fusion, Decoupled head

CLC Number: TP391.4