Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230600151-6. doi: 10.11896/jsjkx.230600151

• Image Processing & Multimedia Technology •

Object Detection with Receptive Field Expansion and Multi-branch Aggregation

QUE Yue, GAN Menghan, LIU Zhiwei   

  1. School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
  • Published: 2024-06-06
  • About author: QUE Yue, born in 1991, Ph.D., lecturer, is a member of CCF (No. P2963M). His main research interests include computer vision and deep learning.
    LIU Zhiwei, born in 1982, Ph.D., professor. His main research interests include target-aware imaging and high-performance computing.
  • Supported by:
    National Natural Science Foundation of China (62362032) and Natural Science Foundation of Jiangxi Province, China (20232BAB212011).

Abstract: Object detection aims to accurately recognize and localize objects in images and is an important research area in computer vision. Deep learning-based object detection has made great progress, but shortcomings remain. Large down-sampling factors yield semantic information that benefits image classification, but down-sampling inevitably loses information, which leaves the model's feature extraction insufficient and reduces detection accuracy. To address these problems, this paper proposes a receptive field enhancement and multi-branch aggregation network for object detection. First, a receptive field enhancement module is designed to expand the receptive field of the backbone network. Because this module does not change the spatial resolution of the features, it can acquire object context cues while alleviating the loss of object information during down-sampling. Then, to take full advantage of the locality of convolutional neural networks and the long-range dependency modeling of the self-attention mechanism, a receptive-field-expanding composite backbone network is constructed, which retains local features while improving the model's global feature perception. Finally, a multi-branch aggregation detection head network is proposed, which forms an information flow among three prediction branches and fuses feature information across the branches to improve the detection capability of the model. Validation experiments are carried out on the MS COCO dataset, and the results show that the average accuracy of the proposed model is better than that of many mainstream object detection models.
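
The abstract's claim that a receptive field can be expanded without changing the feature spatial resolution can be illustrated with simple arithmetic. The sketch below is not the paper's module: it assumes, purely for illustration, that the expansion is done with dilated 3x3 convolutions (the abstract does not say which operator is used), and the layer stacks and the 64-pixel input size are hypothetical. It compares the receptive field and output size of a plain stack, a dilated stack, and a stride-2 down-sampling stack.

```python
def conv_output_size(size, kernel, stride=1, dilation=1, padding=0):
    """Output length of a convolution along one spatial axis."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def same_padding(kernel, dilation):
    """Padding that preserves size for stride 1 and an odd kernel."""
    return dilation * (kernel - 1) // 2


def receptive_field(layers):
    """Receptive field of a stack of (kernel, stride, dilation) layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += dilation * (kernel - 1) * jump
        jump *= stride
    return rf


def trace_size(size, layers):
    """Spatial size after passing through every layer in the stack."""
    for kernel, stride, dilation in layers:
        pad = same_padding(kernel, dilation)
        size = conv_output_size(size, kernel, stride, dilation, pad)
    return size


# Hypothetical stacks of 3x3 convolutions: (kernel, stride, dilation).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]      # assumed expansion strategy
downsampled = [(3, 2, 1), (3, 2, 1), (3, 2, 1)]  # stride-2 down-sampling

for name, stack in [("plain", plain), ("dilated", dilated), ("downsampled", downsampled)]:
    print(name, "receptive field:", receptive_field(stack),
          "output size for a 64-pixel input:", trace_size(64, stack))
```

Under these assumptions, the dilated stack reaches the same receptive field as the stride-2 stack while keeping the 64-pixel resolution, whereas the stride-2 stack shrinks the feature map to 8 pixels per side.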

Key words: Object detection, Self-attention mechanism, Receptive field expansion, Feature fusion, Decoupled head

CLC Number: 

  • TP391.4