Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230600151-6. doi: 10.11896/jsjkx.230600151
阙越, 甘梦晗, 刘志伟
QUE Yue, GAN Menghan, LIU Zhiwei
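Abstract: Object detection aims to accurately recognize and localize objects in images and is an important research area in computer vision. Deep-learning-based object detection has made considerable progress, but shortcomings remain. The semantic information produced by large downsampling factors benefits image classification, but downsampling inevitably loses information, which leaves the model's feature extraction insufficient and lowers detection accuracy. To address these problems, a receptive field enhancement and multi-branch aggregation model is proposed for object detection. First, a receptive field enhancement module is designed to enlarge the receptive field of the backbone network. The module captures contextual cues about objects without changing the spatial resolution of the features, which alleviates the loss of object information during downsampling. Then, to make full use of the locality of convolutional neural networks and the long-range feature dependencies of the self-attention mechanism, a receptive-field-expanded composite backbone network is constructed to preserve local features and improve the model's global feature perception. Finally, a multi-branch aggregation detection head network is proposed, which creates information flow among the three prediction branches and fuses their feature information to improve detection capability. Validation experiments on the MS COCO dataset show that the average precision of the proposed model exceeds that of several mainstream object detection models.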
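The abstract does not say how the receptive field enhancement module is built. One common way to enlarge the receptive field while keeping spatial resolution is dilated convolution with stride 1 and matching padding, and the sketch below only illustrates that arithmetic. The dilation rates, the three-layer branch, and the feature-map size are hypothetical values chosen for illustration, not the authors' design.

```python
def conv_output_size(size, kernel, dilation=1, padding=0, stride=1):
    """Spatial output size of a convolution (standard formula)."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def receptive_field(layers):
    """Receptive field of a stack of (kernel, dilation, stride) layers."""
    rf, jump = 1, 1
    for kernel, dilation, stride in layers:
        # Each layer widens the field by its dilated kernel extent,
        # scaled by the accumulated stride of the layers before it.
        rf += dilation * (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical branch (an assumption, not taken from the paper):
# three 3x3 convolutions with dilation rates 1, 2, 3, all with stride 1.
branch = [(3, 1, 1), (3, 2, 1), (3, 3, 1)]

size = 40  # hypothetical feature-map width
for kernel, dilation, stride in branch:
    # Padding that keeps the spatial size unchanged for odd kernels at stride 1.
    same_padding = dilation * (kernel - 1) // 2
    size = conv_output_size(size, kernel, dilation, same_padding, stride)

print(size, receptive_field(branch))  # prints "40 13"
```

For comparison, three plain 3x3 convolutions at stride 1 give a receptive field of 7, so under these assumptions the dilated stack nearly doubles the context seen by each output position while the feature map stays 40 wide.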