Computer Science, 2024, Vol. 51, Issue (9): 162-172. doi: 10.11896/jsjkx.230700106

• Computer Graphics & Multimedia •

Re-parameterization Enhanced Dual-modal Realtime Object Detection Model

LI Yunchen, ZHANG Rui, WANG Jiabao, LI Yang, WANG Ziqi, CHEN Yao

  1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Received: 2023-07-17  Revised: 2023-11-06  Online: 2024-09-15  Published: 2024-09-10
  • Corresponding author: ZHANG Rui (Lydiazhang09@163.com)
  • About author: LI Yunchen, born in 1987, postgraduate (liyunchen1012@163.com). His main research interest is object detection.
    ZHANG Rui, born in 1977, Ph.D., professor, Ph.D. supervisor. His main research interests include data engineering and information fusion.
  • Supported by:
    Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (BK20200581).

Abstract: Objects captured by drones at high altitude are generally small, have weak features, and are strongly affected by complex weather conditions, so object detection based on single-modal visible or infrared images suffers from high missed-detection and false-detection rates. To address this problem, this paper proposes DM-YOLO, a re-parameterization enhanced dual-modal real-time object detection model. Firstly, the visible and infrared images are fused by channel concatenation, which exploits the complementary information of the two modalities at very low cost. Secondly, a more efficient re-parameterization module is proposed and used to build a more powerful backbone network, RepCSPDarkNet, which effectively improves the backbone's feature extraction capability for dual-modal images. Then, a multi-level feature fusion module is proposed, which fuses the multi-scale feature information of weak and small objects through multi-receptive-field dilated convolution and an attention mechanism, thereby enhancing their multi-scale feature representation. Finally, the deep detection layer of the feature pyramid, which contributes almost nothing to detecting weak and small objects, is removed, reducing the model size while keeping detection accuracy unchanged. Experimental results on the large-scale dual-modal image dataset DroneVehicle show that the detection accuracy of DM-YOLO is 2.45% higher than that of the baseline YOLOv5s and exceeds that of YOLOv6 and YOLOv7 models of comparable size. DM-YOLO effectively improves the accuracy and robustness of object detection under complex illumination conditions, while reaching a detection speed of 82 frames per second, which meets the requirements of real-time detection.

Key words: Re-parameterization, Dual modality, Real-time object detection, Multi-scale features, Attention mechanism

CLC number: TP391
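The abstract states that visible and infrared images are fused by channel concatenation at very low cost. A minimal pure-Python sketch of why, assuming spatially registered channel-first images stored as nested lists, a single-channel infrared image, and a hypothetical first convolution with 32 output channels and a 6×6 kernel (these sizes are illustrative, not taken from the paper): only the input depth of the first layer changes, so the added cost is one extra slice of first-layer weights. If the infrared image is stored with three channels, the fused input would have six channels instead of four.

```python
def spatial_size(img):
    """(H, W) of a channel-first image stored as nested lists [C][H][W]."""
    return len(img[0]), len(img[0][0])


def concat_modalities(visible, infrared):
    """Input-level fusion: stack two registered images along the channel axis."""
    if spatial_size(visible) != spatial_size(infrared):
        raise ValueError("visible and infrared images must share the same H x W")
    return visible + infrared  # new list of C_vis + C_ir channel planes


def conv_weight_count(in_ch, out_ch, k):
    """Number of weights in a bias-free k x k convolution."""
    return out_ch * in_ch * k * k


H, W = 4, 5
rgb = [[[0.5] * W for _ in range(H)] for _ in range(3)]  # 3 x H x W visible image
ir = [[[0.2] * W for _ in range(H)]]                      # 1 x H x W infrared image
fused = concat_modalities(rgb, ir)
print(len(fused), *spatial_size(fused))  # 4 4 5

# Hypothetical first convolution: 32 output channels, 6 x 6 kernel.
extra = conv_weight_count(4, 32, 6) - conv_weight_count(3, 32, 6)
print(extra)  # 1152
```

Every layer after the first sees the same feature-map shapes as in a single-modal network, which is why this form of early fusion adds almost no computation.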
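The abstract does not describe the internal structure of the proposed re-parameterization module, so the following is only a minimal sketch of the generic RepVGG-style structural re-parameterization that such modules build on, written in pure Python for a single channel. During training a 3×3 conv + BN branch, a 1×1 conv + BN branch and an identity BN branch run in parallel; for deployment each BN is folded into its convolution, the 1×1 and identity branches are rewritten as 3×3 kernels, and everything is summed into one 3×3 convolution. The random weights and BN statistics are hypothetical; the final check compares the two forms numerically.

```python
import math
import random


def conv2d(x, k, bias=0.0):
    """Single-channel cross-correlation, stride 1, zero padding len(k)//2 (same size)."""
    h, w, r = len(x), len(x[0]), len(k) // 2
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += k[di + r][dj + r] * x[ii][jj]
    return out


def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization for one channel."""
    s = gamma / math.sqrt(var + eps)
    return [[(v - mean) * s + beta for v in row] for row in z]


def fold_bn(k, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding bias-free conv; returns (kernel, bias)."""
    s = gamma / math.sqrt(var + eps)
    return [[v * s for v in row] for row in k], beta - mean * s


def to_3x3(k):
    """Zero-pad a 1x1 kernel to 3x3 (weight at the centre)."""
    if len(k) == 3:
        return k
    z = [[0.0] * 3 for _ in range(3)]
    z[1][1] = k[0][0]
    return z


def add_maps(*maps):
    """Element-wise sum of equally shaped 2D lists."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
k3 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k1 = [[random.uniform(-1, 1)]]
ident = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
# Hypothetical BN parameters per branch: (gamma, beta, running mean, running var).
bn3, bn1, bnid = (1.2, 0.1, 0.05, 0.8), (0.7, -0.2, -0.1, 1.5), (0.9, 0.3, 0.2, 0.6)

# Training-time form: three parallel branches, summed.
y_train = add_maps(batch_norm(conv2d(x, k3), *bn3),
                   batch_norm(conv2d(x, k1), *bn1),
                   batch_norm(x, *bnid))

# Deployment form: a single 3x3 conv with merged kernel and bias.
branches = [fold_bn(k3, *bn3), fold_bn(to_3x3(k1), *bn1), fold_bn(ident, *bnid)]
k_fused = add_maps(*[k for k, _ in branches])
b_fused = sum(b for _, b in branches)
y_deploy = conv2d(x, k_fused, b_fused)

max_err = max(abs(a - b) for ra, rb in zip(y_train, y_deploy) for a, b in zip(ra, rb))
print(f"max |train - deploy| = {max_err:.2e}")
```

Because the merged block runs one 3×3 convolution, inference cost does not grow with the number of training-time branches, which is what makes re-parameterized backbones attractive for real-time detectors.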
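The multi-level feature fusion module is described as combining multi-receptive-field dilated convolution with an attention mechanism. The sketch below illustrates that idea under stated assumptions: single-channel maps, three dilated branches that share one 3×3 kernel, and a parameter-free, selective-kernel-style softmax over each branch's global average response as a stand-in for the paper's attention design, which the abstract does not specify. A k×k kernel with dilation d spans k + (k − 1)(d − 1) pixels per side, so dilations 1, 2 and 3 see 3×3, 5×5 and 7×7 neighbourhoods with the same nine weights.

```python
import math


def dilated_conv2d(x, k, d):
    """Single-channel cross-correlation with dilation d, stride 1 and zero padding
    d * (len(k) // 2), so the output keeps the input's H x W."""
    h, w, r = len(x), len(x[0]), len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for a in range(-r, r + 1):
                for b in range(-r, r + 1):
                    ii, jj = i + a * d, j + b * d
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += k[a + r][b + r] * x[ii][jj]
    return out


def effective_kernel_size(k, d):
    """Span of a k x k kernel with dilation d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)


def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def multi_rf_fuse(x, k, dilations=(1, 2, 3)):
    """Parallel dilated branches (sharing kernel k, a simplification) fused with
    softmax weights from each branch's global average; a parameter-free stand-in
    for a learned attention module."""
    branches = [dilated_conv2d(x, k, d) for d in dilations]
    h, w = len(x), len(x[0])
    gap = [sum(map(sum, y)) / (h * w) for y in branches]
    weights = softmax(gap)
    fused = [[sum(wt * y[i][j] for wt, y in zip(weights, branches))
              for j in range(w)] for i in range(h)]
    return fused, weights


print([effective_kernel_size(3, d) for d in (1, 2, 3)])  # [3, 5, 7]

# Unit impulse in a 9 x 9 map: each branch spreads it over its own 3 x 3 grid of taps.
x = [[0.0] * 9 for _ in range(9)]
x[4][4] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
fused, weights = multi_rf_fuse(x, ones)
print([round(wt, 3) for wt in weights])  # [0.333, 0.333, 0.333]
```

For the impulse every branch responds with the same total, so the weights come out uniform; on real feature maps the branch responses differ and the softmax shifts weight toward the more active receptive field.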
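Removing the deepest detection layer mainly affects the coarsest grid. As a rough illustration, assuming a YOLOv5s-style head with detection layers at strides 8, 16 and 32, a 640×640 input and 3 anchors per cell (assumptions, not figures from the paper), the arithmetic below counts the grid cells per layer and how many cells a hypothetical 16-pixel object spans on each. On the stride-32 layer such an object covers only half a cell, which is consistent with the abstract's observation that this layer contributes little to weak and small objects. The reduction in model size comes from the removed layer's convolutions, which this count does not measure.

```python
def cells_per_layer(img_size, strides):
    """Grid cells on each detection layer of a square input (img_size divisible by every stride)."""
    return {s: (img_size // s) ** 2 for s in strides}


img_size = 640  # assumed input resolution
anchors = 3     # assumed anchors per grid cell
full = cells_per_layer(img_size, (8, 16, 32))  # P3, P4, P5 heads
pruned = cells_per_layer(img_size, (8, 16))    # stride-32 head removed
print(full)  # {8: 6400, 16: 1600, 32: 400}
print(anchors * sum(full.values()), anchors * sum(pruned.values()))  # 25200 24000

obj = 16  # hypothetical object width in pixels
cover = {s: obj / s for s in (8, 16, 32)}  # cells spanned on each layer
print(cover)  # {8: 2.0, 16: 1.0, 32: 0.5}
```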