Computer Science, 2024, Vol. 51, Issue (9): 162-172. doi: 10.11896/jsjkx.230700106

• Computer Graphics & Multimedia •

Re-parameterization Enhanced Dual-modal Realtime Object Detection Model

LI Yunchen, ZHANG Rui, WANG Jiabao, LI Yang, WANG Ziqi, CHEN Yao   

  1. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Received: 2023-07-17 Revised: 2023-11-06 Online: 2024-09-15 Published: 2024-09-10
  • About author: LI Yunchen, born in 1987, postgraduate. His main research interest is object detection.
    ZHANG Rui, born in 1977, Ph.D., professor, Ph.D. supervisor. His main research interests include data engineering and information fusion.
  • Supported by:
    Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (BK20200581).

Abstract: Objects captured by drones at high altitude are generally small, have weak features, and are strongly affected by complex weather conditions. Object detection based on visible or infrared images alone often suffers from high rates of missed and false detections. To address this problem, this paper proposes DM-YOLO, a dual-modal real-time object detection model enhanced by re-parameterization. First, the visible and infrared images are fused by channel concatenation, which exploits the complementary information in the dual-modal images at very low cost. Second, a more efficient re-parameterization module is proposed, and a more powerful backbone network, RepCSPDarkNet, is built on it, improving the backbone's feature extraction capability for dual-modal images. Third, a multi-level feature fusion module is proposed that enhances the multi-scale feature representation of weak and small objects by fusing their multi-scale feature information using multi-receptive-field dilated convolution and an attention mechanism. Finally, the deep feature layer of the feature pyramid is removed, which reduces the model size while maintaining detection accuracy. Experimental results on the large-scale dual-modal image dataset DroneVehicle show that the detection accuracy of DM-YOLO is 2.45% higher than that of the baseline YOLOv5s and exceeds that of the YOLOv6 and YOLOv7 models. Furthermore, DM-YOLO improves the accuracy and robustness of object detection under complex weather conditions while achieving a detection speed of 82 frames per second, which meets the requirements of real-time detection.
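Two of the ideas named in the abstract, fusion by channel concatenation and re-parameterization, can be illustrated with a minimal pure-Python sketch. It is not the DM-YOLO implementation: the abstract does not specify the module design, so the example assumes a 3-channel visible image, a 1-channel infrared image, and a generic RepVGG-style merge of a 3x3 branch, a 1x1 branch, and an identity branch on a single channel, without batch normalization. All function names are hypothetical.

```python
def fuse_channels(visible, infrared):
    """Concatenate per-pixel channel lists: (H, W, Cv) + (H, W, Ci) -> (H, W, Cv + Ci)."""
    same_height = len(visible) == len(infrared)
    same_width = all(len(rv) == len(ri) for rv, ri in zip(visible, infrared))
    if not (same_height and same_width):
        raise ValueError("visible and infrared images must share height and width")
    return [[pv + pi for pv, pi in zip(rv, ri)] for rv, ri in zip(visible, infrared)]


def conv3x3_same(image, kernel):
    """Single-channel 3x3 cross-correlation with zero padding (output size = input size)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out


def merge_branches(kernel3x3, weight1x1, with_identity=True):
    """Fold a 1x1 branch and an optional identity branch into the 3x3 kernel centre."""
    merged = [row[:] for row in kernel3x3]
    merged[1][1] += weight1x1 + (1 if with_identity else 0)
    return merged


# Assumed toy inputs: a 2x2 visible image (3 channels) and a 2x2 infrared image (1 channel).
visible = [[[10, 20, 30], [40, 50, 60]], [[70, 80, 90], [15, 25, 35]]]
infrared = [[[200], [210]], [[220], [230]]]
fused = fuse_channels(visible, infrared)  # each pixel now carries 4 channels

# Training-time form: three parallel branches summed on a single-channel map.
gray = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k3 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
k1 = 3
multi_branch = [
    [a + k1 * p + p for a, p in zip(row_a, row_p)]
    for row_a, row_p in zip(conv3x3_same(gray, k3), gray)
]

# Inference-time form: one 3x3 convolution with the merged kernel.
single_branch = conv3x3_same(gray, merge_branches(k3, k1))
```

Because convolution is linear, the merged single-branch kernel gives the same output as the sum of the three branches, which is why a multi-branch block can be collapsed into one convolution at inference time.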

Key words: Reparameterization, Dual modality, Real-time object detection, Multiscale features, Attention mechanism

CLC Number: TP391