Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241100137-10. doi: 10.11896/jsjkx.241100137

• Computer Graphics & Multimedia •


Multi-modal Fusion Based Object Detection for All-day and Multi-scenario Environments

ZHANG Fan1, LI Ang1,2   

  1 School of Communication and Information Engineering,Nanjing University of Posts and Telecommunications,Nanjing 210003,China
  2 Institute of Space-air-ground-sea Integrated Communication Technology,Nanjing University of Posts and Telecommunications,Nanjing 210003,China
  • Online:2025-11-15 Published:2025-11-10
  • Corresponding author:LI Ang(liang@njupt.edu.cn)
  • About author:zhangfan_0536@163.com
  • Supported by:
    National Natural Science Foundation of China(62306151).


Abstract: Traditional object detection methods have limitations in handling complex scenes, and they struggle to achieve ideal results especially under low-light conditions at night and in shaded environments during the day. Existing multimodal image fusion techniques tend to emphasize the importance of infrared images in low-light scenes while neglecting the balance between infrared and visible-light information that complex daytime environments require. Therefore, to meet the object detection demands of all-day, multi-scenario environments, this paper proposes a multi-modal fusion object detection method based on feature map classification and a generative adversarial network (GAN). Unlike previous fusion methods that emphasize the visual quality of the fused image, this method focuses on improving its object detection performance. A multi-scale attention mechanism classifies feature maps into saliency and detail feature maps, and a cross-adversarial training network, in which a generator is trained against separate saliency and detail discriminators, optimizes the fusion result so that the key information of each modality is captured to meet the detection needs of different scenarios. Experimental results show that the proposed method performs well on the TNO, RoadScene, and M3FD datasets, significantly improving the performance of multimodal fusion object detection.
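
The pipeline the abstract describes, a generator whose features are split by multi-scale attention into saliency and detail maps that are then judged by separate discriminators, can be made concrete with a short PyTorch sketch. Everything below is a hypothetical reconstruction from the abstract alone: the module names, the pooling scales, the sigmoid attention form, and the LSGAN-style loss are assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttention(nn.Module):
    # Classifies a feature map into a saliency map and a detail map using
    # attention scores pooled at several scales (the scales are assumed).
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.score = nn.Conv2d(channels * len(scales), 1, kernel_size=1)

    def forward(self, feat):
        h, w = feat.shape[-2:]
        pyramid = []
        for s in self.scales:
            x = feat if s == 1 else F.interpolate(
                F.avg_pool2d(feat, s), size=(h, w),
                mode="bilinear", align_corners=False)
            pyramid.append(x)
        attn = torch.sigmoid(self.score(torch.cat(pyramid, dim=1)))
        return feat * attn, feat * (1.0 - attn)  # saliency map, detail map

def generator_adv_loss(generator, d_sal, d_det, ir, vis):
    # One generator objective of the cross-adversarial game (LSGAN form
    # assumed): d_sal pulls the fused image toward infrared saliency while
    # d_det pulls it toward visible-light detail, so neither modality
    # dominates across day and night scenes.
    fused = generator(ir, vis)
    def fool(pred):
        return F.mse_loss(pred, torch.ones_like(pred))
    return fool(d_sal(fused)) + fool(d_det(fused))

A symmetric discriminator step would train d_sal to separate fused outputs from infrared saliency targets and d_det to separate them from visible-light detail targets; alternating the two steps is what would let the infrared/visible balance shift with the scene instead of being fixed toward the infrared branch.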

Key words: Multimodal fusion, Object detection, Generative adversarial networks, Attention mechanism

CLC Number: 

  • TP391
[1]PARAMANANDHAM N,RAJENDIRAN K.Infrared and visible image fusion using discrete cosine transform and swarm intelligence for surveillance applications[J].Infrared Physics & Technology,2018,88:13-22.
[2]GAO H B,CHENG B,WANG J Q,et al.Object Classification Using CNN-Based Fusion of Vision and LIDAR in Autonomous Vehicle Environment[J].IEEE Transactions on Industrial Informatics,2018,14(9):4224-4231.
[3]ZHOU Z,WANG B,LI S,et al.Perceptual fusion of infrared and visible images through a hybrid multi-scale decomposition with Gaussian and bilateral filters[J].Information Fusion,2016,30:15-26.
[4]LI H,QI X,XIE W.Fast infrared and visible image fusion with structural decomposition[J].Knowledge-Based Systems,2020,204:106182.
[5]MA J,ZHOU Y.Infrared and visible image fusion via gradientlet filter[J].Computer Vision and Image Understanding,2020,197/198:103016.
[6]LI H,WU X J,KITTLER J.MDLatLRR:A novel decomposition method for infrared and visible image fusion[J].IEEE Transactions on Image Processing,2020,29:4733-4746.
[7]CHEN J,LI X,LUO L,et al.Infrared and visible image fusion based on target-enhanced multiscale transform decomposition[J].Information Sciences,2020,508:64-78.
[8]CVEJIC N,BULL D,CANAGARAJAH N.Region-Based Multimodal Image Fusion Using ICA Bases[J].IEEE Sensors Journal,2007,7(5):743-751.
[9]LIU Y,CHEN X,WARD K R,et al.Image Fusion With Convolutional Sparse Representation[J].IEEE Signal Processing Letters,2016,23(12):1882-1886.
[10]MA J,CHEN C,LI C,et al.Infrared and visible image fusion via gradient transfer and total variation minimization[J].Information Fusion,2016,31:100-109.
[11]LI S,KANG X,FANG L,et al.Pixel-level image fusion:A survey of the state of the art[J].Information Fusion,2017,33:100-112.
[12]LI H,WU X.DenseFuse:A Fusion Approach to Infrared and Visible Images[J].IEEE Transactions on Image Processing,2019,28(5):2614-2623.
[13]LI H,WU X J,DURRANI T.NestFuse:An Infrared and Visible Image Fusion Architecture based on Nest Connection and Spatial/Channel Attention Models[J].IEEE Transactions on Instrumentation and Measurement,2020,69(12):9645-9656.
[14]LI H,WU X J,KITTLER J.RFN-Nest:An end-to-end residual fusion network for infrared and visible images[J].Information Fusion,2021,73:72-86.
[15]ZHANG Y,LIU Y,SUN P,et al.IFCNN:A general image fusion framework based on convolutional neural network[J].Information Fusion,2020,54:99-118.
[16]TANG L,ZHANG H,XU H,et al.Rethinking the necessity of image fusion in high-level vision tasks:A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity[J].Information Fusion,2023,99:101870.
[17]LI D Y,NIE R C,PAN L N,et al.UMGN:An Infrared and Visible Image Fusion Network Based on Unsupervised Significance Mask Guidance[J].Computer Science,2024,51(6A):230600170-5.
[18]TANG L,XIANG X,ZHANG H,et al.DIVFusion:Darkness-free infrared and visible image fusion[J].Information Fusion,2023,91:477-493.
[19]MA J,YU W,LIANG P,et al.FusionGAN:A generative adversarial network for infrared and visible image fusion[J].Information Fusion,2018,48:11-26.
[20]MA J,XU H,JIANG J,et al.DDcGAN:A Dual-discriminator Conditional Generative Adversarial Network for Multi-resolution Image Fusion[J].IEEE Transactions on Image Processing,2020,29:4980-4995.
[21]ZHANG G D,CHEN Z H,SHENG B.Infrared Small Target Detection Based on Dilated Convolutional Conditional Generative Adversarial Networks[J].Computer Science,2024,51(2):151-160.
[22]MA J,ZHANG H,SHAO Z,et al.GANMcC:A Generative Adversarial Network With Multiclassification Constraints for Infrared and Visible Image Fusion[J].IEEE Transactions on Instrumentation and Measurement,2021,70:1-14.
[23]TOET A.The TNO Multiband Image Data Collection[J].Data in Brief,2017,15:249-251.
[24]XU H,MA J,JIANG J,et al.U2Fusion:A Unified Unsupervised Image Fusion Network[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2020,44(1):502-518.
[25]LIU J,FAN X,HUANG Z,et al.Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:5802-5811.
[26]LIU Y,JIN J,WANG Q,et al.Region level based multi-focus image fusion using quaternion wavelet and normalized cut[J].Signal Processing,2014,97:9-30.
[27]TOET A.Image fusion by a ratio of low-pass pyramid[J].Pattern Recognition Letters,1989,9(4):245-253.
[28]BHATNAGAR G,WU Q M J,LIU Z.Directive contrast based multimodal medical image fusion in NSCT domain[J].IEEE Transactions on Multimedia,2013,15(5):1014-1024.
[29]SURYANARAYANA G,KONDAMURI R S,YANG J.Learning infrared degradations for coherent visible image fusion in the undecimated dual-tree complex wavelet domain[J].Infrared Physics & Technology,2024,143:105596.
[30]LIU Z,TSUKADA K,HANASAKI K,et al.Image fusion by using steerable pyramid[J].Pattern Recognition Letters,2001,22(9):929-939.
[31]LI L,SHI Y,LV M,et al.Infrared and Visible Image Fusion via Sparse Representation and Guided Filtering in Laplacian Pyramid Domain[J].Remote Sensing,2024,16(20):3804.
[32]XIANG T,YAN L,GAO R.A fusion algorithm for infrared and visible images based on adaptive dual-channel unit-linking PCNN in NSCT domain[J].Infrared Physics & Technology,2015,69:53-61.
[33]YIGUO Y,DAN L,YANYAN L,et al.Multispectral and hyperspectral images fusion based on subspace representation and nonlocal low-rank regularization[J].International Journal of Remote Sensing,2024,45(9):2965-2984.
[34]CHEN Y,ZHOU F C,DONG K.Dual discriminator infrared and visible light image fusion with enhanced visual significance[J/OL].Journal of Beijing University of Aeronautics and Astronautics,1-12[2024-11-19].https://doi.org/10.13700/j.bh.1001-5965.2024.0072.
[35]LIU C H,QI Y,DING W R.Infrared and visible image fusion method based on saliency detection in sparse domain[J].Infrared Physics & Technology,2017,83:94-102.
[36]NAIDU V P S.Hybrid DDCT-PCA based multi sensor image fusion[J].Journal of Optics,2014,43:48-61.
[37]PRABHAKAR K R,SRIKAR V S,BABU R V.Deepfuse:A deep unsupervised approach for exposure fusion with extreme exposure image pairs[C]//Proceedings of the IEEE International Conference on Computer Vision.2017:4714-4722.
[38]SHI Y,GANG J,XI L,et al.TCPMFNet:An infrared and visible image fusion network with composite auto encoder and transformer-convolutional parallel mixed fusion strategy[J].Infrared Physics & Technology,2022,127:104405.
[39]ZHANG H,XU H,XIAO Y,et al.Rethinking the image fusion:A fast unified image fusion network based on proportional maintenance of gradient and intensity[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:12797-12804.
[40]MA J,TANG L,XU M,et al.STDFusionNet:An infrared and visible image fusion network based on salient target detection[J].IEEE Transactions on Instrumentation and Measurement,2021,70:1-13.
[41]WANG W,WEI C,YANG W,et al.GLADNet:Low-light enhancement network with global awareness[C]//2018 13th IEEE International Conference on Automatic Face & Gesture Recognition(FG 2018).IEEE,2018:751-755.
[42]GOODFELLOW I,POUGET-ABADIE J,MIRZA M,et al.Generative adversarial nets[C]//Advances in Neural Information Processing Systems.2014.
[43]ZHANG Y,YU J,WANG Y,et al.Small object detection based on hierarchical attention mechanism and multi-scale separable detection[J].IET Image Processing,2023,17(14):3986-3999.
[44]DENG Z,HU X,ZHU L,et al.R3net:Recurrent residual refinement network for saliency detection[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence.Menlo Park,CA:AAAI,2018:684-690.
[45]LI A,NI S,CHEN Y,et al.Cross-modal object detection via UAV[J].IEEE Transactions on Vehicular Technology,2023,72(8):10894-10905.