Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241100137-10. DOI: 10.11896/jsjkx.241100137

• Image Processing & Multimedia Technology •

Multi-modal Fusion Based Object Detection for All-day and Multi-scenario Environments

ZHANG Fan1, LI Ang1,2   

  1. School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  2. Institute of Space-air-ground-sea Integrated Communication Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Online: 2025-11-15  Published: 2025-11-10
  • About author: ZHANG Fan, born in 1999, postgraduate. His main research interests include multi-modal fusion and object detection.
    LI Ang, born in 1995, Ph.D, lecturer. His main research interests include computer vision and multi-modal learning.
  • Supported by:
    National Natural Science Foundation of China(62306151).

Abstract: Traditional object detection methods struggle in complex scenes, especially under low-light conditions at night and in shaded daytime environments, and rarely achieve ideal results there. Existing multimodal image fusion techniques tend to emphasize the contribution of infrared images in low-light scenes while neglecting the balance between infrared and visible-light information required in complex daytime environments. To meet the demand for object detection in all-day, multi-scenario environments, this paper proposes a multi-modal fusion object detection method based on feature-map classification and GANs. Unlike previous fusion methods that prioritize the visual quality of fused images, this work focuses on improving their object detection performance. A multi-scale attention mechanism classifies feature maps into saliency and detail feature maps, and a cross-adversarial training network, consisting of a generator together with saliency and detail discriminators, optimizes the fusion so that the key information of each modality is captured to meet the detection needs of different scenarios. Experimental results show that the proposed method performs well on the TNO, RoadScene, and M3FD datasets, significantly improving the performance of multimodal fusion object detection.
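The core idea in the abstract, scoring feature maps with an attention statistic, classifying them into saliency and detail maps, and fusing complementary groups from the two modalities, can be illustrated at a very small scale. The sketch below is an illustrative assumption, not the paper's implementation: all function names are hypothetical, a simple mean-activation score stands in for the multi-scale attention mechanism, and the adversarial discriminators are omitted entirely.

```python
import numpy as np

def attention_scores(feats):
    """Score each channel by its mean absolute activation
    (a crude stand-in for the paper's multi-scale attention)."""
    # feats: (C, H, W) feature maps from one modality
    return np.abs(feats).mean(axis=(1, 2))

def split_saliency_detail(feats, ratio=0.5):
    """Classify channels into saliency (high-score) and
    detail (low-score) feature maps."""
    order = np.argsort(attention_scores(feats))[::-1]  # high score first
    k = max(1, int(len(order) * ratio))
    return feats[order[:k]], feats[order[k:]]

def fuse(ir_feats, vis_feats, w_ir=0.6):
    """Weighted fusion of infrared saliency maps with visible-light
    detail maps -- one possible balance between the modalities."""
    ir_sal, _ = split_saliency_detail(ir_feats)
    _, vis_det = split_saliency_detail(vis_feats)
    n = min(len(ir_sal), len(vis_det))
    return w_ir * ir_sal[:n] + (1 - w_ir) * vis_det[:n]

rng = np.random.default_rng(0)
ir = rng.standard_normal((8, 16, 16))   # toy infrared features
vis = rng.standard_normal((8, 16, 16))  # toy visible-light features
fused = fuse(ir, vis)
print(fused.shape)  # (4, 16, 16)
```

In the actual method the weighting between modalities is not a fixed scalar but is learned through cross-adversarial training against the saliency and detail discriminators; this toy `w_ir` only marks where that balance enters the fusion.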

Key words: Multimodal fusion, Object detection, Generative adversarial networks, Attention mechanism

CLC Number: TP391