Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241100137-10. doi: 10.11896/jsjkx.241100137

• Image Processing & Multimedia Technology •

Multi-modal Fusion Based Object Detection for All-day and Multi-scenario Environments

ZHANG Fan1, LI Ang1,2   

  1 School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  2 Institute of Space-air-ground-sea Integrated Communication Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Online: 2025-11-15 Published: 2025-11-10
  • Supported by:
    National Natural Science Foundation of China (62306151).

Abstract: Traditional object detection methods struggle in complex scenes, especially in low-light conditions at night and in shaded environments during the day, where ideal results are difficult to achieve. Existing multimodal image fusion techniques tend to emphasize infrared images in low-light scenes while neglecting the need to balance infrared and visible-light information in complex daytime environments. To meet the demand for object detection in all-day, multi-scenario environments, this paper proposes a multi-modal fusion object detection method based on feature map classification and a generative adversarial network (GAN). Unlike previous fusion methods, which emphasize the visual quality of the fused images, this paper focuses on improving object detection performance on the fused images. A multi-scale attention mechanism classifies feature maps into saliency and detail feature maps, and a generator is trained against a saliency discriminator and a detail discriminator in a cross adversarial training network to optimize the fusion result, so that the key information of each modality is captured to meet the detection needs of different scenarios. Experimental results show that the proposed method performs well on the TNO, RoadScene, and M3FD datasets, significantly improving the performance of multimodal fusion object detection.

Key words: Multimodal fusion, Object detection, Generative adversarial networks, Attention mechanism

CLC Number: 

  • TP391
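
The abstract describes classifying feature maps into saliency and detail groups before adversarial fusion, but gives no implementation details. The sketch below is only an illustration of that idea under stated assumptions: it replaces the learned multi-scale attention with a hand-written multi-scale pooled-contrast score, and the names `avg_pool`, `saliency_score`, and `classify_feature_maps`, the scales `(1, 2, 4)`, and the median threshold are all hypothetical, not taken from the paper.

```python
# Hypothetical sketch: split feature-map channels into "saliency" and "detail"
# groups with a multi-scale pooled-contrast score. This stands in for the
# paper's learned attention mechanism and is NOT the authors' network.

def avg_pool(channel, k):
    """Non-overlapping k x k average pooling over a 2D list (edges truncated)."""
    h, w = len(channel), len(channel[0])
    pooled = []
    for i in range(0, h - k + 1, k):
        row = []
        for j in range(0, w - k + 1, k):
            block = [channel[y][x] for y in range(i, i + k) for x in range(j, j + k)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled

def saliency_score(channel, scales=(1, 2, 4)):
    """Mean over scales of (max pooled response - mean pooled response).

    A compact bright region keeps a high peak after pooling, while fine
    texture averages out, so texture-like channels score lower.
    """
    total = 0.0
    for k in scales:
        flat = [v for row in avg_pool(channel, k) for v in row]
        total += max(flat) - sum(flat) / len(flat)
    return total / len(scales)

def classify_feature_maps(feature_maps, scales=(1, 2, 4)):
    """Return (saliency_indices, detail_indices, scores) for a list of channels."""
    scores = [saliency_score(c, scales) for c in feature_maps]
    threshold = sorted(scores)[len(scores) // 2]  # upper median (assumed split rule)
    saliency = [i for i, s in enumerate(scores) if s >= threshold]
    detail = [i for i, s in enumerate(scores) if s < threshold]
    return saliency, detail, scores

# Toy input: channel 0 has a compact bright blob, channel 1 is a checkerboard.
blob = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
texture = [[(x + y) % 2 for x in range(4)] for y in range(4)]
saliency_idx, detail_idx, scores = classify_feature_maps([blob, texture])
print(saliency_idx, detail_idx)
```

In this toy case the blob channel lands in the saliency group and the checkerboard channel in the detail group. In a GAN-based design like the one the abstract outlines, each group would presumably be judged by its own discriminator; that training loop is not sketched here because the source does not specify the losses.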