Computer Science ›› 2026, Vol. 53 ›› Issue (2): 253-263. DOI: 10.11896/jsjkx.250100123

• Computer Graphics & Multimedia •

Semantic-guided Hybrid Cross-feature Fusion Method for Infrared and Visible Light Images

JI Sai1,2, QIAO Liwei1, SUN Yajie1   

  1. College of Computer Science, Cyber Science and Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
    2. School of Information Engineering, Taizhou University, Taizhou, Jiangsu 225300, China
  • Received: 2025-01-20  Revised: 2025-05-23  Published: 2026-02-10
  • About author: JI Sai, born in 1976, Ph.D, professor, Ph.D supervisor, is a member of CCF (No. 32528M). His main research interests include image processing and wireless sensor networks.
    SUN Yajie, born in 1980, Ph.D, professor. Her main research interests include information processing, sensors and test systems.
  • Supported by:
    National Natural Science Foundation of China (62172292).

Abstract: Autoencoder-based image fusion algorithms struggle to highlight infrared (IR) salient targets, existing fusion strategies find it difficult to account for global structure and local detail information simultaneously, and most algorithms prioritize statistical metrics while overlooking support for high-level vision tasks. To address these issues, a semantic segmentation-guided image fusion method with a hybrid cross-feature mechanism is proposed. Shallow and deep skip connections are introduced between the encoder and decoder, employing a maximum-value selection strategy to emphasize salient targets and reduce redundancy. The fusion strategy integrates global context and local fine-grained information through cross-attention and convolutional operations, combining features from different modalities within a single framework. The fused image is then fed into a segmentation network, whose semantic loss propagates high-level semantic information back to the fusion network, enabling the generation of fused images rich in semantic detail. Experimental results demonstrate that the proposed method achieves average improvements of 33.93%, 112.81%, 49.89%, 27.64%, and 23.87% in the SD, MI, VIFF, Qabf, and AG metrics on the RoadScene dataset compared with seven baseline algorithms. In the semantic segmentation task on the MSRS dataset, the intersection-over-union (IoU) scores for the car, person, and bicycle categories increase by 3.47%, 6.37%, and 9.57% on average, outperforming other state-of-the-art methods.
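The two ingredients named in the abstract, maximum-value selection on skip connections and cross-attention for global context, can be illustrated with a minimal NumPy sketch. This is a toy illustration under assumed shapes and function names, not the paper's actual network:

```python
import numpy as np

np.random.seed(0)

def max_select(feat_ir, feat_vis):
    """Element-wise maximum selection across modalities: keeps the
    stronger activation at each position, emphasising salient
    (e.g. thermally bright) targets while discarding redundancy."""
    return np.maximum(feat_ir, feat_vis)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feat, kv_feat, d):
    """Toy scaled dot-product cross-attention: queries from one modality
    attend over keys/values of the other, capturing global context."""
    scores = q_feat @ kv_feat.T / np.sqrt(d)   # (tokens, tokens)
    return softmax(scores, axis=-1) @ kv_feat  # (tokens, channels)

# Hypothetical flattened feature maps of shape (tokens, channels).
ir = np.random.rand(16, 8)
vis = np.random.rand(16, 8)

fused_local = max_select(ir, vis)            # local salient detail
fused_global = cross_attention(ir, vis, 8)   # global cross-modal context
fused = 0.5 * (fused_local + fused_global)   # naive hybrid combination
```

In the paper the local branch is convolutional and the combination is learned; the averaging above merely stands in for that learned hybrid fusion.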

Key words: Image fusion, Infrared and visible image, Cross attention mechanism, Convolution, Semantic segmentation

CLC Number: TP391.4