Computer Science, 2026, Vol. 53, Issue (2): 253-263. doi: 10.11896/jsjkx.250100123

• Computer Graphics & Multimedia •


Semantic-guided Hybrid Cross-feature Fusion Method for Infrared and Visible Light Images

JI Sai1,2, QIAO Liwei1, SUN Yajie1   

  1. College of Computer Science and College of Cyber Science and Engineering,Nanjing University of Information Science and Technology,Nanjing 210044,China
    2. School of Information Engineering,Taizhou University,Taizhou,Jiangsu 225300,China
  • Received:2025-01-20 Revised:2025-05-23 Online:2026-02-10
  • Corresponding author: SUN Yajie(syj@nuist.edu.cn)
  • About author:JI Sai(jisai@nuist.edu.cn),born in 1976,Ph.D,professor,Ph.D supervisor,is a member of CCF(No.32528M).His main research interests include image processing and wireless sensor networks.
    SUN Yajie,born in 1980,Ph.D,professor.Her main research interests include information processing,sensors and test systems.
  • Supported by:
    National Natural Science Foundation of China(62172292).


Abstract: Autoencoder-based image fusion algorithms have difficulty highlighting infrared(IR) salient targets,existing fusion strategies struggle to account for global structure and local detail information at the same time,and most fusion algorithms over-prioritize statistical metrics while neglecting the needs of high-level vision tasks.To address these problems,a semantic segmentation-guided image fusion method is proposed,with a hybrid cross-feature mechanism designed as its fusion strategy.First,shallow and deep skip connections are introduced between the encoder and decoder,and their features are fused with a maximum value selection strategy to emphasize salient targets and reduce redundant information.Second,the fusion strategy adopts the hybrid cross-feature mechanism,which fuses features of different modalities through cross-attention and convolutional operations within a single framework,integrating global context with local fine-grained information.Finally,the fused image is fed into a segmentation network,where a semantic loss guides high-level semantic information back into the fusion network,so that fused images rich in semantic information are generated.Experimental results demonstrate that the proposed method achieves average improvements of 33.93%,112.81%,49.89%,27.64% and 23.87% in the SD,MI,VIFF,Qabf and AG metrics on the RoadScene dataset over seven comparison algorithms.In the semantic segmentation task on the MSRS dataset,its intersection-over-union(IoU) for the car,person and bicycle categories exceeds that of seven state-of-the-art algorithms by an average of 3.47%,6.37% and 9.57%,respectively.
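As a concrete illustration of the three components summarized above, the following PyTorch sketch shows (i) maximum value selection over skip-connection features, (ii) a hybrid cross-feature block that pairs a cross-attention branch (global context) with a convolutional branch (local detail), and (iii) a semantic loss that feeds segmentation error back into the fusion network. This is an assumption-laden sketch, not the authors' released implementation: the module shapes, the single attention direction shown, the L1 content term, and the weighting lam are all illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def max_select_fuse(feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
    """Maximum value selection over skip-connection features: keep, per element,
    whichever modality responds more strongly, so salient infrared targets
    survive into the decoder while redundant responses are suppressed."""
    return torch.maximum(feat_ir, feat_vis)


class HybridCrossFeatureFusion(nn.Module):
    """Hybrid cross-feature block (sketch): a cross-attention branch captures
    global context, a convolutional branch keeps local fine-grained detail,
    and a 1x1 convolution merges the two branches."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.local_conv = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat_ir.shape
        # Global branch: infrared features query the visible features
        # (only one of the two cross directions is shown here).
        q = feat_ir.flatten(2).transpose(1, 2)      # (B, H*W, C)
        kv = feat_vis.flatten(2).transpose(1, 2)    # (B, H*W, C)
        global_feat, _ = self.cross_attn(q, kv, kv)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        # Local branch: plain convolutions over the concatenated modalities.
        local_feat = self.local_conv(torch.cat([feat_ir, feat_vis], dim=1))
        return self.merge(torch.cat([global_feat, local_feat], dim=1))


def semantic_guided_loss(fused, ir, vis, seg_logits, seg_labels, lam=0.1):
    """Content term plus a semantic term: the cross-entropy of a segmentation
    network run on the fused image flows back into the fusion network.
    The weighting lam is a placeholder, not a value from the paper."""
    content = F.l1_loss(fused, torch.maximum(ir, vis))
    semantic = F.cross_entropy(seg_logits, seg_labels)
    return content + lam * semantic
```

Under these assumptions, the fused image produced by the fusion network is passed to the segmentation network, and minimizing semantic_guided_loss back-propagates segmentation error into the fusion network, which is the semantic guidance described in the abstract.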

Key words: Image fusion, Infrared and visible image, Cross attention mechanism, Convolution, Semantic segmentation

CLC Number: 

  • TP391.4

References
[1]LIU Y,WANG L,CHENG J,et al.Multi-focus image fusion:A survey of the state of the art[J].Information Fusion,2020,64:71-91.
[2]JIANG S,WANG P L,DENG Z J,et al.Image fusion algorithm for traffic accident rescue based on deep learning[J].Journal of Jilin University(Engineering and Technology Edition),2023,53(12):3472-3480.
[3]ZHOU H,WU W,ZHANG Y,et al.Semantic-supervised infrared and visible image fusion via a dual-discriminator generative adversarial network[J].IEEE Transactions on Multimedia,2023,25:635-648.
[4]JIAN L,RAYHANA R,MA L,et al.Infrared and visible image fusion based on deep decomposition network and saliency analysis[J].IEEE Transactions on Multimedia,2022,24:3314-3326.
[5]HA Q,WATANABE K,KARASAWA T,et al.MFNet:Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes[C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS).2017:5108-5115.
[6]ZHANG X,YE P,LEUNG H,et al.Object fusion tracking based on visible and infrared images:A comprehensive review[J].Information Fusion,2020,63:166-187.
[7]ZHOU T,LI Q,LU H,et al.GAN review:Models and medical image fusion applications[J].Information Fusion,2023,91:134-148.
[8]MA M,MA W,JIAO L,et al.A multimodal hyper-fusion transformer for remote sensing image classification[J].Information Fusion,2023,96:66-79.
[9]BAVIRISETTI D P,XIAO G,LIU G.Multi-sensor image fusion based on fourth order partial differential equations[C]//Proceedings of the International Conference on Information Fusion.2017:1-9.
[10]LI G,LIN Y,QU X.An infrared and visible image fusion me-thod based on multi-scale transformation and norm optimization[J].Information Fusion,2021,71:109-129.
[11]XU H,ZHANG H,MA J.Classification saliency-based rule for visible and infrared image fusion[J].IEEE Transactions on Computational Imaging,2021,7:824-836.
[12]ZHANG Q,FU Y,LI H,et al.Dictionary learning method for joint sparse representation-based image fusion[J].Optical Engineering,2013,52(5):057006.
[13]ZHOU Z,WANG B,LI S,et al.Perceptual fusion of infrared and visible images through a hybrid multi-scale decomposition with Gaussian and bilateral filters[J].Information Fusion,2016,30:15-26.
[14]ZHAO F,ZHAO W,YAO L,et al.Self-supervised feature adaption for infrared and visible image fusion[J].Information Fusion,2021,76:189-203.
[15]LI H,WU X J.CrossFuse:A novel cross attention mechanism based infrared and visible image fusion approach[J].Information Fusion,2024,103:102147.
[16]LIU J,FAN X,HUANG Z,et al.Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.2022:5802-5811.
[17]TANG W,HE F,LIU Y.YDTR:Infrared and visible image fusion via Y-shape dynamic transformer[J].IEEE Transactions on Multimedia,2022,25:5413-5428.
[18]PRABHAKAR K R,SRIKAR V S,BABU R V.DeepFuse:A deep unsupervised approach for exposure fusion with extreme exposure image pairs[C]//Proceedings of the IEEE International Conference on Computer Vision(ICCV).2017:4724-4732.
[19]LI H,WU X.DenseFuse:A fusion approach to infrared and visible images[J].IEEE Transactions on Image Processing,2019,28(5):2614-2623.
[20]LI H,WU X,KITTLER J.RFN-Nest:An end-to-end residual fusion network for infrared and visible images[J].Information Fusion,2021,73:72-86.
[21]WANG Z,WANG J,WU Y,et al.UNFusion:A unified multi-scale densely connected network for infrared and visible image fusion[J].IEEE Transactions on Circuits and Systems for Video Technology,2022,32(6):3360-3374.
[22]MA J,TANG L,FAN F,et al.SwinFusion:Cross-domain long-range learning for general image fusion via Swin transformer[J].IEEE/CAA Journal of Automatica Sinica,2022,9(7):1200-1217.
[23]PENG C,TIAN T,CHEN C,et al.Bilateral attention decoder:A lightweight decoder for real-time semantic segmentation[J].Neural Networks,2021,137:188-199.
[24]CORDTS M,OMRAN M,RAMOS S,et al.The cityscapes dataset for semantic urban scene understanding[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.2016:3213-3223.
[25]WANG Z,BOVIK A C,SHEIKH H R,et al.Image quality assessment:from error visibility to structural similarity[J].IEEE Transactions on Image Processing,2004,13(4):600-612.
[26]TANG L,YUAN J,ZHANG H,et al.PIAFusion:A progressive infrared and visible image fusion network based on illumination aware[J].Information Fusion,2022,83/84:79-92.
[27]XU H,MA J,LE Z,et al.FusionDN:A unified densely connected network for image fusion[C]//Proceedings of the AAAI Conference on Artificial Intelligence.AAAI,2020:12484-12491.
[28]TOET A,HOGERVORST M A.Progress in color night vision[J].Optical Engineering,2012,51(1):1-20.
[29]ZHANG X,DEMIRIS Y.Visible and infrared image fusion using deep learning[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2023,45(8):10535-10554.
[30]TANG L,YUAN J,MA J.Image fusion in the loop of high-level vision tasks:A semantic-aware real-time infrared and visible image fusion network[J].Information Fusion,2022,82:28-42.
[31]WANG D,LIU J,LIU R,et al.An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection[J].Information Fusion,2023,98:101828.
[32]ZHAO Z,BAI H,ZHANG J,et al.Correlation-driven dual-branch feature decomposition for multi-modality image fusion[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR).2023:5906-5916.
[33]CHEN J,DING J,MA J.HitFusion:Infrared and visible image fusion for high-level vision tasks using transformer[J].IEEE Transactions on Multimedia,2024,26:10145-10159.
[34]TANG L,DENG Y,MA Y,et al.SuperFusion:A versatile image registration and fusion network with semantic awareness[J].IEEE/CAA Journal of Automatica Sinica,2022,9(12):2121-2137.
[35]LIN T Y,MAIRE M,BELONGIE S,et al.Microsoft COCO:Common objects in context[C]//Proceedings of the European Conference on Computer Vision(ECCV).Springer,2014:740-755.