Computer Science ›› 2025, Vol. 52 ›› Issue (8): 188-194. doi: 10.11896/jsjkx.240600106

• Computer Graphics & Multimedia •

MTFuse: An Infrared and Visible Image Fusion Network Based on Mamba and Transformer

DING Zhengze, NIE Rencan, LI Jintao, SU Huaping, XU Hang   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650091, China
  • Received: 2024-06-17  Revised: 2024-09-26  Online: 2025-08-15  Published: 2025-08-08
  • About author: DING Zhengze, born in 2000, postgraduate. His main research interests include deep learning and image fusion.
    NIE Rencan, born in 1982, Ph.D., professor, doctoral supervisor. His main research interests include neural networks, image processing and machine learning.
  • Supported by:
    National Natural Science Foundation of China(61966037), Key Project of Yunnan Basic Research Program(202301AS070025, 202401AT070467), National Key Research and Development Project of China(2020YFA0714301), Science and Technology Department of Yunnan Province Project Foundation(202105AF150011) and Yunnan Provincial Department of Education Science Foundation(2024Y031).

Abstract: Infrared and visible image fusion aims to retain the thermal radiation information of infrared images and the texture details of visible images, so as to represent the imaging scene comprehensively and benefit downstream visual tasks. Fusion models based on convolutional neural networks (CNNs) are limited in capturing global image features, because convolution is an inherently local operation. Transformer-based models excel at global feature modeling, but the quadratic complexity of self-attention makes them computationally expensive. Recently, the selective structured state space model (Mamba) has shown great potential for modeling long-range dependencies with linear complexity, offering a promising way to address both issues. To model long-range dependencies in images efficiently, this paper designs a residual selective structured state space module (RMB) for extracting global features. To model the relationship between multimodal images, a cross-modal query fusion attention module (CQAM) is designed for adaptive feature fusion. Furthermore, a two-term loss function, consisting of a gradient loss and a brightness loss, is designed to train the proposed model in an unsupervised manner. Comparative experiments on fusion quality and efficiency against numerous state-of-the-art methods, together with ablation studies, demonstrate the effectiveness of the proposed MTFuse.
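
The abstract specifies the two training-loss terms only by name. What follows is a minimal PyTorch sketch, assuming the max-based intensity and gradient targets that are common in the infrared and visible image fusion literature; the helper names and weighting factors are illustrative, and MTFuse's exact formulation may differ.

    import torch
    import torch.nn.functional as F

    def sobel_gradient(img):
        # img: (N, 1, H, W); returns a per-pixel gradient magnitude map.
        kx = torch.tensor([[-1., 0., 1.],
                           [-2., 0., 2.],
                           [-1., 0., 1.]], device=img.device).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)  # Sobel kernel for the vertical direction
        return F.conv2d(img, kx, padding=1).abs() + F.conv2d(img, ky, padding=1).abs()

    def fusion_loss(fused, ir, vis, lambda_grad=1.0, lambda_bright=1.0):
        # Brightness term: pull the fused intensity toward the brighter source,
        # which tends to preserve infrared thermal targets.
        l_bright = F.l1_loss(fused, torch.max(ir, vis))
        # Gradient term: pull the fused gradients toward the stronger source
        # gradients, which tends to preserve visible texture detail.
        l_grad = F.l1_loss(sobel_gradient(fused),
                           torch.max(sobel_gradient(ir), sobel_gradient(vis)))
        return lambda_bright * l_bright + lambda_grad * l_grad

Likewise, the cross-modal query fusion attention module (CQAM) is described only at a high level. One plausible reading is cross-attention in which one modality's tokens form the queries and the other modality supplies the keys and values; the class below is an assumption-laden sketch, not the paper's implementation.

    import torch.nn as nn

    class CrossModalQueryAttention(nn.Module):
        # Hypothetical sketch: queries come from one modality, keys/values from
        # the other, so each branch can adaptively pull in complementary features.
        def __init__(self, dim, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, q_feat, kv_feat):
            # q_feat, kv_feat: (N, L, dim) token sequences from the two modalities.
            out, _ = self.attn(q_feat, kv_feat, kv_feat)
            return self.norm(out + q_feat)  # residual connection around attention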

Key words: Selective structured state space model, Transformer, Unsupervised learning, Infrared and visible image fusion

CLC Number: TP391