Computer Science, 2025, Vol. 52, Issue (7): 142-150. doi: 10.11896/jsjkx.240600033

• Computer Graphics & Multimedia •

Weakly-aligned RGBT Salient Object Detection Based on Multi-modal Feature Alignment

LIU Chengzhuang1, ZHAI Sulan1, LIU Haiqing2, WANG Kunpeng3

  1 School of Mathematical Sciences, Anhui University, Hefei 230601, China
  2 Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
  3 School of Computer Science and Technology, Anhui University, Hefei 230601, China
  • Received: 2024-06-04 Revised: 2024-10-14 Published: 2025-07-17
  • Corresponding author: ZHAI Sulan (01044@ahu.edu.cn)
  • About author: LIU Chengzhuang (chengzhuangliu@163.com), born in 1999, postgraduate. His main research interests include computer vision and pattern recognition.
    ZHAI Sulan, born in 1977, Ph.D., associate professor. Her main research interests include computer vision and pattern recognition.
  • Supported by:
    General Program of the National Natural Science Foundation of China (62376005), Anhui Province Higher Education Synergy Innovation Project (GXXT-2022-014) and Open Project of the School of Mathematics, Anhui University (KF2019A03).


Abstract: Visible and thermal (RGBT) salient object detection (SOD) aims to identify common salient objects from RGB and thermal infrared images. However, most existing methods are trained on well-aligned image pairs and overlook the “weak alignment” problem caused by sensor discrepancies during actual imaging: the same object is structurally related across modalities but differs in position and scale. Consequently, training models directly on weakly-aligned RGBT images without alignment processing severely degrades detection performance. To address this challenge, a multi-modal alignment and fusion network (AFNet) is designed specifically for weakly-aligned RGBT SOD. The network comprises three main modules: the distribution alignment module (DAM), the attention-guided deformable convolution alignment module (AGDCM), and the cross-attention fusion module (CAM). Based on optimal transport theory, DAM makes the distributions of thermal infrared and RGB features as close as possible, achieving initial feature alignment. AGDCM builds on deformable convolution and incorporates attention weights into the learning of feature offsets, so that different regions learn offsets suited to themselves, thereby achieving precise multi-modal feature alignment. CAM fuses the aligned features with a cross-attention mechanism, enhancing the discriminative capability of the fused features while improving computational efficiency. Extensive experiments on both aligned and weakly-aligned datasets demonstrate the effectiveness of the proposed method.

Key words: Weakly-aligned RGBT image, Salient object detection, Multi-modal feature alignment, Multi-modal feature fusion, Attention mechanism
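The abstract states only that DAM is based on optimal transport theory. The following is a minimal sketch of how an optimal transport plan between thermal and RGB feature sets can yield both a distribution distance and a barycentric mapping that moves each thermal feature toward the RGB feature distribution. It rests on several assumptions that the abstract does not specify: uniform weights over feature tokens, a squared-Euclidean cost, and entropic regularization solved with Sinkhorn-Knopp iterations. The function names and the `eps` and `n_iters` parameters are hypothetical.

```python
import math


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic optimal-transport plan between two uniform discrete distributions.

    Assumption: uniform marginals and entropic regularization; the exact OT
    formulation used by DAM is not given in the abstract.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # marginal over thermal tokens
    b = [1.0 / m] * m  # marginal over RGB tokens
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Sinkhorn-Knopp: alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def align_distribution(thermal, rgb, eps=0.1, n_iters=200):
    """Hypothetical DAM-style step returning the OT plan, OT distance and aligned features.

    thermal: list of n thermal feature vectors; rgb: list of m RGB feature vectors.
    """
    cost = [[sq_dist(t, r) for r in rgb] for t in thermal]
    plan = sinkhorn(cost, eps, n_iters)
    # Transport cost <P, C>: small when the two feature distributions are close.
    dist = sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))
    # Barycentric projection: each thermal feature becomes the plan-weighted mean of RGB features.
    aligned = []
    for row in plan:
        mass = sum(row)
        aligned.append([sum(row[j] * rgb[j][d] for j in range(len(rgb))) / mass
                        for d in range(len(rgb[0]))])
    return plan, dist, aligned


plan, dist, aligned = align_distribution([[0.0, 0.0], [1.0, 1.0]], [[0.1, 0.0], [1.1, 1.0]])
print(round(dist, 4), [[round(x, 3) for x in f] for f in aligned])
```

In a trained network, the OT distance would more likely act as a loss term computed on pooled feature tokens than as a hand-run mapping. Both uses here are only illustrative.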
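AGDCM is described as a deformable convolution that uses attention weights while learning feature offsets, so that each region can learn an offset suited to itself. The sketch below shows one plausible reading: a single-channel 3×3 deformable convolution in which each pixel's raw offsets are scaled by that pixel's attention weight before bilinear sampling. The scaling rule, the function names and the nested-list layout are all assumptions. In the real module, learned layers would predict the offsets and attention maps; here they are passed in directly.

```python
import math

# 3x3 kernel taps (dy, dx); index 4 is the centre tap (0, 0).
KERNEL_TAPS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(feat, y, x):
    """Bilinearly sample a 2-D map at a fractional (y, x); pixels outside the map count as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * feat[yy][xx]
    return val


def attention_guided_deform_conv(feat, weight, offsets, attn):
    """Hypothetical single-channel 3x3 deformable conv with attention-scaled offsets.

    feat:    H x W map to be aligned (e.g. one thermal feature channel)
    weight:  9 kernel weights, ordered as KERNEL_TAPS
    offsets: offsets[y][x][k] = raw (dy, dx) offset of tap k at pixel (y, x)
    attn:    H x W attention weights; assumption: they scale the offsets per pixel
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a = attn[y][x]
            acc = 0.0
            for k, (ky, kx) in enumerate(KERNEL_TAPS):
                oy, ox = offsets[y][x][k]
                # Sample at the regular grid position shifted by the attention-scaled offset.
                acc += weight[k] * bilinear(feat, y + ky + a * oy, x + kx + a * ox)
            out[y][x] = acc
    return out


feat = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
centre_only = [0.0] * 9
centre_only[4] = 1.0
shift_right = [[[(0.0, 1.0)] * 9 for _ in range(3)] for _ in range(2)]
full = attention_guided_deform_conv(feat, centre_only, shift_right, [[1.0] * 3] * 2)
half = attention_guided_deform_conv(feat, centre_only, shift_right, [[0.5] * 3] * 2)
print(full)  # [[2.0, 3.0, 0.0], [5.0, 6.0, 0.0]]
print(half)  # [[1.5, 2.5, 1.5], [4.5, 5.5, 3.0]]
```

The toy run shows the intended effect of attention guidance. With the same raw offsets, an attention weight of 1 moves each sample a full pixel, 0.5 moves it halfway, and 0 leaves the feature map unchanged. Each region therefore receives its own effective offset.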
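CAM fuses the aligned features with cross-attention. The minimal sketch below assumes identity query/key/value projections (a real module would learn them) and a symmetric design: each modality attends to the other, the result is added back residually, and the two enhanced streams are summed. These choices and all names are hypothetical. The abstract also says CAM improves computational efficiency but does not say how. This plain scaled dot-product version is quadratic in the number of tokens and does not attempt to reproduce that property.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends over the other modality's tokens."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def add(a, b):
    """Element-wise sum of two token lists with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cross_fuse(rgb, thermal):
    """Hypothetical CAM-style fusion with identity projections.

    Assumes both modalities have the same number of tokens (same spatial grid after alignment).
    """
    rgb_enh = add(rgb, cross_attention(rgb, thermal, thermal))          # RGB queries thermal
    thermal_enh = add(thermal, cross_attention(thermal, rgb, rgb))      # thermal queries RGB
    return add(rgb_enh, thermal_enh)


print(cross_fuse([[1.0, 0.0]], [[0.0, 1.0]]))  # [[2.0, 2.0]]
```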

CLC number: TP181