Computer Science, 2025, Vol. 52, Issue (7): 142-150. doi: 10.11896/jsjkx.240600033
LIU Chengzhuang (刘成壮)1, ZHAI Sulan (翟素兰)1, LIU Haiqing (刘海庆)2, WANG Kunpeng (王鲲鹏)3
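Abstract: Visible and thermal infrared (RGBT) salient object detection (SOD) aims to identify the salient objects common to a visible-light image and a thermal infrared image. However, most existing methods are trained on perfectly aligned image pairs and overlook the "weak alignment" problem caused by sensor differences in real imaging: the same object is structurally related across the two modalities, but its position and scale differ. Consequently, training a model directly on weakly aligned RGBT images without any alignment step severely degrades detection performance. To address this challenge, a multi-modal feature alignment and fusion network (AFNet) is proposed specifically for weakly aligned RGBT SOD. The network consists of three main modules: a distribution alignment module (DAM), an attention-guided deformable convolution alignment module (AGDCM), and a cross-fusion module (CAM). Based on optimal transport theory, DAM brings the distributions of the thermal infrared and RGB features as close as possible, achieving a preliminary alignment of the features. AGDCM builds on deformable convolution and introduces attention weights while learning the feature offsets, so that different regions can learn offsets suited to themselves, achieving precise alignment of the multi-modal features. CAM fuses the aligned features through a cross-attention mechanism, enhancing the discriminative power of the fused features and improving computational efficiency. Extensive experiments on aligned and weakly aligned datasets demonstrate the effectiveness of the proposed method.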
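The distribution-alignment idea behind DAM can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the abstract only states that DAM relies on optimal transport theory, so the entropy-regularised Sinkhorn solver, the uniform feature masses, the squared Euclidean cost, and the function names `sinkhorn_plan` and `align_thermal_to_rgb` are all assumptions made here for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularised optimal transport plan between two uniform
    distributions (assumed setup, not taken from the paper)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass on the thermal features
    b = np.full(m, 1.0 / m)   # uniform mass on the RGB features
    K = np.exp(-cost / eps)   # Gibbs kernel of the cost matrix
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)       # rescale rows towards marginal a
        v = b / (K.T @ u)     # rescale columns towards marginal b
    return u[:, None] * K * v[None, :]


def align_thermal_to_rgb(thermal, rgb, eps=0.1, n_iters=200):
    """Move each thermal feature to the plan-weighted mean of RGB features
    (barycentric projection). Inputs are (N, C) and (M, C) arrays."""
    diff = thermal[:, None, :] - rgb[None, :, :]
    cost = (diff ** 2).sum(axis=-1)   # squared Euclidean cost
    cost = cost / cost.max()          # normalise so exp(-cost/eps) stays stable
    plan = sinkhorn_plan(cost, eps, n_iters)
    aligned = (plan @ rgb) / plan.sum(axis=1, keepdims=True)
    return aligned, plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(16, 8))
    thermal = rgb + 10.0              # hypothetical misaligned thermal features
    aligned, _ = align_thermal_to_rgb(thermal, rgb)
    print("gap before:", np.linalg.norm(thermal.mean(0) - rgb.mean(0)))
    print("gap after: ", np.linalg.norm(aligned.mean(0) - rgb.mean(0)))
```

In this toy setting the thermal features are shifted copies of the RGB features, and the barycentric projection pulls them back into the region occupied by the RGB features. A learned module would operate on network feature maps and be trained end to end, which this sketch does not attempt.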