Computer Science ›› 2026, Vol. 53 ›› Issue (6): 242-251. doi: 10.11896/jsjkx.250400143

• Computer Graphics & Multimedia •

Primitive Dynamic Weighting for Multi-modal Salient Object Detection

LI Peng¹, ZHANG Zihao², HAN Yahong²

  ¹ Songshan Laboratory, Zhengzhou 450001, China
  ² College of Intelligence and Computing, Tianjin University, Tianjin 300000, China
  • Received: 2025-04-29  Revised: 2025-07-16  Online: 2026-06-15  Published: 2026-06-09
  • About the authors: LI Peng, born in 1978, postgraduate, senior engineer. His main research interests include digital government, network security, artificial intelligence, big data mining, and public security.
    ZHANG Zihao, born in 2000, Ph.D. candidate. His main research interests include computer vision and machine learning.

Abstract: Existing multimodal salient object detection methods are easily disrupted by low-quality auxiliary modalities, which leads to poor model robustness. To address this problem, this paper proposes a multimodal salient object detection method based on dynamic weighting with saliency primitives. Specifically, the method captures the features common to different salient objects and clusters them to obtain saliency primitives, which give a clear definition of salient regions. These primitives are then used to dynamically adjust the weight of each modality during the fusion stage, so that semantic information from high-quality modalities is fully exploited while potential interference from low-quality auxiliary modalities is suppressed. In addition, a primitive-guided feature alignment mechanism is introduced to reduce the semantic gap between the primary and auxiliary modalities. This mechanism also enables the model to capture cross-modal common features more accurately, improving the accuracy and stability of the detection results. To validate the proposed method, extensive qualitative and quantitative evaluations are conducted on six RGB-D datasets and three RGB-T datasets. Experimental results show that the method remains stable and robust even in the presence of low-quality auxiliary modalities.
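
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of primitive-based dynamic weighting, not the authors' implementation. It assumes the saliency primitives are already available as a small set of vectors (for example, cluster centers), scores each modality by how well its features match those primitives, and turns the scores into fusion weights with a softmax. All function names, the cosine-similarity score, and the `temperature` parameter are hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def primitive_score(features, primitives):
    """Hypothetical quality score: mean, over all feature vectors,
    of the best cosine match to any primitive."""
    best = [max(cosine(f, p) for p in primitives) for f in features]
    return sum(best) / len(best)


def modality_weights(scores, temperature=0.1):
    """Softmax over per-modality scores (temperature is an assumed knob)."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(primary_feats, aux_feats, primitives, temperature=0.1):
    """Weighted sum of primary and auxiliary features, position by position."""
    scores = [
        primitive_score(primary_feats, primitives),
        primitive_score(aux_feats, primitives),
    ]
    w_primary, w_aux = modality_weights(scores, temperature)
    fused = [
        [w_primary * a + w_aux * b for a, b in zip(fp, fa)]
        for fp, fa in zip(primary_feats, aux_feats)
    ]
    return fused, (w_primary, w_aux)


# Toy data: two primitives, two positions, 2-D features.
primitives = [[1.0, 0.0], [0.0, 1.0]]
rgb_feats = [[0.9, 0.1], [0.1, 0.9]]    # close to the primitives
aux_feats = [[0.5, 0.5], [0.5, -0.5]]   # poorly matched, "low-quality" modality

fused, (w_rgb, w_aux) = fuse(rgb_feats, aux_feats, primitives)
print(f"w_rgb={w_rgb:.3f}, w_aux={w_aux:.3f}")
```

In this toy case the auxiliary features match the primitives less closely than the RGB features do, so the auxiliary modality receives the smaller weight, which mirrors the suppression of low-quality auxiliary modalities described in the abstract.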

Key words: Multimodal salient object detection, Saliency primitive, Fusion weight, Semantic gap, Feature alignment

CLC Number: TP391