Computer Science, 2026, Vol. 53, Issue (1): 163-172. doi: 10.11896/jsjkx.250100071

• Computer Graphics & Multimedia

Co-salient Object Detection Guided by Category Labels

LI Fangfang¹, KONG Yuqiu², LIU Yang³, LI Pengyue¹

  1 School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116000, China;
    2 School of Information and Communication Engineering, Dalian Minzu University, Dalian, Liaoning 116000, China;
    3 School of Future Technology, School of Artificial Intelligence, Dalian University of Technology, Dalian, Liaoning 116000, China
  • Received: 2025-01-13; Revised: 2025-05-08; Published: 2026-01-08
  • About author: LI Fangfang, born in 2000, postgraduate, is a member of CCF (No. Z3690G). Her main research interests include deep learning and computer vision.
    KONG Yuqiu, born in 1992, Ph.D., associate professor, is a member of CCF (No. C0194M). Her main research interests include computer vision and cross-media analysis.
  • Supported by:
    National Natural Science Foundation of China (62406055) and Liaoning Provincial Science and Technology Plan Joint Program (Technology R&D Program Project) (2024JH2/102600090).

Abstract: Acquiring pixel-level labels is laborious and time-consuming, whereas image-level labels can be obtained much more easily. However, the use of image-level labels for co-salient object detection (CoSOD) remains underexplored. This paper presents a two-stage approach for weakly supervised CoSOD that relies solely on image-level labels (class labels) for model training. By exploiting the semantic information carried by class labels, the approach localizes and segments co-salient objects. In the first stage, a pseudo-label generation network, supervised by class labels, generates saliency maps for the input images. In the second stage, a co-salient object segmentation network is trained using these saliency maps as pseudo-labels. A self-corrective learning strategy is also incorporated to further improve performance. For the first time, this paper proposes a training approach for CoSOD based on image-level labels. Experiments on three representative datasets demonstrate the effectiveness and feasibility of the proposed method.
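
The abstract does not specify the network architectures, how saliency maps are turned into pseudo-labels, or the exact form of the self-corrective learning strategy. The sketch below therefore only illustrates, under stated assumptions, the generic data flow of a two-stage weakly supervised pipeline on toy maps: a stage-1 saliency map is binarized into a pseudo-label, a stage-2 prediction is scored against it with pixel-wise binary cross-entropy, and pseudo-labels are refined where the stage-2 prediction is confident. The function names, the fixed 0.5 threshold, the BCE loss, and the confidence-based relabeling rule in `self_correct` are all hypothetical choices (a common self-training pattern), not the authors' method.

```python
import math
from typing import List

Map = List[List[float]]   # probability/saliency map: rows of floats in [0, 1]
Mask = List[List[int]]    # binary mask: rows of 0/1 labels


def to_pseudo_label(saliency: Map, threshold: float = 0.5) -> Mask:
    """Hypothetical stage-1 -> stage-2 hand-off: binarize a saliency map.

    The fixed threshold is an assumption made for illustration.
    """
    return [[1 if s >= threshold else 0 for s in row] for row in saliency]


def bce_loss(pred: Map, target: Mask, eps: float = 1e-7) -> float:
    """Mean pixel-wise binary cross-entropy of a prediction against a pseudo-label."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


def self_correct(pseudo: Mask, pred: Map, confidence: float = 0.9) -> Mask:
    """Hypothetical self-correction rule (generic self-training, assumed):
    overwrite a pseudo-label pixel only where the segmentation network is
    confident, i.e. p >= confidence or p <= 1 - confidence."""
    corrected = []
    for l_row, p_row in zip(pseudo, pred):
        new_row = []
        for label, p in zip(l_row, p_row):
            if p >= confidence:
                new_row.append(1)
            elif p <= 1.0 - confidence:
                new_row.append(0)
            else:
                new_row.append(label)  # uncertain pixel: keep the old pseudo-label
        corrected.append(new_row)
    return corrected


# Toy usage with made-up 2x2 maps (not data from the paper).
saliency = [[0.9, 0.2], [0.6, 0.4]]      # stand-in for a stage-1 saliency map
pseudo = to_pseudo_label(saliency)       # [[1, 0], [1, 0]]
pred = [[0.95, 0.05], [0.5, 0.97]]       # stand-in for a stage-2 prediction
loss = bce_loss(pred, pseudo)
corrected = self_correct(pseudo, pred)   # [[1, 0], [1, 1]]
```

In this toy case the bottom-right pixel, labeled background by the pseudo-label but predicted foreground with probability 0.97, is flipped, while the uncertain bottom-left pixel (0.5) keeps its original label. Restricting corrections to confident pixels is one simple way to keep a network from reinforcing its own errors; the paper's actual strategy may differ.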

Key words: Co-salient object detection, Weakly supervised, Self-corrective learning strategy, Class labels, Two-stage approach

CLC Number: TP391