Computer Science ›› 2024, Vol. 51 ›› Issue (4): 217-228. doi: 10.11896/jsjkx.231000051

• Computer Graphics & Multimedia •

Video and Image Salient Object Detection Based on Multi-task Learning

LIU Zeyu, LIU Jianwei

  1. College of Information Science and Engineering, China University of Petroleum, Beijing 102249, China
  • Received: 2023-10-09 Revised: 2024-01-04 Online: 2024-04-15 Published: 2024-04-10
  • Corresponding author: LIU Jianwei (2236677012@qq.com)
  • About the author: (2275045480@qq.com)

Abstract: Salient object detection (SOD) simulates the human attention mechanism to quickly locate high-value salient objects in complex scenes, laying the foundation for further visual understanding tasks. Mainstream image salient object detection methods are usually trained on the DUTS-TR dataset, while video salient object detection (VSOD) methods are trained on the DAVIS, DAVSOD, and DUTS-TR datasets. Because the image and video tasks share common properties but also have task-specific characteristics, independent models currently have to be deployed and trained separately, which greatly increases the cost in computing resources and training time. Most existing research proposes independent solutions for a single task, and a unified method for both image and video salient object detection is lacking. To address these issues, this paper proposes a multi-task learning-based method for image and video salient object detection. It aims to build a general framework that adapts to both tasks with a single training process and further bridges the performance gaps between image and video salient object detection methods. Qualitative and quantitative experimental results on 12 datasets show that the proposed method not only adapts to both tasks simultaneously but also achieves better detection results than single-task models.

Key words: Video-based salient object detection, Image-based salient object detection, Multi-task learning, Performance gaps
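
As a rough illustration of what "a single training process for both tasks" can mean in practice, the sketch below shows one common multi-task pattern: image and video batches are interleaved in one schedule, and the per-task losses are combined into a weighted sum. This is not the authors' implementation. The abstract does not specify the loss function, the task weights, or the sampling strategy, so the helper names (`bce`, `joint_loss`, `interleave`), the binary cross-entropy loss, and the 0.5/0.5 weights are all assumptions made for illustration.

```python
import math
from itertools import cycle, islice


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted saliency values and a 0/1 mask."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def joint_loss(image_loss, video_loss, w_image=0.5, w_video=0.5):
    """Weighted sum of the two task losses (the weights are hypothetical)."""
    return w_image * image_loss + w_video * video_loss


def interleave(image_batches, video_batches, steps):
    """Alternate image and video batches so one run covers both tasks.

    The shorter source is repeated (cycled) until `steps` batches are produced.
    """
    tagged_img = (("image", b) for b in cycle(image_batches))
    tagged_vid = (("video", b) for b in cycle(video_batches))
    mixed = (item for pair in zip(tagged_img, tagged_vid) for item in pair)
    return list(islice(mixed, steps))


# Toy usage: two image batches, one video batch, five training steps.
schedule = interleave(["img_batch_0", "img_batch_1"], ["vid_batch_0"], steps=5)
loss = joint_loss(bce([0.9, 0.2], [1, 0]), bce([0.6, 0.4], [1, 0]))
```

In a real system the two task losses would come from a shared network evaluated on the image and video batches respectively; the schedule and weighting shown here are only one of several plausible choices.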

CLC Number:

  • TP391