Computer Science, 2020, Vol. 47, Issue 8: 195-201. doi: 10.11896/jsjkx.190600148

• Computer Graphics & Multimedia •

Video Saliency Detection Based on 3D Full ConvLSTM Neural Network

WANG Jiao-jin1, JIAN Mu-wei1, LIU Xiang-yu1, LIN Pei-guang1, GENG Lei-lei1, CUI Chao-ran1, YIN Yi-long2

  1 School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
  2 School of Software Engineering, Shandong University, Jinan 250101, China
  • Online: 2020-08-15 Published: 2020-08-10
  • Corresponding author: JIAN Mu-wei (jianmuweihk@163.com)
  • Author contact: 125453468@qq.com
  • About author: WANG Jiao-jin, born in 1993, postgraduate. His main research interests include image processing and visual saliency detection.
    JIAN Mu-wei, professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include image processing, pattern recognition, and multimedia computing.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61601427, 61976123, 61771230) and the Taishan Young Scholars Program of Shandong Province.

Abstract: Vision is one of the main ways in which humans perceive the world. Video saliency detection aims to mimic the human visual attention mechanism by automatically detecting the most salient regions or objects in an input video. Traditional video saliency-detection models have reached a certain level of performance, but they still make unsatisfactory use of the consistency of spatio-temporal information. To address this issue, this paper proposes a video saliency-detection model based on a 3D full ConvLSTM neural network. First, full ConvLSTM layers extract spatio-temporal features from the input video, and 3D pooling layers reduce their dimensionality. Next, in the decoding layers, the extracted features are decoded by 3D deconvolution, and 3D upsampling (interpolation) restores the saliency map to the size of the original frame. By extracting and fusing temporal and spatial information jointly, the method improves the quality and completeness of the saliency map. Experimental results on three widely used video saliency detection datasets (DAVIS, FBMS, SegTrack) show that the proposed algorithm outperforms state-of-the-art video saliency detection methods.

Key words: ConvLSTM, Neural network, Saliency detection, Spatio-temporal feature
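
The encoder-decoder pipeline described in the abstract (ConvLSTM feature extraction, 3D pooling, 3D deconvolution, upsampling back to the input size) can be illustrated by tracing tensor shapes through it. The sketch below is only an illustration under stated assumptions: the clip size, the number of pooling stages, and the `(1, 2, 2)` kernels and strides are hypothetical values chosen for the example, not settings reported by the paper, and the function names are made up here.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]  # (frames, height, width) of one feature volume


def pool3d_shape(shape: Shape, kernel: Shape, stride: Shape) -> Shape:
    """Output size of an unpadded 3D pooling layer: floor((n - k) / s) + 1 per axis."""
    return tuple((n - k) // s + 1 for n, k, s in zip(shape, kernel, stride))


def deconv3d_shape(shape: Shape, kernel: Shape, stride: Shape) -> Shape:
    """Output size of an unpadded 3D transposed convolution: (n - 1) * s + k per axis."""
    return tuple((n - 1) * s + k for n, k, s in zip(shape, kernel, stride))


def upsample3d_shape(shape: Shape, scale: Shape) -> Shape:
    """Output size of 3D upsampling (interpolation) by an integer factor per axis."""
    return tuple(n * f for n, f in zip(shape, scale))


def trace_shapes(clip: Shape) -> List[Tuple[str, Shape]]:
    """Trace (frames, height, width) through a hypothetical encoder-decoder."""
    spatial = (1, 2, 2)  # assumption: halve/double height and width, keep all frames
    stages = [("input clip", clip)]

    x = clip  # assumption: ConvLSTM layers use 'same' padding, so the size is unchanged
    stages.append(("ConvLSTM encoder", x))

    x = pool3d_shape(x, spatial, spatial)
    stages.append(("3D pooling #1", x))
    x = pool3d_shape(x, spatial, spatial)
    stages.append(("3D pooling #2", x))

    x = deconv3d_shape(x, spatial, spatial)
    stages.append(("3D deconvolution", x))
    x = upsample3d_shape(x, spatial)
    stages.append(("3D upsampling", x))
    return stages


if __name__ == "__main__":
    for name, shape in trace_shapes((8, 224, 224)):
        print(f"{name:18s} {shape}")
```

With these assumed settings, two pooling stages shrink each frame by a factor of four per side, and one deconvolution plus one upsampling step bring the saliency map back to the input resolution while the number of frames stays fixed.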

CLC Number: TP391