Computer Science ›› 2021, Vol. 48 ›› Issue (10): 239-245. doi: 10.11896/jsjkx.200600130

• Computer Graphics & Multimedia •

Coherent Semantic Spatial-Temporal Attention Network for Video Inpainting

LIU Lang, LI Liang, DAN Yuan-hong

  1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Received: 2020-06-20  Revised: 2021-02-04  Online: 2021-10-15  Published: 2021-10-18
  • Corresponding author: DAN Yuan-hong (dyh@cqut.edu.cn)
  • Author e-mail: lang224017@gmail.com
  • About author: LIU Lang, born in 1994, postgraduate. His main research interests include deep learning and pattern recognition.
    DAN Yuan-hong, born in 1981, Ph.D., associate professor. His main research interests include pattern recognition and intelligent control.
  • Supported by: National Defense Technology Innovation Zone Project.


Abstract: Existing video inpainting methods usually produce blurred textures, distorted structures and artifacts, while directly applying image-based inpainting models to video leads to temporal inconsistency. From the temporal perspective, a novel coherent semantic spatial-temporal attention (CSSTA) network for video inpainting is proposed. Through the attention layer, the model focuses on regions that are occluded in the target frame but visible in adjacent frames, so as to borrow visible content to fill the hole region of the target frame. The CSSTA layer models not only the semantic correlation among hole features but also the long-range dependencies between distant information and the hole regions. To synthesize semantically coherent hole regions, a novel loss function, Feature Loss, is proposed to replace VGG Loss. The model is built on a two-stage coarse-to-fine encoder-decoder architecture that collects and refines information from adjacent frames. Experimental results on the YouTube-VOS and DAVIS datasets show that the proposed method runs almost in real time and outperforms three representative video inpainting methods in terms of inpainting results, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
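The borrowing step described above can be illustrated with a minimal numpy sketch: hole locations in the target frame attend, via dot-product attention, over the feature maps of adjacent frames, and the hole is filled with the attention-weighted sum of that visible content. The flattened `(locations, channels)` shapes, the single-head dot-product form, and the function name are assumptions for illustration, not the paper's exact CSSTA layer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention_fill(target, mask, refs):
    """Fill masked (hole) positions of `target` by attending over
    the features of adjacent reference frames.

    target: (N, C) features of the target frame (N spatial locations).
    mask:   (N,) boolean, True where the target feature lies in the hole.
    refs:   (T, N, C) features of T adjacent frames, assumed visible.
    """
    T, N, C = refs.shape
    out = target.copy()
    hole_idx = np.where(mask)[0]
    # Keys/values: all reference-frame features, flattened over time and space.
    keys = refs.reshape(T * N, C)
    # Queries: the (coarsely predicted) hole features of the target frame.
    queries = target[hole_idx]                      # (H, C)
    scores = queries @ keys.T / np.sqrt(C)          # (H, T*N)
    weights = softmax(scores, axis=-1)              # rows sum to 1
    out[hole_idx] = weights @ keys                  # weighted sum of visible content
    return out
```

Because each filled feature is a convex combination of reference features, visible (non-hole) positions of the target frame pass through unchanged.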

Key words: Feature Loss, Image inpainting, Spatial-temporal attention, VGG Loss, Video inpainting
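The page does not give the exact definition of the proposed Feature Loss; as a hedged sketch, perceptual losses of this family are commonly computed as a weighted L1 distance between deep feature maps of the output and the ground truth, extracted from several layers of a frozen feature network. The layer choice, the L1 form, and the per-layer weights below are assumptions for illustration.

```python
import numpy as np

def feature_loss(feats_pred, feats_true, weights=None):
    """Weighted L1 distance between lists of deep feature maps.

    feats_pred, feats_true: lists of same-shaped arrays, one per
    selected layer of a (frozen) feature extraction network.
    weights: optional per-layer weights (defaults to 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(feats_pred)
    # Mean absolute difference per layer, summed with layer weights.
    return sum(w * np.mean(np.abs(p - t))
               for w, p, t in zip(weights, feats_pred, feats_true))
```

In practice the feature maps would come from a pretrained network such as VGG-16; replacing which activations are compared (and how they are weighted) is what distinguishes one perceptual-style loss from another.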

CLC number: TP391.4