Computer Science ›› 2021, Vol. 48 ›› Issue (10): 239-245.doi: 10.11896/jsjkx.200600130

• Computer Graphics & Multimedia •

Coherent Semantic Spatial-Temporal Attention Network for Video Inpainting

LIU Lang, LI Liang, DAN Yuan-hong   

  1. College of Computer Science and Engineering,Chongqing University of Technology,Chongqing 400054,China
  • Received:2020-06-20 Revised:2021-02-04 Online:2021-10-15 Published:2021-10-18
  • About author:LIU Lang,born in 1994,postgraduate.His main research interests include deep learning and pattern recognition.
    DAN Yuan-hong,born in 1981,Ph.D,associate professor.His main research interests include pattern recognition and intelligent control.
  • Supported by:
    National Defense Technology Innovation Zone Project.

Abstract: Existing video inpainting methods usually produce blurred textures, distorted structures and artifacts, while directly applying image-based inpainting models to video leads to temporal inconsistency. From a temporal perspective, a novel coherent semantic spatial-temporal attention (CSSTA) network for video inpainting is proposed. Through the attention layer, the model focuses on regions that are occluded in the target frame but visible in adjacent frames, so as to obtain visible content with which to fill the hole region of the target frame. The CSSTA layer not only models the semantic correlations among hole features but also relates long-range information to the hole regions. To complete semantically coherent hole regions, a novel loss function, Feature Loss, is proposed to replace VGG Loss. The model is built on a two-stage coarse-to-fine encoder-decoder architecture that collects and refines information from adjacent frames. Experimental results on the YouTube-VOS and DAVIS datasets show that the proposed method runs almost in real time and outperforms three typical video inpainting methods in terms of visual inpainting quality, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
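The core idea described in the abstract, filling hole features of a target frame with a weighted sum of features that are visible in reference frames, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all tensor names and shapes are assumptions, a single flat attention over all reference positions is used, and the coherent-semantic step (modeling correlations among the hole features themselves) is omitted.

```python
import numpy as np

def spatial_temporal_attention(target_feat, ref_feats, hole_mask, ref_masks):
    """Fill hole features of a target frame by attending to visible
    features across reference frames (simplified, single-head sketch).

    target_feat : (H*W, C) target-frame feature map, flattened spatially
    ref_feats   : (T, H*W, C) feature maps of T reference frames
    hole_mask   : (H*W,) bool, True where the target frame is missing
    ref_masks   : (T, H*W) bool, True where a reference frame is visible
    """
    queries = target_feat[hole_mask]          # (Nq, C) hole positions
    keys = ref_feats[ref_masks]               # (Nk, C) visible positions only
    # cosine similarity between hole queries and visible reference keys
    qn = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-8)
    kn = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
    attn = qn @ kn.T                          # (Nq, Nk) similarity scores
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over visible keys
    filled = target_feat.copy()
    filled[hole_mask] = attn @ keys           # convex combination of visible features
    return filled
```

In the full model this attention output would be produced inside the refinement stage of the coarse-to-fine encoder-decoder and then decoded back to pixels; the sketch only shows the feature-borrowing step that gives the target frame temporally consistent content.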

Key words: Feature Loss, Image inpainting, Spatial-temporal attention, VGG Loss, Video inpainting

CLC Number: TP391.4