Computer Science (计算机科学), 2023, Vol. 50, Issue (11A): 221100156-10. doi: 10.11896/jsjkx.221100156

• Image Processing & Multimedia Technology •

Efficient Video Super-Resolution with Latent Attention

WANG Yuji1, DONG Haocheng1, GONG Xueluan2, CHEN Yanjiao3

  1 School of Cyber Science and Engineering, Wuhan University, Wuhan 430070, China
  2 School of Computer Science, Wuhan University, Wuhan 430070, China
  3 College of Electrical Engineering, Zhejiang University, Hangzhou 310058, China
  • Published: 2023-11-09
  • Corresponding author: GONG Xueluan (xueluangong@whu.edu.cn)
  • About author: WANG Yuji (2020302181008@whu.edu.cn), born in 2002, undergraduate. His main research interests include deep learning and computer vision.
    GONG Xueluan, born in 1996, Ph.D. candidate. Her main research interest is network security.

Abstract: Exploiting the spatio-temporal correlations in a video is an effective way to reconstruct a high-resolution video from a low-resolution one. Prior work mainly relies on motion compensation to capture temporal dependencies in video generation, which leads to inefficient stage-wise reconstruction strategies. Compared with motion compensation, attention models are better suited to finding spatio-temporal correlations. To make attention applicable to video super-resolution, this paper formulates a latent attention model whose attention is estimated with amortized variational inference, and instantiates two effective attention modules: a long-range attention module and a short-range attention module. On this basis, we present a novel deep network that efficiently captures spatio-temporal correlations for video super-resolution and admits end-to-end learning. Extensive experiments on public video datasets show that the approach outperforms several state-of-the-art methods, such as SPMC and DUF-16L.
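The abstract names the ingredients of latent attention but not its mechanics. The following Python sketch is a hedged illustration of latent (variational) attention in general, in the spirit of Deng et al. [12]. It is not the authors' network. It treats the choice of which candidate feature to attend to as a categorical latent variable z; a candidate might be, for example, a position in a neighbouring frame. Ordinary softmax attention plays the role of the prior p(z | x). An amortized inference map that also sees the high-resolution target y gives the posterior q(z | x, y). Training would maximize the evidence lower bound E_q[log p(y | x, z)] - KL(q || p). The following are all assumptions for illustration only:

- the function names (`latent_attention_elbo`, `marginal_log_likelihood`);
- the dot-product scores;
- the Gaussian likelihood;
- the untrained "query + target" posterior.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gaussian_log_lik(target: Vector, mean: Vector, sigma: float = 1.0) -> float:
    """log N(target | mean, sigma^2 I): an assumed stand-in for the decoder p(y | x, z)."""
    sq = sum((t - m) ** 2 for t, m in zip(target, mean))
    return -0.5 * sq / sigma ** 2 - len(target) * math.log(sigma * math.sqrt(2 * math.pi))


def kl_categorical(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(q || p) between two categorical distributions on the same support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def latent_attention_elbo(query: Vector, keys: List[Vector], values: List[Vector],
                          target: Vector, sigma: float = 1.0) -> Tuple[float, List[float], List[float]]:
    """ELBO = E_q[log p(y | x, z)] - KL(q(z | x, y) || p(z | x)) for a categorical attention latent z."""
    # Prior p(z | x): plain dot-product attention from the low-resolution query to each key.
    prior = softmax([dot(query, k) for k in keys])
    # Amortized posterior q(z | x, y): one shared map from (x, y) to attention weights.
    # Hypothetical stand-in for a learned inference network; it also sees the target y,
    # which is only available during training.
    post_query = [qv + tv for qv, tv in zip(query, target)]
    posterior = softmax([dot(post_query, k) for k in keys])
    # With few candidates the expectation over z is enumerated exactly (no sampling).
    expected_ll = sum(w * gaussian_log_lik(target, v, sigma) for w, v in zip(posterior, values))
    return expected_ll - kl_categorical(posterior, prior), prior, posterior


def marginal_log_likelihood(query: Vector, keys: List[Vector], values: List[Vector],
                            target: Vector, sigma: float = 1.0) -> float:
    """Exact log p(y | x) = log sum_z p(z | x) p(y | x, z); the ELBO never exceeds it."""
    prior = softmax([dot(query, k) for k in keys])
    terms = [math.log(p) + gaussian_log_lik(target, v, sigma) for p, v in zip(prior, values)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    # Toy 2-D features for three candidate positions (hypothetical data).
    keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    query, target = [0.8, 0.2], [0.0, 1.0]
    elbo, prior, posterior = latent_attention_elbo(query, keys, values, target)
    print("prior     :", [round(p, 3) for p in prior])
    print("posterior :", [round(p, 3) for p in posterior])
    print("ELBO      :", round(elbo, 4))
    print("log p(y|x):", round(marginal_log_likelihood(query, keys, values, target), 4))
```

The sketch has only three candidates, so the expectation over z is computed exactly by enumeration. With many candidates, such as every position in several frames, one would typically need a sampling-based gradient estimator. Examples are the score-function (REINFORCE) estimator of [35] or a continuous relaxation. The abstract does not say which estimator the paper uses.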

Key words: Super-resolution, Deep learning, Latent attention, Variational inference, Efficient

CLC number: TP391.41

References:
[1]DONG C,LOY C C,HE K,et al.Image super-resolution using deep convolutional networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2015,38(2):295-307.
[2]LIANG M,WANG X.Semantic segmentation model for remote sensing images combining super resolution and domain adaptation[J].Chinese Journal of Computers,2022,45(12):2619-2636.
[3]HE P H,YU Y,XU C Y.Image super-resolution reconstruction network based on dynamic pyramid and subspace attention[J].Computer Science,2022,49(S2):423-430.
[4]WU J,YE X J,HUANG F,et al.A review of single image super-resolution reconstruction based on deep learning[J].Chinese Journal of Electronics,2022,50(9):2265-2294.
[5]CABALLERO J,LEDIG C,AITKEN A,et al.Real-time video super-resolution with spatio-temporal networks and motion compensation[C]//IEEE Conference on Computer Vision and Pattern Recognition.2017:4778-4787.
[6]TAO X,GAO H,LIAO R,et al.Detail-revealing deep video super-resolution[J].arXiv:1704.02738,2017.
[7]KAPPELER A,YOO S,DAI Q,et al.Video super-resolution with convolutional neural networks[J].IEEE Transactions on Computational Imaging,2016,2(2):109-122.
[8]LIU D,WANG Z,FAN Y,et al.Robust video super-resolution with learned temporal dynamics[C]//IEEE International Conference on Computer Vision.2017:2507-2515.
[9]FU L H,SUN X W,ZHAO Y,et al.Fast video super-resolution reconstruction method based on motion feature fusion[J].Pattern Recognition and Artificial Intelligence,2019,32(11):1022-1031.
[10]SHI X J,CHEN Z,WANG H,et al.Convolutional LSTM network:A machine learning approach for precipitation nowcasting[C]//Annual Conference on Neural Information Processing Systems.2015:802-810.
[11]FUOLI D,GU S,TIMOFTE R.Efficient video super-resolution through recurrent latent space propagation[C]//ICCVW.2019.
[12]DENG Y,KIM Y,CHIU J,et al.Latent alignment and variational attention[C]//Advances in Neural Information Processing Systems.2018:9712-9724.
[13]WANG X,GIRSHICK R,GUPTA A,et al.Non-local neural networks[C]//IEEE Conference on Computer Vision and Pattern Recognition.2018:7794-7803.
[14]WANG F,JIANG M,QIAN C,et al.Residual attention network for image classification[C]//IEEE Conference on Computer Vision and Pattern Recognition.2017:3156-3164.
[15]HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[16]KIM J,LEE J K,LEE K M.Accurate image super-resolution using very deep convolutional networks[C]//IEEE Conference on Computer Vision and Pattern Recognition.2016:1646-1654.
[17]LEDIG C,THEIS L,HUSZÁR F,et al.Photo-realistic single image super-resolution using a generative adversarial network[J].arXiv:1609.04802,2016.
[18]KIM J,LEE J K,LEE K M.Deeply-recursive convolutional network for image super-resolution[C]//IEEE Conference on Computer Vision and Pattern Recognition.2016:1637-1645.
[19]TAI Y,YANG J,LIU X.Image super-resolution via deep recursive residual network[C]//IEEE Conference on Computer Vision and Pattern Recognition.2017.
[20]SHI W,CABALLERO J,HUSZÁR F,et al.Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]//IEEE Conference on Computer Vision and Pattern Recognition.2016:1874-1883.
[21]LIM B,SON S,KIM H,et al.Enhanced deep residual networks for single image super-resolution[C]//IEEE Conference on Computer Vision and Pattern Recognition Workshops.2017.
[22]PARK S J,SON H,CHO S,et al.SRFeat:Single image super-resolution with feature discrimination[C]//European Conference on Computer Vision.2018:439-455.
[23]TAI Y,YANG J,LIU X,et al.MemNet:A persistent memory network for image restoration[C]//IEEE International Conference on Computer Vision.2017:4539-4547.
[24]ZHANG Y,TIAN Y,KONG Y,et al.Residual dense network for image super-resolution[J].arXiv:1802.08797,2018.
[25]SAJJADI M S,VEMULAPALLI R,BROWN M.Frame-recurrent video super-resolution[J].arXiv:1801.04590,2018.
[26]HUANG Y,WANG W,WANG L.Bidirectional recurrent convolutional networks for multi-frame super-resolution[C]//Annual Conference on Neural Information Processing Systems.2015:235-243.
[27]BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[J].arXiv:1409.0473,2014.
[28]VINYALS O,TOSHEV A,BENGIO S,et al.Show and tell:A neural image caption generator[C]//IEEE Conference on Computer Vision and Pattern Recognition.2015:3156-3164.
[29]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Advances in Neural Information Processing Systems.2017:5998-6008.
[30]MNIH V,HEESS N,GRAVES A,et al.Recurrent models of visual attention[C]//Annual Conference on Neural Information Processing Systems.2014:2204-2212.
[31]LUONG M T,PHAM H,MANNING C D.Effective approaches to attention-based neural machine translation[J].arXiv:1508.04025,2015.
[32]YAO L,TORABI A,CHO K,et al.Describing videos by exploiting temporal structure[C]//IEEE International Conference on Computer Vision.2015:4507-4515.
[33]WANG F,JIANG M,QIAN C,et al.Residual attention network for image classification[J].arXiv:1704.06904,2017.
[34]ZHOU C,NEUBIG G.Morphological inflection generation with multi-space variational encoder-decoders[C]//CoNLL SIGMORPHON 2017 Shared Task:Universal Morphological Reinflection.2017:58-65.
[35]WILLIAMS R J.Simple statistical gradient-following algorithms for connectionist reinforcement learning[J].Machine Learning,1992,8(3/4):229-256.
[36]MAKHZANI A,SHLENS J,JAITLY N,et al.Adversarial autoencoders[J].arXiv:1511.05644,2015.
[37]LIU C,SUN D.A Bayesian approach to adaptive video super resolution[C]//IEEE Conference on Computer Vision and Pattern Recognition.2011:209-216.
[38]HARMONIC I.Free 4K Demo Footage Center[OL].https://www.harmonicinc.com/free-4k-demo-footage/.
[39]PINSON M H.The consumer digital video library[J].IEEE Signal Processing Magazine,2013,30(4):172-174.
[40]SONG L,TANG X,ZHANG W,et al.The SJTU 4K video sequence dataset[C]//Quality of Multimedia Experience.IEEE,2013:34-35.
[41]DONG C,LOY C C,HE K,et al.Learning a deep convolutional network for image super-resolution[C]//European Conference on Computer Vision.Springer,2014:184-199.
[42]JO Y,OH S W,KANG J,et al.Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation[C]//IEEE Conference on Computer Vision and Pattern Recognition.2018:3224-3232.