Efficient Video Super-Resolution with Latent Attention

doi:10.11896/jsjkx.221100156

Abstract

Exploiting the spatio-temporal correlations in a video is an effective way to reconstruct a high-resolution video from a low-resolution one. Prior work mainly relies on motion compensation to capture temporal dependencies, which leads to inefficient stage-wise reconstruction strategies. Compared with motion compensation, attention models are better suited to finding spatio-temporal correlations. To make attention applicable to video super-resolution, this paper formulates a latent attention model in which attention is estimated with amortized variational inference, and instantiates two attention modules: a long-range attention module and a short-range attention module. Building on these, it presents a novel deep network that captures spatio-temporal correlations efficiently for video super-resolution and admits end-to-end learning. Extensive experiments on public video datasets show that the approach outperforms several state-of-the-art methods, such as SPMC and DUF-16L.

Keywords: Super-resolution, Deep learning, Latent attention, Variational inference, Efficient

CLC Number: TP391.41

WANG Yuji, DONG Haocheng, GONG Xueluan, CHEN Yanjiao. Efficient Video Super-Resolution with Latent Attention[J]. Computer Science, 2023, 50(11A): 221100156-10. https://doi.org/10.11896/jsjkx.221100156

