Computer Science ›› 2025, Vol. 52 ›› Issue (8): 232-239. doi: 10.11896/jsjkx.240500069

• Computer Graphics & Multimedia •

Video Super-resolution Model Based on Implicit Alignment

WANG Fengling1, WEI Aimin2, PANG Xiongwen3, LI Zhi1, XIE Jingming4   

  1. School of Artificial Intelligence, South China Normal University, Foshan, Guangdong 528000, China
    2. School of Architectural Engineering, Guangzhou Panyu Polytechnic College, Guangzhou 511483, China
    3. School of Computer Science, South China Normal University, Guangzhou 510555, China
    4. School of Information Technology & Engineering, Guangzhou College of Commerce, Guangzhou 511363, China
  • Received: 2024-05-20 Revised: 2024-09-06 Online: 2025-08-15 Published: 2025-08-08
  • Corresponding author: XIE Jingming (32959247@qq.com)
  • About author: WANG Fengling (wfl314159@163.com), born in 2000, postgraduate. Her main research interests include video super-resolution and time series.
    XIE Jingming, born in 1977, Ph.D., professor. His main research interests include artificial intelligence technology and its applications.
  • Supported by:
    Basic and Applied Basic Research Project of the Guangzhou Science and Technology Bureau in 2022 (20220101185).

Abstract: Video frames exhibit not only spatial correlation but also temporal correlation. When reconstructing high-resolution video from low-resolution video, information from adjacent frames can be aligned to the target frame to guide the recovery of the current frame. Alignment between adjacent frames is usually performed explicitly with optical-flow-guided deformable convolution; although this overcomes the instability of plain deformable convolution, it hampers the recovery of high-frequency information within a frame, reduces the accuracy of the aligned information, and magnifies artifacts. To address these issues, this paper proposes IAVSR (Implicit Alignment Video Super-Resolution), a video super-resolution model based on implicit alignment. IAVSR encodes the optical flow into specific pixel positions through offsets and original values, computing the pre-aligned information directly instead of obtaining it by interpolation. Flow-guided deformable convolution then re-aligns the computed pre-aligned features to aid the recovery of high-frequency information. During bidirectional propagation, the information propagated from the two preceding frames is aligned to guide the recovery of the current frame, and a residual network structure is introduced to improve the accuracy of the aligned information without introducing excessive parameters. Experimental results on the public REDS4 dataset show that IAVSR achieves a PSNR 0.6 dB higher than the baseline model and converges 20% faster during training.
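The implicit pre-alignment described above can be pictured with a short sketch. The snippet below is only one plausible reading of the idea, not the authors' implementation: it assumes PyTorch, a hypothetical function name implicit_pre_align, features of shape (N, C, H, W), and a flow tensor whose first channel stores horizontal and second channel vertical displacement. The flow is encoded as a whole-pixel offset that selects exact source positions (so the features themselves are never interpolated) plus a sub-pixel residual that is kept for the later flow-guided re-alignment.

```python
import torch


def implicit_pre_align(feat: torch.Tensor, flow: torch.Tensor):
    """Hypothetical sketch of implicit pre-alignment (not the published code).

    feat: (N, C, H, W) features of a neighbouring frame.
    flow: (N, 2, H, W) optical flow from the target frame to that neighbour;
          channel 0 = horizontal (x) and channel 1 = vertical (y) displacement.
    Returns features gathered at whole-pixel positions plus the sub-pixel
    residual handed to the deformable re-alignment stage.
    """
    n, c, h, w = feat.shape

    # Split the flow into a whole-pixel offset and a sub-pixel residual.
    flow_int = torch.floor(flow)
    flow_res = flow - flow_int

    # Absolute sampling coordinates, clamped to the image border.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).to(feat)      # (2, H, W)
    coords = base.unsqueeze(0) + flow_int             # (N, 2, H, W)
    x = coords[:, 0].clamp(0, w - 1).long()
    y = coords[:, 1].clamp(0, h - 1).long()

    # Gather features at those exact pixel positions; no bilinear interpolation
    # is applied, which is the point of the implicit (pre-)alignment.
    flat = feat.reshape(n, c, h * w)
    index = (y * w + x).reshape(n, 1, h * w).expand(-1, c, -1)
    pre_aligned = torch.gather(flat, 2, index).reshape(n, c, h, w)
    return pre_aligned, flow_res
```

The second-order bidirectional propagation with residual fusion mentioned in the abstract could be organised roughly as follows. Again this is an illustrative sketch under assumed names and sizes; the flow-guided deformable re-alignment of the two propagated hidden states is abbreviated to a comment.

```python
import torch
import torch.nn as nn


class SecondOrderPropagation(nn.Module):
    """Illustrative fusion of the current frame with two propagated states."""

    def __init__(self, channels: int = 64, num_blocks: int = 5):
        super().__init__()
        # Fuse the current features with the two propagated hidden states.
        self.fuse = nn.Conv2d(3 * channels, channels, 3, padding=1)
        # A light stack of residual blocks keeps the parameter count small.
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            for _ in range(num_blocks)
        ])

    def forward(self, feat_t, hidden_prev1, hidden_prev2):
        # In the full model, hidden_prev1/2 would first be re-aligned to the
        # current frame with flow-guided deformable convolution.
        x = self.fuse(torch.cat([feat_t, hidden_prev1, hidden_prev2], dim=1))
        for block in self.blocks:
            x = x + block(x)   # residual connection
        return x
```

A backward pass over the frame sequence would mirror this forward pass, which is what gives the bidirectional propagation described above.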

Key words: Video super-resolution, Deformable convolution, Re-sampling, Implicit alignment, Optical flow

CLC Number: 

  • TP391