Video Super-resolution Model Based on Implicit Alignment

doi:10.11896/jsjkx.240500069

Abstract: Video contains both intra-frame spatial correlation and inter-frame temporal correlation. When reconstructing high-resolution video from low-resolution video, information from adjacent frames can be aligned to guide recovery of the current frame. Deformable convolution guided by optical flow is commonly used for explicit frame-by-frame alignment. This approach overcomes the instability of deformable convolution, but it impairs the recovery of high-frequency information within the frame, reduces the accuracy of the alignment information, and magnifies artifacts. To address these issues, this paper proposes IAVSR (Implicit Alignment Video Super-Resolution), a video super-resolution model based on implicit alignment. IAVSR encodes the optical flow to specific pixel positions using offsets and original values, computing pre-alignment information rather than interpolating it. Deformable convolution then realigns the pre-aligned features and recovers high-frequency information. Bidirectional propagation uses information from the previous two frames to guide recovery of the current frame, while a residual network structure improves alignment accuracy without introducing excessive parameters. Experimental results on the public REDS4 dataset show that IAVSR achieves a PSNR 0.6 dB higher than the benchmark models and converges 20% faster during training.
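To put the reported 0.6 dB PSNR gain in perspective, the following is a minimal sketch of the standard PSNR definition, together with the mean-squared-error ratio that a given dB gain implies. It is not taken from the paper: the function names `psnr` and `mse_ratio_for_gain` are hypothetical, pixel values are assumed to be normalized to [0, 1], and the evaluation protocol actually used on REDS4 (color channel, border cropping) is not stated in the abstract.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes pixel values lie in [0, peak]; this range is an assumption,
    not something the abstract specifies.
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Ratio new_mse / old_mse implied by a PSNR improvement of gain_db."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    ground_truth = [0.0, 0.0, 0.0, 0.0]
    restored = [0.1, 0.1, 0.1, 0.1]
    print(f"PSNR: {psnr(ground_truth, restored):.2f} dB")
    print(f"MSE ratio for +0.6 dB: {mse_ratio_for_gain(0.6):.3f}")
```

Under this definition, a 0.6 dB gain at a fixed peak value corresponds to a mean squared error roughly 13% lower than the baseline's.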

Key words: Video super resolution, Deformable convolution, Re-sampling, Implicit alignment, Optical flow

CLC Number: TP391

WANG Fengling, WEI Aimin, PANG Xiongwen, LI Zhi, XIE Jingming. Video Super-resolution Model Based on Implicit Alignment [J]. Computer Science, 2025, 52(8): 232-239.
