Computer Science ›› 2025, Vol. 52 ›› Issue (11): 150-156. doi: 10.11896/jsjkx.240800026

• Computer Graphics & Multimedia •

Video Compressed Sensing Method with Integrated Deformable 3D Convolution and Transformer

DU Xiuli, ZHU Jinyao, GAO Xing, LYU Yana, QIU Shaoming

  1. Key Laboratory of Communication and Network, Dalian University, Dalian, Liaoning 116622, China
    School of Information Engineering, Dalian University, Dalian, Liaoning 116622, China
  • Received: 2024-08-03 Revised: 2025-02-20 Online: 2025-11-15 Published: 2025-11-06
  • Corresponding author: ZHU Jinyao (zhujinyao@s.dlu.edu.cn)
  • About author: DU Xiuli (duxiuli@dlu.edu.cn), born in 1977, professor, is a member of CCF (No.22427M). Her main research interests include compressed sensing and EEG signal processing.
    ZHU Jinyao, born in 1999, postgraduate. His main research interests include video compressed sensing.
  • Supported by:
    Liaoning Provincial Department of Education (JYTMS20230377).

Abstract: As video resolution rises, data volume grows with it. Reconstructing high-quality video from a lower sampling rate reduces the consumption of communication resources and, in turn, eases deployment at the sampling end. However, existing video compressed sensing methods do not fully exploit the inter-frame correlation of video, and reconstruction quality at low sampling rates needs further improvement. With the introduction of deep learning, deep-learning-based distributed video compressed sensing offers new ideas for video compressed sensing reconstruction. This paper therefore combines 3D deformable convolution with Transformer to build the CS3Dformer network. The network uses 3D deformable convolution, which is effective at capturing the local and spatio-temporal features of video, to learn spatio-temporal features between video frames. At the same time, it uses the Transformer's strength in capturing long-range dependencies to compensate, to some extent, for the weakness of convolutional neural network methods in capturing non-local similarity in images, and thereby models video better. The proposed method is an end-to-end video compressed sensing method, and experimental results on multiple datasets verify its effectiveness.

Key words: Compressive sensing, Video reconstruction, Deformable convolution, Transformer, Convolutional neural network

CLC Number: 

  • TN919.81
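
For readers unfamiliar with what a "sampling rate" means in video compressed sensing, the following is a minimal sketch of generic block-based sampling, y = Φx, applied to each frame, followed by a crude linear initial estimate. It is not the CS3Dformer network and does not reproduce the paper's sampling or reconstruction modules. The function names, the 8×8 block size, the random Gaussian measurement matrix and the pseudo-inverse estimate are all illustrative assumptions.

```python
import numpy as np

def sample_video_blocks(video, block=8, rate=0.1, seed=0):
    """Block-wise compressed sampling y = Phi @ x for every frame.

    video: array of shape (T, H, W), with H and W divisible by `block`.
    Returns (measurements, phi); measurements has shape (T, num_blocks, m),
    where m = round(rate * block * block).
    """
    t, h, w = video.shape
    n = block * block
    m = max(1, int(round(rate * n)))
    rng = np.random.default_rng(seed)
    # Random Gaussian measurement matrix (an assumption for illustration).
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    # Split each frame into non-overlapping block x block patches of length n.
    patches = (video.reshape(t, h // block, block, w // block, block)
                    .transpose(0, 1, 3, 2, 4)
                    .reshape(t, -1, n))
    measurements = patches @ phi.T
    return measurements, phi

def initial_reconstruction(measurements, phi, shape, block=8):
    """Crude linear estimate x0 = pinv(Phi) @ y, reassembled into frames."""
    t, h, w = shape
    patches = measurements @ np.linalg.pinv(phi).T
    return (patches.reshape(t, h // block, w // block, block, block)
                   .transpose(0, 1, 3, 2, 4)
                   .reshape(t, h, w))

# Hypothetical toy input: 4 frames of 32 x 32 random pixels.
video = np.random.default_rng(1).random((4, 32, 32))
y, phi = sample_video_blocks(video, block=8, rate=0.1)
x0 = initial_reconstruction(y, phi, video.shape, block=8)
print(y.shape, x0.shape)
```

At a rate of 0.1, each 64-pixel block is reduced to 6 measurements, so the linear estimate is necessarily poor. A learned reconstruction network of the kind the abstract describes aims to close that gap by exploiting spatio-temporal and long-range structure across frames.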