Video Compressed Sensing Method with Integrated Deformable 3D Convolution and Transformer

doi:10.11896/jsjkx.240800026

Abstract: As video resolution increases, data volume grows. Reconstructing high-quality video at a lower sampling rate reduces the consumption of communication resources and eases deployment at the sampling end. However, existing video compressed sensing methods do not fully exploit the inter-frame correlation of video, and reconstruction quality at low sampling rates still needs improvement. Deep-learning-based distributed video compressed sensing has provided new ideas for video compressed sensing reconstruction. This paper therefore combines 3D deformable convolution with Transformer to construct the CS3Dformer network. The 3D deformable convolutional network captures local and spatio-temporal features and learns the spatio-temporal features between video frames, while the Transformer captures long-range dependencies, which compensates to some extent for the weakness of convolutional neural networks in capturing non-local similarity in images and enables better modeling of the video. The method is an end-to-end video compressed sensing method, and experimental results on multiple datasets verify its effectiveness.
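To make the notion of "sampling rate" concrete, the following is a minimal sketch of generic block-based compressive sampling, y = Φx, applied to one grayscale frame. It is an illustration only: the abstract does not describe the sampling operator used by CS3Dformer, so the random Gaussian matrix, the 8×8 block size, the 0.1 rate, and the helper names `make_measurement_matrix` and `sample_frame` are all assumptions introduced here.

```python
import math
import random


def make_measurement_matrix(m, n, seed=0):
    """Random Gaussian m x n matrix (an assumed choice, not the paper's operator)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def sample_frame(frame, phi, block):
    """Return one measurement vector y = phi @ x per non-overlapping block."""
    h, w = len(frame), len(frame[0])
    measurements = []
    for top in range(0, h, block):
        for left in range(0, w, block):
            # Flatten the block row by row into a vector x of length block * block.
            x = [frame[top + r][left + c]
                 for r in range(block) for c in range(block)]
            measurements.append([sum(p * v for p, v in zip(row, x)) for row in phi])
    return measurements


block, rate = 8, 0.1                 # hypothetical block size and sampling rate
n = block * block                    # pixels per block
m = max(1, round(rate * n))          # measurements kept per block
phi = make_measurement_matrix(m, n)

rng = random.Random(1)
frame = [[rng.random() for _ in range(32)] for _ in range(32)]  # toy 32x32 frame
y = sample_frame(frame, phi, block)
print(len(y), len(y[0]))             # number of blocks, measurements per block
```

At a rate of 0.1, each 64-pixel block is reduced to 6 measurements, which is the data a reconstruction network would have to recover the frame from.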

Key words: Compressive sensing, Video reconstruction, Deformable convolution, Transformer, Convolutional neural network

CLC Number: TN919.81

DU Xiuli, ZHU Jinyao, GAO Xing, LYU Yana, QIU Shaoming. Video Compressed Sensing Method with Integrated Deformable 3D Convolution and Transformer[J].Computer Science, 2025, 52(11): 150-156.
