Computer Science ›› 2026, Vol. 53 ›› Issue (3): 225-230. doi: 10.11896/jsjkx.250700104

• Computer Graphics & Multimedia •

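Low-Bitrate Real-Time Multiview Video Streaming Based on 3D Gaussian Splatting

WANG Yizong (王义总)1, NING Hongbo (宁泓博)2, WANG Haofeng (王昊峰)2, MA Siwei (马思伟)1, GAO Wen (高文)1

  1 School of Computer Science, Peking University, Beijing 100871
  2 School of Electronic and Computer Engineering, Peking University, Shenzhen, Guangdong 518055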
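  • Received: 2025-07-16  Revised: 2025-10-18  Published: 2026-03-12
  • Corresponding author: MA Siwei (swma@pku.edu.cn)
  • About the author: (wang@pku.edu.cn)
  • Funding:
    Beijing Natural Science Foundation (L242014); PCL-CMCC Foundation for Science and Innovation (2024ZY1C0040); Postdoctoral Fellowship Program and China Postdoctoral Science Foundation (BX20250382)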

Low-bitrate and Real-time Multiview Video Streaming with 3D Gaussian Splatting

WANG Yizong1, NING Hongbo2, WANG Haofeng2, MA Siwei1, GAO Wen1   

  1 School of Computer Science, Peking University, Beijing 100871, China
  2 School of Electronic and Computer Engineering, Peking University, Shenzhen, Guangdong 518055, China
  • Received: 2025-07-16  Revised: 2025-10-18  Online: 2026-03-12
  • About author: WANG Yizong, born in 1997, Ph.D., is a member of CCF (No. B7846M). His main research interest is immersive video streaming.
    MA Siwei, born in 1979, professor, Ph.D. supervisor. His main research interest is video coding.
  • Supported by:
    Beijing Natural Science Foundation (L242014), PCL-CMCC Foundation for Science and Innovation (2024ZY1C0040), and Postdoctoral Fellowship Program and China Postdoctoral Science Foundation (BX20250382).

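Abstract: Multiview video provides users with immersive experiences and supports a wide range of applications, but its transmission bandwidth requirement is far higher than that of traditional video. Existing multiview coding algorithms mainly exploit redundancy between 2D views and do not consider redundancy in 3D space. To address this, a multiview video streaming method is proposed that converts multiview video into a compact sparse-view representation to reduce 3D spatial redundancy and, based on this representation, performs 3D reconstruction at the receiver to synthesize the remaining views. Specifically: 1) a compact multiview video representation based on sparse views is proposed, which uses 3D Gaussian reconstruction and splatting to synthesize the remaining views; 2) a view selection method is designed to optimize the visual quality of the synthesized views. Experiments show that the proposed system reduces bitrate by at least 44.6% compared with the baseline method while supporting end-to-end real-time streaming at over 30 FPS.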

Key words: Video streaming, Multiview video, 3D Gaussian splatting, Immersive video, 3D reconstruction

Abstract: Multiview videos can offer viewers immersive experiences and enable a variety of applications, but they require far more transmission bandwidth than traditional videos. Current multiview coding algorithms mainly exploit redundancy between 2D views and do not consider redundancy in 3D space. This paper presents a multiview video streaming approach that transforms multiview video content into a compact sparse-view representation to reduce 3D spatial redundancy. At the receiver side, the remaining views are synthesized through 3D reconstruction based on this representation. Specifically, this paper proposes a compact multiview video representation based on sparse views, in which the remaining views are synthesized using 3D Gaussian reconstruction and splatting, and a view selection method that chooses the transmitted views to optimize the visual quality of the synthesized views. Experiments show that the proposed method achieves a bitrate reduction of at least 44.6% compared with the baseline and supports end-to-end streaming at over 30 FPS.

Key words: Video streaming, Multiview video, 3D Gaussian splatting, Immersive video, 3D reconstruction
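
To put the two headline figures from the abstract in concrete terms, the short sketch below converts a fractional bitrate reduction into a resulting bitrate and a frame rate into an average per-frame time budget. The 10 Mbps baseline is a hypothetical value chosen only for illustration (the abstract reports the 44.6% reduction, not absolute bitrates), and the function names are illustrative. The per-frame figure is an average throughput budget; it says nothing about how the system pipelines its stages.

```python
def reduced_bitrate(baseline_kbps: float, reduction: float) -> float:
    """Bitrate that remains after removing a fractional `reduction` of the baseline."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction in [0, 1]")
    return baseline_kbps * (1.0 - reduction)


def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Hypothetical 10 Mbps baseline; only the 44.6% and 30 FPS figures come from the abstract.
baseline_kbps = 10_000.0
upper_bound_kbps = reduced_bitrate(baseline_kbps, 0.446)
budget_ms = frame_budget_ms(30.0)

print(f"bitrate at a 44.6% reduction: {upper_bound_kbps:.0f} kbps")
print(f"average per-frame budget at 30 FPS: {budget_ms:.1f} ms")
```

Because the abstract states the reduction as "at least 44.6%", the computed bitrate is an upper bound for the proposed system under the assumed baseline, and the 30 FPS figure implies an average budget of roughly 33 ms per frame.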

CLC Number: 

  • TP391