Computer Science ›› 2025, Vol. 52 ›› Issue (11): 175-183. doi: 10.11896/jsjkx.240900141

• Computer Graphics & Multimedia •

Neural Radiance Field for Human Reconstruction Based on Multi-scale Hierarchical Network

WANG Yang, WANG Guodong, ZHAO Junli, SHENG Xiaomeng

  1. College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
  • Received: 2024-09-23 Revised: 2025-02-06 Online: 2025-11-15 Published: 2025-11-06
  • Corresponding author: WANG Guodong (doctorwgd@gmail.com)
  • About author: WANG Yang (wangyang0689@qdu.edu.cn), born in 1998, postgraduate. His main research interests include neural radiance fields and 3D human body reconstruction.
    WANG Guodong, born in 1980, Ph.D., professor, is a member of CCF (No.16234M). His main research interests include computer graphics and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (62172247) and Qingdao Natural Science Foundation (23-2-1-163-zyyd-jch).

Abstract: Reconstructing 3D human models from monocular RGB video requires capturing human pose accurately, which is difficult when relying on prior body models such as SMPL. Because of their rigidity assumptions, such models struggle to describe subtle pose variations, leading to suboptimal reconstruction results. In addition, existing NeRF-based human modeling methods often produce unnatural shadows or floating artifacts in local regions when rendering unseen poses, and they fall short in reproducing texture details. To address these issues, this paper proposes a hierarchical network based on triplane multi-scale decomposition, which aims to enhance the texture details of NeRF-based 3D human models and to improve generalization across different poses. Methodologically, multi-resolution hash encoding replaces the traditional sinusoidal frequency encoding function, capturing high-frequency features of the human body more efficiently and accelerating model convergence. The triplane multi-scale learning strategy is applied to capture pose details, improving the accuracy and visual quality of the 3D reconstruction. Experiments show that the proposed improvements significantly enhance the reconstruction of 3D human models, especially under complex pose variations. The method outperforms traditional methods in training speed, rendering quality, and pose generalization. The resulting 3D human models exhibit more realistic details, and the synthesized results for novel poses are of high quality, further advancing 3D human reconstruction from monocular video.

Key words: Neural radiance field, Skinned multi-person linear model (SMPL), Human reconstruction, Deep learning, Multilayer perceptron (MLP)
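
The abstract contrasts sinusoidal frequency encoding with multi-resolution hash encoding. The following is a minimal pure-Python sketch of the two encodings, for illustration only and not the authors' implementation. The per-axis hashing constants are the ones published for Instant-NGP; the name `HashEncoder`, the number of levels, the table size, the base resolution and the growth factor are assumptions chosen for readability, and the table entries here are fixed random values, whereas in a trained model they would be learnable parameters.

```python
import math
import random
from itertools import product

# Per-axis hashing constants from the Instant-NGP paper.
PRIMES = (1, 2654435761, 805459861)


def frequency_encode(p, num_freqs=4):
    """Sinusoidal (NeRF-style) encoding: sin/cos of each coordinate at octave frequencies."""
    out = []
    for k in range(num_freqs):
        for c in p:
            out.append(math.sin((2.0 ** k) * math.pi * c))
            out.append(math.cos((2.0 ** k) * math.pi * c))
    return out


def spatial_hash(ix, iy, iz, table_size):
    """Map integer grid coordinates to a slot in a fixed-size feature table."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


class HashEncoder:
    """Hypothetical multi-resolution hash encoder (all hyperparameters are assumptions)."""

    def __init__(self, num_levels=4, table_size=2 ** 10, feat_dim=2,
                 base_res=4, growth=2.0, seed=0):
        rng = random.Random(seed)
        self.table_size = table_size
        self.feat_dim = feat_dim
        # Grid resolution grows geometrically from coarse to fine.
        self.resolutions = [int(base_res * growth ** l) for l in range(num_levels)]
        # One feature table per level; small random values stand in for trained ones.
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)]
             for _ in range(table_size)]
            for _ in range(num_levels)
        ]

    def encode(self, p):
        """Encode a point p in [0, 1]^3 by trilinear interpolation at every level."""
        out = []
        for res, table in zip(self.resolutions, self.tables):
            scaled = [c * res for c in p]
            base = [min(int(math.floor(s)), res - 1) for s in scaled]
            frac = [s - b for s, b in zip(scaled, base)]
            feat = [0.0] * self.feat_dim
            # Visit the 8 corners of the enclosing voxel.
            for offs in product((0, 1), repeat=3):
                w = 1.0
                for o, f in zip(offs, frac):
                    w *= f if o else (1.0 - f)
                idx = spatial_hash(base[0] + offs[0], base[1] + offs[1],
                                   base[2] + offs[2], self.table_size)
                for d in range(self.feat_dim):
                    feat[d] += w * table[idx][d]
            out.extend(feat)  # concatenate features across levels
        return out
```

With these assumed defaults, `frequency_encode` returns 24 values for a 3D point and `HashEncoder().encode` returns 8 (4 levels of 2 features each). The frequency encoding is a fixed function of the input, while the hash encoding reads from tables that can be optimized together with the network.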

CLC Number: 

  • TP391.4
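
The abstract names a triplane multi-scale strategy without giving its details. As a rough illustration of the general idea only, the sketch below stores features on three axis-aligned planes (XY, XZ, YZ) at several resolutions, looks up a 3D point on each plane by bilinear interpolation, sums the three plane features per scale, and concatenates across scales. The class name `MultiScaleTriplane`, the resolutions, the feature width, summation as the aggregation rule, and the random plane contents are all assumptions; the paper's actual network may differ.

```python
import random


def make_plane(res, feat_dim, rng):
    """Create a (res+1) x (res+1) grid of feature vectors (random stand-ins)."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(feat_dim)]
             for _ in range(res + 1)]
            for _ in range(res + 1)]


def bilinear(plane, u, v):
    """Bilinearly interpolate a feature plane at (u, v) in [0, 1]^2."""
    res = len(plane) - 1
    x, y = u * res, v * res
    x0, y0 = min(int(x), res - 1), min(int(y), res - 1)
    fx, fy = x - x0, y - y0
    dim = len(plane[0][0])
    return [
        (1 - fx) * (1 - fy) * plane[x0][y0][d]
        + fx * (1 - fy) * plane[x0 + 1][y0][d]
        + (1 - fx) * fy * plane[x0][y0 + 1][d]
        + fx * fy * plane[x0 + 1][y0 + 1][d]
        for d in range(dim)
    ]


class MultiScaleTriplane:
    """Hypothetical multi-scale triplane feature field (illustrative only)."""

    def __init__(self, resolutions=(8, 16, 32), feat_dim=4, seed=0):
        rng = random.Random(seed)
        self.feat_dim = feat_dim
        # For each scale: three planes in the order XY, XZ, YZ.
        self.levels = [[make_plane(r, feat_dim, rng) for _ in range(3)]
                       for r in resolutions]

    def query(self, p):
        """Return concatenated per-scale features for a point p in [0, 1]^3."""
        x, y, z = p
        out = []
        for xy, xz, yz in self.levels:
            # Project the point onto each plane and sum the three lookups.
            summed = [a + b + c for a, b, c in zip(bilinear(xy, x, y),
                                                   bilinear(xz, x, z),
                                                   bilinear(yz, y, z))]
            out.extend(summed)
        return out
```

With the assumed defaults, `MultiScaleTriplane().query((0.3, 0.5, 0.7))` returns 12 values (3 scales of 4 features each). In a pipeline of this kind, such a feature vector would typically be passed to an MLP that predicts density and color.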