Computer Science ›› 2025, Vol. 52 ›› Issue (11): 175-183. doi: 10.11896/jsjkx.240900141

• Computer Graphics & Multimedia •

Neural Radiance Field for Human Reconstruction Based on Multi-scale Hierarchical Network

WANG Yang, WANG Guodong, ZHAO Junli, SHENG Xiaomeng   

  1. College of Computer Science and Technology,Qingdao University,Qingdao,Shandong 266071,China
  • Received:2024-09-23 Revised:2025-02-06 Online:2025-11-15 Published:2025-11-06
  • About author:WANG Yang,born in 1998,postgraduate.His main research interests include neural radiance fields and 3D human body reconstruction.
    WANG Guodong,born in 1980,Ph.D,professor,is a member of CCF(No.16234M).His main research interests include computer graphics and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China(62172247) and Qingdao Natural Science Foundation(23-2-1-163-zyyd-jch).

Abstract: Reconstructing 3D human models from monocular RGB video is challenging because human poses are difficult to capture accurately, especially when relying on prior models such as SMPL. Owing to their rigid assumptions, such models struggle to represent subtle pose variations, which leads to suboptimal reconstruction results. In addition, existing NeRF-based human modeling methods often produce unnatural shadows or floating artifacts around certain body parts when rendering unseen poses, and their representation of texture details tends to be insufficient. To address these issues, this paper proposes a hierarchical network based on Triplane multi-scale learning, which aims to enhance the texture details of 3D human models reconstructed with NeRF techniques and to improve the model's generalization across different poses. Methodologically, multi-resolution hash encoding replaces the traditional sinusoidal frequency encoding function, allowing high-frequency human features to be captured more efficiently and speeding up model convergence. The Triplane multi-scale learning strategy is applied to capture pose details, effectively improving the accuracy and visual quality of 3D reconstruction. Experiments demonstrate that the proposed improvements significantly enhance the reconstruction of 3D human models, especially under complex pose variations, and the method shows notable advantages in training speed, rendering quality, and pose generalization. With this model, the reconstructed 3D human models exhibit more realistic details, and results synthesized for novel poses are of high quality, further advancing 3D human reconstruction from monocular video.
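To make the two encoding ideas mentioned in the abstract concrete, the sketch below illustrates, in PyTorch, (i) a multi-resolution hash encoding of 3D sample points in the spirit of Instant-NGP, and (ii) a bilinear tri-plane feature lookup. This is a minimal sketch only: the names (HashGridEncoder, triplane_features), the hyper-parameters (number of levels, hash-table size, plane resolution), and the simplified spatial hash are assumptions made for illustration and are not taken from the authors' implementation.

```python
# Minimal sketch of multi-resolution hash encoding and a tri-plane feature
# lookup; names, hyper-parameters, and the simplified hash are illustrative
# assumptions, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HashGridEncoder(nn.Module):
    """Instant-NGP-style multi-resolution hash grid for 3D points in [0, 1]^3."""

    def __init__(self, n_levels=16, features_per_level=2,
                 log2_hashmap_size=19, base_res=16, max_res=2048):
        super().__init__()
        self.n_levels = n_levels
        self.table_size = 2 ** log2_hashmap_size
        # Per-level resolutions grow geometrically from base_res to max_res.
        growth = (max_res / base_res) ** (1.0 / (n_levels - 1))
        self.resolutions = [int(base_res * growth ** l) for l in range(n_levels)]
        # One learnable feature table per level, initialized near zero.
        self.tables = nn.Parameter(
            1e-4 * (2 * torch.rand(n_levels, self.table_size, features_per_level) - 1))
        self.register_buffer(
            "primes", torch.tensor([1, 2654435761, 805459861], dtype=torch.long))

    def _hash(self, grid_idx):
        # Spatial hash (XOR of coordinate * prime), folded into the table size.
        h = grid_idx * self.primes
        return (h[..., 0] ^ h[..., 1] ^ h[..., 2]) % self.table_size

    def forward(self, x):
        # x: (N, 3) points in [0, 1]; returns (N, n_levels * features_per_level).
        feats = []
        for level in range(self.n_levels):
            pos = x * self.resolutions[level]
            lo = pos.floor().long()
            w = pos - lo.float()                         # trilinear weights
            f = 0.0
            for corner in range(8):                      # 8 corners of the cell
                offset = torch.tensor([(corner >> i) & 1 for i in range(3)],
                                      device=x.device)
                idx = self._hash(lo + offset)
                cw = torch.prod(torch.where(offset.bool(), w, 1.0 - w),
                                dim=-1, keepdim=True)
                f = f + cw * self.tables[level][idx]
            feats.append(f)
        return torch.cat(feats, dim=-1)


def triplane_features(planes, x):
    """Bilinear lookup on three axis-aligned feature planes (XY, XZ, YZ).

    planes: (3, C, H, W) learnable feature maps; x: (N, 3) points in [-1, 1].
    A single-scale stand-in for a tri-plane branch.
    """
    coords = torch.stack([x[:, [0, 1]], x[:, [0, 2]], x[:, [1, 2]]])  # (3, N, 2)
    grid = coords.unsqueeze(2)                                        # (3, N, 1, 2)
    sampled = F.grid_sample(planes, grid, align_corners=True)         # (3, C, N, 1)
    return sampled.squeeze(-1).permute(2, 0, 1).reshape(x.shape[0], -1)
```

In a full pipeline of this kind, the concatenated hash-grid and tri-plane features for each ray sample would replace the sinusoidal positional encoding as input to the NeRF MLP.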

Key words: Neural radiance field, SMPL, Human reconstruction, Deep learning, MLP

CLC Number: TP391.4

[1]MILDENHALL B,SRINIVASAN P P,TANCIK M,et al.Representing scenes as neural radiance fields for view synthesis[J].Communications of the ACM,2021,65(1):99-106.
[2]HE G X,ZHU B,XIE B,et al.Progress in Novel View Synthesis Using Neural Radiance Fields[J].Laser & Optoelectronics Progress,2024,61(12):71-83.
[3]LI J Y,CHENG L C,HE J X,et al.Research Status and Prospects of Neural Radiance Fields [J].Journal of Computer-Aided Design & Computer Graphics,2024,36(7):995-1013.
[4]LOPER M,MAHMOOD N,ROMERO J,et al.Skinned multi-person linear model [C]//Seminal Graphics Papers:Pushing the Boundaries,Volume 2.2023:851-866.
[5]CHEN X,JIANG T,SONG J,et al.Fast-snarf:A fast deformer for articulated neural fields[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2023,45(10):11796-11809.
[6]MULLER T,EVANS A,SCHIED C,et al.Instant neural graphics primitives with a multiresolution hash encoding [J].ACM Transactions on Graphics,2022,41(4):1-15.
[7]ALLDIECK T,MAGNOR M,XU W,et al.Detailed human avatars from monocular video [C]//2018 International Conference on 3D Vision(3DV).IEEE,2018:98-109.
[8]HAN K,XU J.Research on 3D Scene Rendering Technology-Neural Radiance Field[J].Application Research of Computers,2024,41(8):2252-2260.
[9]WANG Z R,CHANG Y,LU P,et al.A Review of Acceleration Algorithms for Neural Radiance Fields[J].Journal of Graphics,2024,45(1):1-13.
[10]COLLET A,CHUANG M,SWEENEY P,et al.High-quality streamable free-viewpoint video [J].ACM Transactions on Graphics,2015,34(4):1-13.
[11]DOU M,KHAMIS S,DEGTYAREV Y,et al.Fusion4D:Real-time performance capture of challenging scenes [J].ACM Transactions on Graphics,2016,35(4):1-13.
[12]GUO K,LINCOLN P,DAVIDSON P,et al.The Relightables:Volumetric performance capture of humans with realistic relighting [J].ACM Transactions on Graphics,2019,38(6):1-19.
[13]MATUSIK W,BUEHLER C,RASKAR R,et al.Image-based visual hulls [C]//Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques.2000:369-374.
[14]SAITO S,HUANG Z,NATSUME R,et al.Pifu:Pixel-aligned implicit function for high-resolution clothed human digitization [C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:2304-2314.
[15]SAITO S,SIMON T,SARAGIH J,et al.PifuHD:Multi-level pixel-aligned implicit function for high-resolution 3D human digitization [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:84-93.
[16]LAZOVA V,INSAFUTDINOV E,PONS-MOLL G.360-degree textures of people in clothing from a single image [C]//2019 International Conference on 3D Vision(3DV).IEEE,2019:643-653.
[17]ALLDIECK T,PONS-MOLL G,THEOBALT C,et al.Tex2Shape:Detailed full human body geometry from a single image [C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:2293-2303.
[18]ZHENG Z,YU T,LIU Y,et al.Pamir:Parametric model-conditioned implicit representation for image-based human reconstruction [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2021,44(6):3170-3184.
[19]LIU X N,CHEN C Y,HU X J,et al.Virtual View-point Image Synthesis of Neural Radiance Field with Depth Information Supervision [J].Journal of Image and Graphics,2024,29(7):2035-2045.
[20]PESAVENTO M,XU Y,SARAFIANOS N,et al.ANIM:accurate neural implicit model for human reconstruction from a single RGB-D image[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2024:5448-5458.
[21]ALLDIECK T,MAGNOR M,XU W,et al.Video-based reconstruction of 3D people models [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:8387-8397.
[22]ALLDIECK T,MAGNOR M,BHATNAGAR BL,et al.Learning to reconstruct people in clothing from a single RGB camera [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:1175-1186.
[23]SONG C,WANDT B,RHODIN H.Pose modulated avatars from video[J].arXiv:2308.11951,2023.
[24]ALLDIECK T,MAGNOR M,BHATNAGAR B L,et al.Learning to reconstruct people in clothing from a single RGB camera [C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:1175-1186.
[25]LING S,NGUYEN K,ROUX-LANGLOIS A,et al.A lattice-based group signature scheme with verifier-local revocation [J].Theoretical Computer Science,2018,730(19):1-20.
[26]VAMBOL A,KHARCHENKO V,POTII O,et al.McEliece and Niederreiter Cryptosystems Analysis in the Context of Post-Quantum Network Security [C]//International Conference on Mathematics & Computers in Sciences & in Industry.IEEE Computer Society,2017:134-137.
[27] SAITO S,HUANG Z,NATSUME R,et al.Pifu:Pixel-aligned implicit function for high-resolution clothed human digitization[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2019:2304-2314.
[28] SAITO S,SIMON T,SARAGIH J,et al.Pifuhd:Multi-level pixel-aligned implicit function for high-resolution 3d human digitization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:84-93.
[29]DONG Z,CHEN X, YANG J,et al.Ag3d:Learning to generate 3d avatars from 2d image collections[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:14916-14927.
[30]ZHI T,LASSNER C,TUNG T,et al.Texmesh:Reconstructing detailed human texture and geometry from rgb-d video[C]//Computer Vision-ECCV 2020:16th European Conference.Springer,2020:492-509.
[31]ZHAO X,WANG L,SUN J,et al.Havatar:High-fidelity head avatar via facial model conditioned neural radiance field[J].ACM Transactions on Graphics,2023,43(1):1-16.
[32]XIANG D,PRADA F,WU C,et al.Monoclothcap:Towards temporally coherent clothing capture from monocular rgb video[C]//2020 International Conference on 3D Vision(3DV).IEEE,2020:322-332.
[33]HABERMANN M,XU W,ZOLLHOEFER M,et al.Livecap:Real-time human performance capture from monocular video[J].ACM Transactions On Graphics,2019,38(2):1-17.
[34]HABERMANN M,XU W,ZOLLHOFER M,et al.Deepcap:Monocular human performance capture using weak supervision[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:5052-5063.
[35]ZHANG H,FENG Y,KULITS P,et al.Text-guided generation and editing of compositional 3D avatars[J].arXiv:2309.07125,2023.
[36] SUN C,QIU J,WU L N,et al.Dynamic human body neural radiance field reconstruction based on monocular vision[J].Acta Optica Sinica,2024,44(19):256-266.
[37]PENG S,DONG J,WANG Q,et al.Animatable neural radiance fields for modeling dynamic human bodies[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:14314-14323.
[38]GUO C,CHEN X,SONG J,et al.Human performance capture from monocular video in the wild[C]//2021 International Conference on 3D Vision(3DV).IEEE,2021:889-898.
[39]XIU Y,YANG J,TZIONAS D,et al.Icon:Implicit clothed humans obtained from normals[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR).IEEE,2022:13286-13296.
[40]XIU Y,YANG J,CAO X,et al.Econ:Explicit clothed humans optimized via normal integration[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:512-523.
[41]WANG S,SCHWARZ K,GEIGER A,et al.Arah:Animatable volume rendering of articulated human SDFs[C]//European Conference on Computer Vision.Springer,2022:1-19.
[42]JIANG B,HONG Y,BAO H,et al.Selfrecon:Self-reconstruction your digital avatar from monocular video[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:5605-5615.
[43]PENG S,ZHANG Y,XU Y,et al.Neural body:Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:9054-9063.
[44]CHEN M,ZHANG J,XU X,et al.Geometry-guided progressive nerf for generalizable and efficient neural human rendering[C]//European Conference on Computer Vision.Cham:Springer,2022:222-239.
[45]PENG S,DONG J,WANG Q,et al.Animatable neural radiance fields for modeling dynamic human bodies[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:14314-14323.
[46]WENG C Y,CURLESS B,SRINIVASAN P P,et al.HumanNeRF:Free-viewpoint rendering of moving people from monocular video[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:16210-16220.
[47]XU H,ALLDIECK T,SMINCHISESCU C.H-NeRF:Neural radiance fields for rendering and temporal reconstruction of humans in motion[J].Advances in Neural Information Processing Systems,2021,34:14955-14966.
[48]WANG Z,WU S,XIE W,et al.NeRF-:Neural radiance fields without known camera parameters[J].arXiv:2102.07064,2021.
[49] XIAO Y L,DENG Y Q,CHEN Z G.Accelerating Method of Neural Radiance Fields for Dynamic 3D Human Reconstruction[J/OL].https://doi.org/10.19678/j.issn.1000-3428.0069317.
[50]JING W P,WANG Y F,LI C.NeRF 3D Reconstruction Method Based on Cone Tracing and Network Decomposition[J].Computer Engineering,2024,50(10):334-341.
[51]HU S,HONG F,PAN L,et al.Sherf:Generalizable human NeRF from a single image[J].arXiv:2303.12791,2023.
[52]GAFNI G,THIES J,ZOLLHOFER M,et al.Dynamic neural radiance fields for monocular 4D facial avatar reconstruction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:8649-8658.
[53]SU S Y,YU F,ZOLLHOEFER M,et al.A-NeRF:Surface-free human 3D pose refinement via neural rendering[J].arXiv:2102.06199,2021.
[54]SUN C,SUN M,CHEN H T.Direct voxel grid optimization:Super-fast convergence for radiance fields reconstruction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:5459-5469.
[55]TAKIKAWA T,LITALIEN J,YIN K,et al.Neural geometric level of detail:Real-time rendering with implicit 3d shapes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:11358-11367.
[56]YU A,LI R,TANCIK M,et al.Plenoctrees for real-time rendering of neural radiance fields[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:5752-5761.
[57]SHAO R,ZHENG Z,TU H,et al.Tensor4d:Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:16632-16642.
[58]MARTIN-BRUALLA R,RADWAN N,SAJJADI M S,et al.Nerf in the wild:Neural radiance fields for unconstrained photo collections[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:7210-7219.
[59]PUMAROLA A,CORONA E,PONS-MOLL G,et al.D-nerf:Neural radiance fields for dynamic scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:10318-10327.
[60]CHAN E R,LIN C Z,CHAN M A,et al.Efficient geometry-aware 3d generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:16123-16133.
[61]ZHANG J W,ZHANG H X,LI S H,et al.3D Reconstruction of Human Head Based on TE-NeuS[J].Software Engineering,2024,27(7):56-60.
[62]WU S P,MA J S, SHE J F.An Implicit Representation-Based Method for Instant Real-Scene 3D Reconstruction and Neural Rendering[J].Science of Surveying and Mapping,2024,49(4):147-158.
[63]CHEN Q,QIN Z B,CAI X Y,et al.Dynamic 3D reconstruction of soft tissue with neural radiation field for robotic surgery simulator[J].Acta Optica Sinica,2024,44(7):279-291.
[64]CHEN X,ZHENG Y,BLACK M J,et al.Snarf:Differentiable forward skinning for animating non-rigid neural implicit shapes[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:11594-11604.
[65]FAN T,YANG H,YIN W,et al.Multi-scale view synthesis based on neural radiance fields[J].Journal of Graphics,2023,44(6):1140-1148.
[66]XIE Z,YANG X,YANG Y,et al.S3IM:Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:18024-18034.
[67]CHEN J,ZHANG Y,KANG D,et al.Animatable neural radiance fields from monocular rgb videos[J].arXiv:2106.13629,2021.
[68]JIANG T,CHEN X,SONG J,et al.Instantavatar:Learning avatars from monocular video in 60 seconds[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:16922-16932.
[69]TIWARI G,SARAFIANOS N,TUNG T,et al.Neural-gif:Neural generalized implicit functions for animating people in clothing[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:11708-11718.