Computer Science, 2025, Vol. 52, Issue (3): 50-57. doi: 10.11896/jsjkx.240200060

• 3D Vision and Metaverse •

Animatable Head Avatar Reconstruction Algorithm Based on Region Encoding

WANG Jie, WANG Chuangye, XIE Jiucheng, GAO Hao   

  1. School of Automation and Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2024-02-19 Revised: 2024-07-11 Online: 2025-03-15 Published: 2025-03-07
  • About author: WANG Jie, born in 2000, master. His main research interests include 3D deep learning, head avatars and digital humans.
    GAO Hao, born in 1976, Ph.D., professor, Ph.D. supervisor. His main research interests include AI and 3D reconstruction.
  • Supported by:
    National Natural Science Foundation of China (61931012, 62301278, 62371254) and Natural Science Foundation of Jiangsu Province, China (BK20230362).

Abstract: Traditional head avatar reconstruction methods are mostly based on 3D Morphable Models (3DMM), which are convenient to animate but cannot represent non-rigid structures such as hair. More recent head avatar approaches based on neural radiance fields achieve impressive visual results but fall short in animation and training efficiency. To address these issues, this paper takes monocular videos as input and builds an animatable head avatar on a dynamically expanding point cloud. Because the point cloud can be rendered into images quickly by rasterization, training time is significantly reduced. For texture, color is decoupled into albedo and shading, and shading is further decomposed into a normal term and a combination of region features obtained through sparse encoding of the points, which yields more accurate textures. However, the inherent discreteness of point clouds can produce holes, so a normal smoothing strategy is adopted to improve texture continuity; it eliminates texture holes in regions such as the teeth and tongue. Extensive experiments on multiple subjects show that, compared with state-of-the-art head avatar methods including IMavatar, PointAvatar, NerFace, and StyleAvatar, the point-cloud-based animatable avatars with region encoding and normal smoothing improve PSNR by 3.41% on average. Ablation experiments show that region encoding and normal smoothing improve PSNR by approximately 3.50% and 3.44%, respectively, over variants without them.
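The texture model described in the abstract can be illustrated with a small sketch. The abstract states only that color is split into albedo and shading, that shading is decomposed into a normal term and a combination of region features from sparse encoding of the points, and that normals are smoothed to close holes. Everything concrete below is an assumption, not the authors' design: normal smoothing is done as a k-nearest-neighbour average, sparse region encoding is a top-k softmax over hypothetical per-point region logits that mixes a hypothetical region codebook, and a single sigmoid layer (`shade_point`, hypothetical) stands in for the shading network.

```python
import math
import random


def normalize(v):
    """Scale a 3-vector to unit length (guarding against a zero vector)."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / max(n, 1e-8) for c in v]


def smooth_normals(points, normals, k=8):
    """Assumed smoothing: replace each normal by the renormalized mean of the
    normals of its k nearest points (the point itself included). Brute force."""
    k = min(k, len(points))
    smoothed = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        nbrs = order[:k]
        mean = [sum(normals[j][c] for j in nbrs) / k for c in range(3)]
        smoothed.append(normalize(mean))
    return smoothed


def sparse_region_weights(logits, top_k=2):
    """Assumed sparse encoding: keep the top_k region logits, softmax over
    them, and set every other region weight to zero."""
    top = sorted(range(len(logits)), key=lambda r: logits[r], reverse=True)[:top_k]
    m = max(logits[r] for r in top)
    exps = {r: math.exp(logits[r] - m) for r in top}
    total = sum(exps.values())
    return [exps[r] / total if r in exps else 0.0 for r in range(len(logits))]


def region_feature(weights, codebook):
    """Weighted sum of the per-region feature vectors in the codebook."""
    dim = len(codebook[0])
    return [sum(w * codebook[r][d] for r, w in enumerate(weights)) for d in range(dim)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shade_point(albedo, normal, feat, weight_rows, bias):
    """Hypothetical shading head: one linear layer + sigmoid per RGB channel.
    The returned color is albedo * shading, channel by channel."""
    x = list(normal) + list(feat)
    return [
        albedo[c] * sigmoid(sum(w * xi for w, xi in zip(weight_rows[c], x)) + bias[c])
        for c in range(3)
    ]


# Toy data: all sizes and random values are illustrative only.
random.seed(0)
N, R, D = 32, 6, 4  # points, regions, region-feature dimension
points = [[random.gauss(0, 1) for _ in range(3)] for _ in range(N)]
# Noisy outward-pointing normals, so smoothing has something to do.
normals = [normalize([c + random.gauss(0, 0.3) for c in normalize(p)]) for p in points]
albedo = [[random.random() for _ in range(3)] for _ in range(N)]
codebook = [[random.gauss(0, 1) for _ in range(D)] for _ in range(R)]  # one vector per region
logits = [[random.gauss(0, 1) for _ in range(R)] for _ in range(N)]  # per-point region scores
w_rows = [[random.gauss(0, 0.5) for _ in range(3 + D)] for _ in range(3)]
bias = [0.0, 0.0, 0.0]

smoothed = smooth_normals(points, normals, k=8)
weights = [sparse_region_weights(row, top_k=2) for row in logits]
colors = [
    shade_point(albedo[i], smoothed[i], region_feature(weights[i], codebook), w_rows, bias)
    for i in range(N)
]
print(len(colors), colors[0])
```

In a trained model the codebook, region logits and shading weights would be learned by rasterizing the points and comparing against video frames; here they are random, so the output only demonstrates how the pieces fit together.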

Key words: Head avatar, 3D reconstruction, Region encoding, Point cloud, Rasterization, Deep learning
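The abstract reports its gains as percentage improvements in PSNR. Assuming those percentages are relative changes in the PSNR value (the abstract does not state this explicitly), the underlying arithmetic can be sketched as follows; the 25.0 dB baseline is a made-up number for illustration, not a result from the paper, and the helper names are hypothetical.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences of
    pixel values in [0, max_val]: 10 * log10(max_val^2 / MSE)."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def relative_gain_percent(new, baseline):
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Every pixel off by 0.1 gives MSE = 0.01, i.e. 20 dB.
print(round(psnr([0.6] * 4, [0.5] * 4), 6))  # 20.0
# A hypothetical 25.0 dB baseline raised by 3.41% would reach 25.8525 dB.
print(round(relative_gain_percent(25.8525, 25.0), 6))  # 3.41
```

Because PSNR is already logarithmic, a 3.41% relative gain at around 25 dB corresponds to roughly 0.85 dB in absolute terms.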

CLC Number: TP183
[1]CAO C,WENG Y,ZHOU S,et al.FaceWarehouse:A 3D facial expression database for visual computing[J].IEEE Transactions on Visualization and Computer Graphics,2013,20(3):413-425.
[2]EGGER B,SMITH W A P,TEWARI A,et al.3D morphable face models:past,present,and future[J].ACM Transactions on Graphics(ToG),2020,39(5):1-38.
[3]PAYSAN P,KNOTHE R,AMBERG B,et al.A 3D face model for pose and illumination invariant face recognition[C]//2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.Genova:IEEE,2009:296-301.
[4]LI T,BOLKART T,BLACK M J,et al.Learning a model of facial shape and expression from 4D scans[J].ACM Transactions on Graphics(ToG),2017,36(6):194:1-194:17.
[5]MILDENHALL B,SRINIVASAN P P,TANCIK M,et al.NeRF:Representing scenes as neural radiance fields for view synthesis[J].Communications of the ACM,2021,65(1):99-106.
[6]GAFNI G,THIES J,ZOLLHÖFER M,et al.Dynamic neural radiance fields for monocular 4D facial avatar reconstruction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.IEEE,2021:8649-8658.
[7]HONG Y,PENG B,XIAO H,et al.HeadNeRF:A real-time NeRF-based parametric head model[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:20374-20384.
[8]ATHAR S R,XU Z,SUNKAVALLI K,et al.RigNeRF:Fully controllable neural 3D portraits[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:20364-20373.
[9]PARK J J,FLORENCE P,STRAUB J,et al.DeepSDF:Learning continuous signed distance functions for shape representation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE,2019:165-174.
[10]MESCHEDER L,OECHSLE M,NIEMEYER M,et al.Occupancy networks:Learning 3D reconstruction in function space[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE,2019:4460-4470.
[11]ZHU L,WANG S M,LIU Q S.Self-supervised 3D Face Reconstruction Based on Detailed Face Mask[J].Computer Science,2023,50(2):214-220.
[12]LIANG W L,LI Y,WANG P F.Lightweight Face Generation Method Based on TransEditor and Its Application Specification[J].Computer Science,2023,50(2):221-230.
[13]FENG Y,FENG H,BLACK M J,et al.Learning an animatable detailed 3D face model from in-the-wild images[J].ACM Transactions on Graphics(ToG),2021,40(4):1-13.
[14]ZHENG Y,ABREVAYA V F,BÜHLER M C,et al.I M avatar:Implicit morphable head avatars from videos[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:13545-13555.
[15]XU Y,WANG L,ZHAO X,et al.AvatarMAV:Fast 3D head avatar reconstruction using motion-aware neural voxels[C]//ACM SIGGRAPH 2023 Conference Proceedings.Los Angeles:ACM,2023:1-10.
[16]FRIDOVICH K S,YU A,TANCIK M,et al.Plenoxels:Radiance fields without neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:5501-5510.
[17]FRIDOVICH K S,MEANTI G,WARBURG F R,et al.K-planes:Explicit radiance fields in space,time,and appearance[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Vancouver:IEEE,2023:12479-12488.
[18]YI B,ZENG W,BUCHANAN S,et al.Canonical factors for hybrid neural fields[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.Paris:IEEE,2023:3414-3426.
[19]CAO A,JOHNSON J.HexPlane:A fast representation for dynamic scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Vancouver:IEEE,2023:130-141.
[20]MÜLLER T,EVANS A,SCHIED C,et al.Instant neural graphics primitives with a multiresolution hash encoding[J].ACM Transactions on Graphics(ToG),2022,41(4):1-15.
[21]ZIELONKA W,BOLKART T,THIES J.Instant volumetric head avatars[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Vancouver:IEEE,2023:4574-4584.
[22]LI J,ZHANG J,BAI X,et al.Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.Paris:IEEE,2023:7568-7578.
[23]ZHENG Y,YIFAN W,WETZSTEIN G,et al.PointAvatar:Deformable point-based head avatars from videos[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Vancouver:IEEE,2023:21057-21067.
[24]XU Q,XU Z,PHILIP J,et al.Point-NeRF:Point-based neural radiance fields[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:5438-5448.
[25]CHEN R,HAN S,XU J,et al.Point-based multi-view stereo network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.Seoul:IEEE,2019:1538-1547.
[26]REN F,CHANG Q L,LIU X L,et al.Overview of 3D Reconstruction of Indoor Structures Based on Point Clouds[J/OL].https://www.jsjkx.com/CN/article/openArticlePDF.jsp?id=21167.
[27]RAVI N,REIZENSTEIN J,NOVOTNY D,et al.Accelerating 3D deep learning with PyTorch3D[J].arXiv:2007.08501,2020.
[28]CHEN J,WU X J.3D Human Body Shape and Motion Tracking by LBS and Snake[J].Journal of Computer-Aided Design & Computer Graphics,2012,24(3):357-363,371.
[29]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[30]GROPP A,YARIV L,HAIM N,et al.Implicit geometric regularization for learning shapes[J].arXiv:2002.10099,2020.
[31]KE Z,SUN J,LI K,et al.MODNet:Real-time trimap-free portrait matting via objective decomposition[C]//Proceedings of the AAAI Conference on Artificial Intelligence.Vancouver:AAAI,2022,36(1):1140-1147.
[32]SALIMANS T,KINGMA D P.Weight normalization:A simple reparameterization to accelerate training of deep neural networks[C]//30th Conference on Neural Information Processing Systems(NIPS 2016).Barcelona,Spain,2016.
[33]KINGMA D P,BA J.Adam:A method for stochastic optimization[J].arXiv:1412.6980,2014.
[34]WANG L,ZHAO X,SUN J,et al.StyleAvatar:Real-time Photo-realistic Portrait Avatar from a Single Video[J].arXiv:2305.00942,2023.
[35]KARRAS T,LAINE S,AILA T.A style-based generator architecture for generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE,2019:4401-4410.
[36]WANG Z,BOVIK A C,SHEIKH H R,et al.Image quality assessment:from error visibility to structural similarity[J].IEEE Transactions on Image Processing,2004,13(4):600-612.
[37]ZHANG R,ISOLA P,EFROS A A,et al.The unreasonable effectiveness of deep features as a perceptual metric[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Salt Lake City:IEEE,2018:586-595.
Related articles:
[1] ZHONG Yue, GU Jieming. 3D Reconstruction of Single-view Sketches Based on Attention Mechanism and Contrastive Loss [J]. Computer Science, 2025, 52(3): 77-85.
[2] LI Zongmin, RONG Guangcai, BAI Yun, XU Chang, XIAN Shiyang. 3D Object Detection with Dynamic Weight Graph Convolution [J]. Computer Science, 2025, 52(3): 104-111.
[3] WANG Yuan, HUO Peng, HAN Yi, CHEN Tun, WANG Xiang, WEN Hui. Survey on Deep Learning-based Meteorological Forecasting Models [J]. Computer Science, 2025, 52(3): 112-126.
[4] SHEN Yaxin, GAO Lijian, MAO Qirong. Semi-supervised Sound Event Detection Based on Meta Learning [J]. Computer Science, 2025, 52(3): 222-230.
[5] HAN Lin, WANG Yifan, LI Jianan, GAO Wei. Automatic Scheduling Search Optimization Method Based on TVM [J]. Computer Science, 2025, 52(3): 268-276.
[6] SONG Xingnuo, WANG Congyan, CHEN Mingkai. Survey on 3D Scene Reconstruction Techniques in Metaverse [J]. Computer Science, 2025, 52(3): 17-32.
[7] WANG Tao, BAI Xuefei, WANG Wenjian. Selective Feature Fusion for 3D CT Image Segmentation of Renal Cancer Based on Edge Enhancement [J]. Computer Science, 2025, 52(3): 41-49.
[8] WANG Xingbo, ZHANG Hao, GAO Hao, ZHAI Mingliang, XIE Jiucheng. Talking Portrait Synthesis Method Based on Regional Saliency and Spatial Feature Extraction [J]. Computer Science, 2025, 52(3): 58-67.
[9] SUN Rui, WANG Fei, FENG Huidong, ZHANG Xudong, GAO Jun. Research Progress in Facial Presentation Attack Detection Methods Based on Deep Learning [J]. Computer Science, 2025, 52(2): 323-335.
[10] DING Ruiyang, SUN Lei, DAI Leyu, ZANG Weifei, XU Bayi. Generation Method for Adversarial Networks Traffic Based on Universal Perturbations [J]. Computer Science, 2025, 52(2): 336-343.
[11] CHEN Zigang, PAN Ding, LENG Tao, ZHU Haihua, CHEN Long, ZHOU Yousheng. Explanation Robustness Adversarial Training Method Based on Local Gradient Smoothing [J]. Computer Science, 2025, 52(2): 374-379.
[12] ZHANG Yusong, XU Shuai, YAN Xingyu, GUAN Donghai, XU Jianqiu. Survey on Cross-city Human Mobility Prediction [J]. Computer Science, 2025, 52(1): 102-119.
[13] LIU Yuming, DAI Yu, CHEN Gongping. Review of Federated Learning in Medical Image Processing [J]. Computer Science, 2025, 52(1): 183-193.
[14] LI Yujie, MA Zihang, WANG Yifu, WANG Xinghe, TAN Benying. Survey of Vision Transformers(ViT) [J]. Computer Science, 2025, 52(1): 194-209.
[15] ZHU Xiaoyan, WANG Wenge, WANG Jiayin, ZHANG Xuanping. Just-In-Time Software Defect Prediction Approach Based on Fine-grained Code Representation and Feature Fusion [J]. Computer Science, 2025, 52(1): 242-249.