Computer Science, 2025, Vol. 52, Issue (3): 50-57. doi: 10.11896/jsjkx.240200060

• 3D Vision and Metaverse •

Animatable Head Avatar Reconstruction Algorithm Based on Region Encoding

WANG Jie, WANG Chuangye, XIE Jiucheng, GAO Hao   

  1. School of Automation and Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2024-02-19 Revised: 2024-07-11 Online: 2025-03-15 Published: 2025-03-07
  • About author: WANG Jie, born in 2000, master. His main research interests include 3D deep learning, head avatar and digital human.
    GAO Hao, born in 1976, Ph.D, professor, Ph.D supervisor. His main research interests include AI and 3D reconstruction.
  • Supported by:
    National Natural Science Foundation of China (61931012, 62301278, 62371254) and Natural Science Foundation of Jiangsu Province, China (BK20230362).

Abstract: Traditional head avatar reconstruction methods are mostly based on 3D Morphable Models (3DMM), which are convenient to animate but cannot represent non-rigid structures such as hair. Recent head avatar approaches based on neural radiance fields achieve impressive visual results but fall short in animation and training efficiency. To address these issues, an animatable virtual head avatar is constructed from monocular video using a dynamically expanding point cloud. The point cloud can be rendered into images rapidly by rasterization, which significantly reduces training time. For texture representation, color is decoupled into albedo and shading, and shading is further decomposed into the normal and a combination of region features obtained through sparse encoding of the points, yielding more precise textures. Because the inherent discreteness of point clouds can lead to holes, a normal smoothing strategy is employed to enhance texture continuity, which eliminates texture holes in regions such as the teeth and tongue. Extensive experiments on multiple subjects show that, compared with state-of-the-art head avatar construction algorithms such as IMavatar, PointAvatar, NerFace, and StyleAvatar, the point-cloud-based animatable avatars built with region encoding and the normal smoothing strategy improve PSNR by 3.41% on average. Ablation experiments show that PSNR improves by approximately 3.50% and 3.44% over variants without region encoding and without the normal smoothing strategy, respectively.
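
For readers unfamiliar with the metric, the sketch below shows how PSNR is commonly computed and how a percentage gain over a baseline could be derived from two PSNR values. It is an illustration only, not the authors' evaluation code: the function names `psnr` and `relative_gain_percent` are hypothetical, images are assumed to be flat sequences of intensities in [0, 1], and reading the reported percentages as relative gains in dB is an assumption, since the abstract does not state how they were calculated.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities in [0, max_value]
    (an assumed layout chosen to keep the sketch dependency-free).
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_gain_percent(baseline_db, improved_db):
    """Percentage change of improved_db relative to baseline_db.

    Assumption: the gain is expressed relative to the baseline PSNR value.
    """
    return 100.0 * (improved_db - baseline_db) / baseline_db


if __name__ == "__main__":
    ground_truth = [0.0, 0.5, 1.0, 0.25]
    render_a = [0.1, 0.4, 0.9, 0.35]      # hypothetical baseline render
    render_b = [0.05, 0.45, 0.95, 0.30]   # hypothetical improved render
    a_db = psnr(ground_truth, render_a)
    b_db = psnr(ground_truth, render_b)
    print(f"baseline: {a_db:.2f} dB, improved: {b_db:.2f} dB")
    print(f"relative gain: {relative_gain_percent(a_db, b_db):.2f}%")
```

Because PSNR is logarithmic, a gain of a few percent in dB can correspond to a considerably larger reduction in mean squared error, which is worth keeping in mind when comparing methods.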

Key words: Head avatar, 3D reconstruction, Region encoding, Point cloud, Rasterization, Deep learning

CLC Number: 

  • TP183