Computer Science ›› 2025, Vol. 52 ›› Issue (3): 50-57. doi: 10.11896/jsjkx.240200060

• 3D Vision and Metaverse •


Animatable Head Avatar Reconstruction Algorithm Based on Region Encoding

WANG Jie, WANG Chuangye, XIE Jiucheng, GAO Hao   

  1. School of Automation and Artificial Intelligence,Nanjing University of Posts and Telecommunications,Nanjing 210023,China
  • Received:2024-02-19 Revised:2024-07-11 Online:2025-03-15 Published:2025-03-07
  • Corresponding author: GAO Hao(tsgaohao@gmail.com)
  • About author:WANG Jie(chieh.wangs@gmail.com),born in 2000,master.His main research interests include 3D deep learning,head avatars and digital humans.
    GAO Hao,born in 1976,Ph.D,professor,Ph.D supervisor.His main research interests include AI and 3D reconstruction.
  • Supported by:
    National Natural Science Foundation of China(61931012,62301278,62371254) and Natural Science Foundation of Jiangsu Province,China(BK20230362).


Abstract: Traditional head avatar reconstruction methods are mostly based on 3D Morphable Models(3DMM),which,while convenient to animate,cannot represent non-rigid structures such as hair.Recently,head avatar approaches based on neural radiance fields have achieved impressive visual results but fall short in animatability and training efficiency.To address these issues,monocular videos are taken as raw data,and a point cloud whose size grows dynamically during training is used to construct an animatable virtual head avatar.The point cloud can be rapidly rendered into images by rasterization,which greatly reduces training time.In terms of texture representation,color is decoupled into albedo and shading,and shading is further decomposed into a combination of normals and region features obtained through sparse encoding of the points;this decomposition yields more precise textures.However,the inherent discreteness of point clouds can produce holes in the rendered images.Therefore,a normal smoothing strategy is employed to enhance texture continuity,effectively eliminating texture holes in regions such as the teeth and tongue.Extensive experiments on multiple subjects show that,compared with state-of-the-art head avatar construction algorithms such as IMavatar,PointAvatar,NerFace,and StyleAvatar,the animatable avatars constructed from point clouds with region encoding and the normal smoothing strategy achieve an average improvement of about 3.41% in PSNR.Ablation experiments show that the proposed method improves PSNR by approximately 3.50% and 3.44% over variants without region encoding and without the normal smoothing strategy,respectively.
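The texture decomposition and normal smoothing described in the abstract can be illustrated with a minimal PyTorch-style sketch. This is not the authors' released code: the names smooth_normals, ShadingMLP, num_regions, and the k-nearest-neighbour averaging are hypothetical stand-ins for the paper's components, and the resulting per-point colors would be splatted into an image by a point rasterizer such as the one in PyTorch3D[27].

import torch
import torch.nn as nn
import torch.nn.functional as F

def smooth_normals(points, normals, k=8):
    # Hypothetical stand-in for the paper's normal smoothing strategy:
    # average each point's normal over its k nearest neighbours to improve
    # texture continuity and suppress holes in the rendered image.
    dists = torch.cdist(points, points)            # (N, N) pairwise distances
    idx = dists.topk(k, largest=False).indices     # (N, k) neighbour indices
    return F.normalize(normals[idx].mean(dim=1), dim=-1)

class ShadingMLP(nn.Module):
    # Predicts per-point shading from the smoothed normal plus a region
    # feature obtained by sparsely encoding each point's region assignment.
    def __init__(self, num_regions=10, feat_dim=16):
        super().__init__()
        self.region_feats = nn.Embedding(num_regions, feat_dim)
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, normals, region_logits):
        weights = F.softmax(region_logits, dim=-1)        # soft region assignment
        feats = weights @ self.region_feats.weight        # (N, feat_dim) region features
        return self.mlp(torch.cat([normals, feats], dim=-1))  # (N, 1) shading

# Usage: color is decoupled as albedo * shading for N points; the colored
# points would then be rasterized into an image.
N, R = 1024, 10
points = torch.randn(N, 3)
normals = F.normalize(torch.randn(N, 3), dim=-1)
albedo = torch.sigmoid(torch.randn(N, 3))      # learnable per-point albedo in practice
region_logits = torch.randn(N, R)              # learnable region assignment in practice
shading = ShadingMLP(R)(smooth_normals(points, normals), region_logits)
colors = albedo * shading                      # (N, 3) final point colors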

Key words: Head avatar, 3D reconstruction, Region encoding, Point cloud, Rasterization, Deep learning

CLC Number: TP183
[1]CAO C,WENG Y,ZHOU S,et al.FaceWarehouse:A 3D facial expression database for visual computing[J].IEEE Transactions on Visualization and Computer Graphics,2013,20(3):413-425.
[2]EGGER B,SMITH W A P,TEWARI A,et al.3D morphable face models—past,present,and future[J].ACM Transactions on Graphics(ToG),2020,39(5):1-38.
[3]PAYSAN P,KNOTHE R,AMBERG B,et al.A 3D face model for pose and illumination invariant face recognition[C]//2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.Genova:IEEE,2009:296-301.
[4]LI T,BOLKART T,BLACK M J,et al.Learning a model of facial shape and expression from 4D scans[J].ACM Transactions on Graphics(ToG),2017,36(6):194:1-194:17.
[5]MILDENHALL B,SRINIVASAN P P,TANCIK M,et al.NeRF:Representing scenes as neural radiance fields for view synthesis[J].Communications of the ACM,2021,65(1):99-106.
[6]GAFNI G,THIES J,ZOLLHÖFER M,et al.Dynamic neural radiance fields for monocular 4D facial avatar reconstruction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.IEEE,2021:8649-8658.
[7]HONG Y,PENG B,XIAO H,et al.HeadNeRF:A real-time NeRF-based parametric head model[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:20374-20384.
[8]ATHAR S R,XU Z,SUNKAVALLI K,et al.RigNeRF:Fully controllable neural 3D portraits[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:20364-20373.
[9]PARK J J,FLORENCE P,STRAUB J,et al.DeepSDF:Learning continuous signed distance functions for shape representation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE,2019:165-174.
[10]MESCHEDER L,OECHSLE M,NIEMEYER M,et al.Occupancy networks:Learning 3D reconstruction in function space[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE,2019:4460-4470.
[11]ZHU L,WANG S M,LIU Q S.Self-supervised 3D Face Reconstruction Based on Detailed Face Mask[J].Computer Science,2023,50(2):214-220.
[12]LIANG W L,LI Y,WANG P F.Lightweight Face Generation Method Based on TransEditor and Its Application Specification[J].Computer Science,2023,50(2):221-230.
[13]FENG Y,FENG H,BLACK M J,et al.Learning an animatable detailed 3D face model from in-the-wild images[J].ACM Transactions on Graphics(ToG),2021,40(4):1-13.
[14]ZHENG Y,ABREVAYA V F,BÜHLER M C,et al.I M Avatar:Implicit morphable head avatars from videos[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:13545-13555.
[15]XU Y,WANG L,ZHAO X,et al.AvatarMAV:Fast 3D head avatar reconstruction using motion-aware neural voxels[C]//ACM SIGGRAPH 2023 Conference Proceedings.Los Angeles:ACM,2023:1-10.
[16]FRIDOVICH-KEIL S,YU A,TANCIK M,et al.Plenoxels:Radiance fields without neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:5501-5510.
[17]FRIDOVICH-KEIL S,MEANTI G,WARBURG F R,et al.K-planes:Explicit radiance fields in space,time,and appearance[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Vancouver:IEEE,2023:12479-12488.
[18]YI B,ZENG W,BUCHANAN S,et al.Canonical factors for hybrid neural fields[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.Paris:IEEE,2023:3414-3426.
[19]CAO A,JOHNSON J.HexPlane:A fast representation for dynamic scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Vancouver:IEEE,2023:130-141.
[20]MÜLLER T,EVANS A,SCHIED C,et al.Instant neural graphics primitives with a multiresolution hash encoding[J].ACM Transactions on Graphics(ToG),2022,41(4):1-15.
[21]ZIELONKA W,BOLKART T,THIES J.Instant volumetric head avatars[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Vancouver:IEEE,2023:4574-4584.
[22]LI J,ZHANG J,BAI X,et al.Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.Paris:IEEE,2023:7568-7578.
[23]ZHENG Y,YIFAN W,WETZSTEIN G,et al.PointAvatar:Deformable point-based head avatars from videos[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Vancouver:IEEE,2023:21057-21067.
[24]XU Q,XU Z,PHILIP J,et al.Point-NeRF:Point-based neural radiance fields[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.New Orleans:IEEE,2022:5438-5448.
[25]CHEN R,HAN S,XU J,et al.Point-based multi-view stereo network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.Seoul:IEEE,2019:1538-1547.
[26]REN F,CHANG Q L,LIU X L,et al.Overview of 3D Reconstruction of Indoor Structures Based on Point Clouds[J/OL].https://www.jsjkx.com/CN/article/openArticlePDF.jsp?id=21167.
[27]RAVI N,REIZENSTEIN J,NOVOTNY D,et al.Accelerating 3D deep learning with PyTorch3D[J].arXiv:2007.08501,2020.
[28]CHEN J,WU X J.3D Human Body Shape and Motion Tracking by LBS and Snake[J].Journal of Computer-Aided Design & Computer Graphics,2012,24(3):357-363,371.
[29]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[30]GROPP A,YARIV L,HAIM N,et al.Implicit geometric regularization for learning shapes[J].arXiv:2002.10099,2020.
[31]KE Z,SUN J,LI K,et al.MODNet:Real-time trimap-free portrait matting via objective decomposition[C]//Proceedings of the AAAI Conference on Artificial Intelligence.Vancouver:AAAI,2022,36(1):1140-1147.
[32]SALIMANS T,KINGMA D P.Weight normalization:A simple reparameterization to accelerate training of deep neural networks[C]//30th Conference on Neural Information Processing Systems(NIPS 2016).Barcelona,Spain,2016.
[33]KINGMA D P,BA J.Adam:A method for stochastic optimization[J].arXiv:1412.6980,2014.
[34]WANG L,ZHAO X,SUN J,et al.StyleAvatar:Real-time Photo-realistic Portrait Avatar from a Single Video[J].arXiv:2305.00942,2023.
[35]KARRAS T,LAINE S,AILA T.A style-based generator architecture for generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Long Beach:IEEE,2019:4401-4410.
[36]WANG Z,BOVIK A C,SHEIKH H R,et al.Image quality assessment:from error visibility to structural similarity[J].IEEE Transactions on Image Processing,2004,13(4):600-612.
[37]ZHANG R,ISOLA P,EFROS A A,et al.The unreasonable effectiveness of deep features as a perceptual metric[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Salt Lake City:IEEE,2018:586-595.