Computer Science ›› 2025, Vol. 52 ›› Issue (3): 50-57. doi: 10.11896/jsjkx.240200060
WANG Jie, WANG Chuangye, XIE Jiucheng, GAO Hao
Abstract: Traditional head avatars are mostly built on 3D Morphable Models (3DMM). Although a 3DMM is easy to drive, it cannot represent non-rigid structures such as hair. Recent head-avatar methods based on neural radiance fields achieve excellent visual quality but fall short in drivability and training efficiency. To address these problems, this work takes monocular video as the raw data and constructs a drivable virtual head avatar from a point cloud whose point count grows dynamically. The point cloud can be rendered to images quickly via rasterization, which greatly reduces training time. For texture representation, color is decoupled into albedo and shading, and shading is further decomposed into a combination of normals and regional features obtained by sparsely encoding the points; this decomposition ultimately yields more accurate textures. However, the inherently discrete nature of point clouds causes holes to appear during rendering, so a normal-smoothing strategy is applied to improve texture continuity, effectively eliminating texture holes in regions such as the teeth and tongue. Extensive experiments on multiple subjects show that, compared with state-of-the-art head-avatar construction methods such as IMavatar, PointAvatar, NerFace, and StyleAvatar, the drivable head avatar built from point clouds with region encoding and normal smoothing achieves an average PSNR improvement of about 3.41%. Ablation studies show that, relative to variants without region encoding or normal smoothing, the proposed method improves PSNR by about 3.50% and 3.44%, respectively.
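The abstract's texture model (color decoupled into albedo × shading, with shading combining normals and sparsely encoded region features) and its normal-smoothing strategy can be made concrete with a small sketch. The paper gives no code; everything below — the brute-force k-NN normal averaging, the Lambertian normal term, and the linear region head `w` — is a hypothetical stand-in for the learned components, shown only to mirror the algebraic structure, not the authors' implementation.

```python
import numpy as np

def smooth_normals(points, normals, k=8):
    """Normal smoothing (sketch): replace each point's normal with the
    renormalized mean of its k nearest neighbors' normals, which improves
    texture continuity over a discrete point set."""
    # Brute-force pairwise squared distances; fine for a sketch.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]   # k nearest neighbors (incl. self)
    avg = normals[idx].mean(axis=1)       # (N, k, 3) -> (N, 3)
    return avg / np.linalg.norm(avg, axis=1, keepdims=True)

def shade(albedo, normals, region_feat, light_dir, w):
    """Texture decomposition (sketch): color = albedo * shading, where
    shading combines a normal-dependent term (Lambertian here, as a
    placeholder) with a learned per-region offset from sparse encodings."""
    lambert = np.clip(normals @ light_dir, 0.0, None)[:, None]  # (N, 1)
    shading = lambert + region_feat @ w   # normals + region features
    return albedo * shading               # per-point RGB, shape (N, 3)
```

In the actual method these terms would be produced by trained networks and composited through a point rasterizer (e.g. PyTorch3D, reference [27]); the sketch only illustrates how the decomposition factors the color.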
CLC Number:
[1] CAO C, WENG Y, ZHOU S, et al. FaceWarehouse: A 3D facial expression database for visual computing[J]. IEEE Transactions on Visualization and Computer Graphics, 2013, 20(3): 413-425.
[2] EGGER B, SMITH W A P, TEWARI A, et al. 3D morphable face models—past, present, and future[J]. ACM Transactions on Graphics (ToG), 2020, 39(5): 1-38.
[3] PAYSAN P, KNOTHE R, AMBERG B, et al. A 3D face model for pose and illumination invariant face recognition[C]//2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance. Genova: IEEE, 2009: 296-301.
[4] LI T, BOLKART T, BLACK M J, et al. Learning a model of facial shape and expression from 4D scans[J]. ACM Transactions on Graphics (ToG), 2017, 36(6): 194:1-194:17.
[5] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: Representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106.
[6] GAFNI G, THIES J, ZOLLHOFER M, et al. Dynamic neural radiance fields for monocular 4D facial avatar reconstruction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2021: 8649-8658.
[7] HONG Y, PENG B, XIAO H, et al. HeadNeRF: A real-time NeRF-based parametric head model[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 20374-20384.
[8] ATHAR S R, XU Z, SUNKAVALLI K, et al. RigNeRF: Fully controllable neural 3D portraits[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 20364-20373.
[9] PARK J J, FLORENCE P, STRAUB J, et al. DeepSDF: Learning continuous signed distance functions for shape representation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 165-174.
[10] MESCHEDER L, OECHSLE M, NIEMEYER M, et al. Occupancy networks: Learning 3D reconstruction in function space[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4460-4470.
[11] ZHU L, WANG S M, LIU Q S. Self-supervised 3D Face Reconstruction Based on Detailed Face Mask[J]. Computer Science, 2023, 50(2): 214-220.
[12] LIANG W L, LI Y, WANG P F. Lightweight Face Generation Method Based on TransEditor and Its Application Specification[J]. Computer Science, 2023, 50(2): 221-230.
[13] FENG Y, FENG H, BLACK M J, et al. Learning an animatable detailed 3D face model from in-the-wild images[J]. ACM Transactions on Graphics (ToG), 2021, 40(4): 1-13.
[14] ZHENG Y, ABREVAYA V F, BÜHLER M C, et al. IM Avatar: Implicit morphable head avatars from videos[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 13545-13555.
[15] XU Y, WANG L, ZHAO X, et al. AvatarMAV: Fast 3D head avatar reconstruction using motion-aware neural voxels[C]//ACM SIGGRAPH 2023 Conference Proceedings. Los Angeles: ACM, 2023: 1-10.
[16] FRIDOVICH K S, YU A, TANCIK M, et al. Plenoxels: Radiance fields without neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5501-5510.
[17] FRIDOVICH K S, MEANTI G, WARBURG F R, et al. K-Planes: Explicit radiance fields in space, time, and appearance[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 12479-12488.
[18] YI B, ZENG W, BUCHANAN S, et al. Canonical factors for hybrid neural fields[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Vancouver: IEEE, 2023: 3414-3426.
[19] CAO A, JOHNSON J. HexPlane: A fast representation for dynamic scenes[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 130-141.
[20] MÜLLER T, EVANS A, SCHIED C, et al. Instant neural graphics primitives with a multiresolution hash encoding[J]. ACM Transactions on Graphics (ToG), 2022, 41(4): 1-15.
[21] ZIELONKA W, BOLKART T, THIES J. Instant volumetric head avatars[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 4574-4584.
[22] LI J, ZHANG J, BAI X, et al. Efficient region-aware neural radiance fields for high-fidelity talking portrait synthesis[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Vancouver: IEEE, 2023: 7568-7578.
[23] ZHENG Y, YIFAN W, WETZSTEIN G, et al. PointAvatar: Deformable point-based head avatars from videos[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 21057-21067.
[24] XU Q, XU Z, PHILIP J, et al. Point-NeRF: Point-based neural radiance fields[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5438-5448.
[25] CHEN R, HAN S, XU J, et al. Point-based multi-view stereo network[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Long Beach: IEEE, 2019: 1538-1547.
[26] REN F, CHANG Q L, LIU X L, et al. Overview of 3D Reconstruction of Indoor Structures Based on Point Clouds[J/OL]. https://www.jsjkx.com/CN/article/openArticlePDF.jsp?id=21167.
[27] RAVI N, REIZENSTEIN J, NOVOTNY D, et al. Accelerating 3D deep learning with PyTorch3D[J]. arXiv:2007.08501, 2020.
[28] CHEN J, WU X J. 3D Human Body Shape and Motion Tracking by LBS and Snake[J]. Journal of Computer-Aided Design & Computer Graphics, 2012, 24(3): 357-363, 371.
[29] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv:1409.1556, 2014.
[30] GROPP A, YARIV L, HAIM N, et al. Implicit geometric regularization for learning shapes[J]. arXiv:2002.10099, 2020.
[31] KE Z, SUN J, LI K, et al. MODNet: Real-time trimap-free portrait matting via objective decomposition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2022, 36(1): 1140-1147.
[32] SALIMANS T, KINGMA D P. Weight normalization: A simple reparameterization to accelerate training of deep neural networks[C]//30th Conference on Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016.
[33] KINGMA D P, BA J. Adam: A method for stochastic optimization[J]. arXiv:1412.6980, 2014.
[34] WANG L, ZHAO X, SUN J, et al. StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video[J]. arXiv:2305.00942, 2023.
[35] KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4401-4410.
[36] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[37] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 586-595.