Computer Science ›› 2026, Vol. 53 ›› Issue (4): 326-336. doi: 10.11896/jsjkx.251200015
ZHAN Qiwei1, REN Haojia2, XIAO Tiantian3
Abstract: In recent years, diffusion-model-based speech-driven facial animation generation has made breakthrough progress: such methods can efficiently generate long-duration, high-resolution talking videos with audio-lip synchronization. However, videos produced by current methods commonly exhibit pronounced blur and artifacts in the mouth region, which severely limits the realism and visual credibility of the synthesized results. To address this problem, this paper proposes LiveEchoMimic, a facial animation generation algorithm built on an improved EchoMimic, and further examines standardized norms for its application. First, at the technical level, an end-to-end framework for generating natural talking videos is constructed on a dual-core architecture consisting of the EchoMimic diffusion model and an implicit keypoint model. The EchoMimic diffusion model generates coarse-grained talking videos under the joint control of audio features and facial keypoints, while the implicit keypoint model adopts a video-driven paradigm and produces refined, high-quality facial animation by controlling displacement features in the implicit keypoint space. Second, an audio-lip mapping model is constructed to precisely capture the intrinsic correlation between audio features and mouth motion, and a dedicated mapping network is designed to strengthen the audio-lip synchronization of the generated videos. Finally, large-scale experiments on the public datasets CelebV-HQ and MEAD and the private dataset Avatar show, both quantitatively and qualitatively, that LiveEchoMimic significantly outperforms current mainstream methods on core metrics such as visual quality and audio-lip synchronization, achieving the best video generation performance. At the application-norm level, given that highly realistic speech-driven facial animation may enable identity and behavior falsification, actionable recommendations are put forward regarding the challenges faced, application principles, and implementation measures, so that speech-driven facial animation technology can better serve societal needs under controllable and safe conditions.
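The coarse-to-fine pipeline described in the abstract can be illustrated with a minimal sketch. All names, shapes, and the two stage functions below are illustrative assumptions, not the authors' actual implementation: the diffusion model and the implicit-keypoint refiner are replaced by stand-in functions so that only the data flow (audio features → lip-keypoint displacements → coarse frames → refined frames) is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def audio_to_lip_keypoints(audio_features: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Audio-lip mapping network, reduced to a single linear projection:
    maps per-frame audio features to 2D displacements of N mouth keypoints."""
    T, _ = audio_features.shape
    n_kp = W.shape[1] // 2
    return (audio_features @ W).reshape(T, n_kp, 2)

def coarse_stage(audio_features: np.ndarray, face_keypoints: np.ndarray) -> np.ndarray:
    """Stand-in for the EchoMimic diffusion model: jointly conditioned on
    audio features and facial keypoints, emits a coarse frame sequence."""
    T = audio_features.shape[0]
    return rng.random((T, 64, 64, 3))  # placeholder low-resolution frames

def refine_stage(coarse_frames: np.ndarray, lip_disp: np.ndarray) -> np.ndarray:
    """Stand-in for the implicit-keypoint model: refines coarse frames using
    keypoint displacements (here a trivial per-frame adjustment)."""
    per_frame = lip_disp.mean(axis=(1, 2))[:, None, None, None]
    return coarse_frames + 0.01 * per_frame

# Toy inputs: 8 frames of 32-dim audio features, 10 mouth keypoints.
T, d, n_kp = 8, 32, 10
audio = rng.standard_normal((T, d))
W = 0.1 * rng.standard_normal((d, n_kp * 2))   # mapping-network weights
face_kp = rng.random((T, n_kp, 2))

lip_disp = audio_to_lip_keypoints(audio, W)    # (T, n_kp, 2) displacements
video = refine_stage(coarse_stage(audio, face_kp), lip_disp)
print(video.shape)  # (8, 64, 64, 3)
```

In the actual method both stages are learned models; the sketch only makes explicit that the mapping network's output conditions the refinement stage, which is where the paper claims the improvement in lip-region sharpness and synchronization comes from.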