Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240100094-8. DOI: 10.11896/jsjkx.240100094

• Image Processing & Multimedia Technology •

Text-driven Generation of Emotionally Diverse Facial Animations

LIU Zengke, YIN Jibin   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  • Online: 2024-11-16  Published: 2024-11-13
  • About author: LIU Zengke, born in 1998, postgraduate. His main research interests include deep learning and image recognition.
    YIN Jibin, born in 1976, Ph.D., associate professor. His main research interests include human-computer interaction and artificial intelligence.

Abstract: This paper presents a text-driven facial animation synthesis technique that integrates an emotion model to enhance the expressiveness of facial expressions. The method consists of two core components: facial emotion simulation and consistency between lip movements and speech. First, the input text is analyzed to identify the types of emotions it contains and their intensities. These emotional cues are then used to generate the corresponding facial expressions with the Dirichlet free-form deformation (DFFD) algorithm. In parallel, phonemes and lip-movement data are collected from human speech and aligned in time with the phonemes of the text by forced alignment, yielding a sequence of lip key-point changes. Intermediate frames are then generated by linear interpolation to refine the lip-movement timeline, and the DFFD algorithm synthesizes the lip animation from this time series. Finally, by carefully balancing the weights between facial emotions and lip animation, the approach achieves highly realistic virtual facial expressions.
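
To make the timing and blending steps above concrete, the Python sketch below shows how aligned phoneme keyframes might be linearly interpolated into per-frame lip key points, and how emotion-driven and lip-driven displacements could be weighted before the DFFD deformation is applied. The function names, data layout, frame rate, and weights are illustrative assumptions, not the paper's actual implementation.

    import numpy as np

    def interpolate_lip_keypoints(keyframes, fps=30):
        """Linearly interpolate lip key points between phoneme keyframes.

        keyframes: list of (time_in_seconds, (K, 3) ndarray) sorted by time,
                   e.g. obtained from forced alignment of the text's phonemes.
        Returns an ndarray of shape (num_frames, K, 3) sampled at `fps`.
        """
        times = np.array([t for t, _ in keyframes], dtype=float)
        points = np.stack([p for _, p in keyframes])           # (N, K, 3)
        frame_times = np.arange(times[0], times[-1], 1.0 / fps)
        frames = []
        for t in frame_times:
            i = np.searchsorted(times, t, side="right") - 1    # left keyframe index
            i = min(i, len(times) - 2)
            # interpolation weight between keyframe i and keyframe i + 1
            a = (t - times[i]) / max(times[i + 1] - times[i], 1e-8)
            frames.append((1.0 - a) * points[i] + a * points[i + 1])
        return np.stack(frames)

    def blend_control_points(neutral, emotion_disp, lip_disp,
                             w_emotion=0.4, w_lip=0.6):
        """Weighted combination of emotion and lip displacements applied to the
        control points that drive the DFFD deformation; the weights here are
        illustrative placeholders, not values reported in the paper."""
        return neutral + w_emotion * emotion_disp + w_lip * lip_disp
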

Key words: Text-driven animation, Emotion model, DFFD, Facial animation synthesis, Emotion intensity, Lip-sync consistency

CLC Number: TP315.69

[1]YANG D,LI R,YANG Q,et al.3D head-talk:speech synthesis 3D head movement face animation[J].Soft Computing,2024,28(1):363-379.
[2]ZHANG H,YIN J,ZHANG X.The study of a five-dimensional emotional model for facial emotion recognition[J].Mobile Information Systems,2020,2020(1):8860608.
[3]ILIC S,FUA P.Using Dirichlet free form deformation to fit deformable models to noisy 3-D data[C]//European Conference on Computer Vision.Springer,2002:704-717.
[4]MUZAHIDIN S,RAKUN E.Text-driven talking head using dynamic viseme and DFFD for SIBI[C]//2020 7th International Conference on Information Technology,Computer,and Electrical Engineering(ICITACEE 2020).IEEE,2020:173-178.
[5]IGARASHI T,MOSCOVICH T,HUGHES J F.Spatial keyframing for performance-driven animation[J].ACM SIGGRAPH 2006 Courses,2006:17-es.
[6] MAI H N,KIM J,CHOI Y H,et al.Accuracy of portable face-scanning devices for obtaining three-dimensional face models:a systematic review and meta-analysis[J].International Journal of Environmental Research and Public Health,2021,18(1):94.
[7]DENG Z,CHIANG P Y,FOX P,et al.Animating blendshape faces by cross-mapping motion capture data[C]//Proceedings of the 2006 symposium on Interactive 3D Graphics and Games.2006:43-48.
[8]JAVAID M,HALEEM A,SINGH R P,et al.Industrial perspectives of 3d scanning:features,roles and it's analytical applications[J].Sensors International,2021(2):100114.
[9]PELEG S,BEN-EZRA M.Stereo panorama with a single camera[C]//1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition(Cat.No PR00149),vol.1.IEEE,1999:395-401.
[10] ZHANG L,SNAVELY N,CURLESS B,et al.Spacetime faces:high resolution capture for modeling and animation[J].ACM Transactions on Graphics,2004,23(3):548-558.
[11] FURUKAWA Y,PONCE J.Dense 3d motion capture for human faces[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2009:1674-1681.
[12]DOUKAS M C,SHARMANSKA V,ZAFEIRIOU S.Video-to-video translation for visual speech synthesis[J].arXiv:1905.12043,2019.
[13] TAYLOR S,KIM T,YUE Y,et al.A deep learning approach for generalized speech animation[J].ACM Transactions on Graphics(TOG),2017,36(4):1-11.
[14]LING Z H,RICHMOND K,YAMAGISHI J.An analysis of HMM-based prediction of articulatory movements[J].Speech Communication,2010,52(10):834-846.
[15]YU L,YU J,LING Q.BLTRCNN-based 3-D articulatory movement prediction:learning articulatory synchronicity from both text and audio inputs[J].IEEE Transactions on Multimedia,2018,21(7):1621-1632.
[16]ZHU P,XIE L,CHEN Y.Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings[C]//Sixteenth Annual Conference of the International Speech Communication Association.2015.
[17]KING S A,PARENT R E.Creating speech-synchronized animation[J].IEEE Transactions on Visualization and Computer Graphics,2005,11(3):341-352.
[18]ZHOU Y,XU Z,LANDRETH C,et al.VisemeNet:audio-driven animator-centric speech animation[J].ACM Transactions on Graphics(TOG),2018,37(4):1-10.
[19]LIU K,OSTERMANN J.Realistic facial expression synthesis for an image-based talking head[C]//IEEE International Conference on Multimedia and Expo.IEEE,2011:1-6.
[20]MEHRABIAN A.Framework for a comprehensive description and measurement of emotional states[J].Genetic,Social,and General Psychology Monographs,1995,121(3):339-361.
[21]MEHRABIAN A.Pleasure-arousal-dominance:A general framework for describing and measuring individual differences in temperament[J].Current Psychology,1996,14:261-292.
[22]MEHRABIAN A,WIHARDJA C,LJUNGGREN E.Emotional correlates of preferences for situation-activity combinations in everyday life[J].Genetic,Social,and General Psychology Monographs,1997,123(4):461-478.
[23]FISCHER A H,VAN KLEEF G A.Where have all the people gone?A plea for including social interaction in emotion research[J].Emotion Review,2010,2(3):208-211.
[24]IZARD C E,ACKERMAN B P,SCHULTZ D.Independent emotions and consciousness:self-consciousness and dependent emotions[M]//At Play in the Fields of Consciousness:Essays in Honor of Jerome L.Singer.1999:83-102.
[25]EKMAN P,FRIESEN W V.Facial Action Coding System(FACS):a Technique for the Measurement of Facial Actions[J].Rivista Di Psichiatria,1978,47(2):126-138.
[26]KURENKOV A,JI J,GARG A,et al.DeformNet:free-form deformation network for 3D shape reconstruction from a single image[C]//2018 IEEE Winter Conference on Applications of Computer Vision(WACV).IEEE,2018:858-866.
[27]SEDERBERG T W,PARRY S R.Free-form deformation of solid geometric models[C]//Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques.1986:151-160.
[28]ZADEH M,IMANI M,MAJIDI B.Fast facial emotion recognition using convolutional neural networks and Gabor filters[C]//2019 5th Conference on Knowledge Based Engineering and Innovation(KBEI).IEEE,2019:577-581.
[29]JIANG P,WAN B,WANG Q,et al.Fast and efficient facial expression recognition using a Gabor convolutional network[J].IEEE Signal Processing Letters,2020,27:1954-1958.
[30]AKHAND M H A,SHUVENDU R,NAZMUL S,et al.Facial emotion recognition using transfer learning in the deep CNN[J].Electronics,2021,10(9):1036.
[31]SWAMINATHAN A,ADIVEL A V,AROCK M.FERCE:facial expression recognition for combined emotions using FERCE algorithm[J].IETE Journal of Research,2022,68(5):3235-3250.
[32]KAMIŃSKA D,AKTAS K,RIZHINASHVILI D,et al.Two-stage recognition and beyond for compound facial emotion recognition[J].Electronics,2021,10(22):2847.
[33]PENDHARI H,NAGDEOTE S,RATHOD S,et al.Compound emotions:a mixed emotion detection[C]//Proceedings of the International Conference on Innovative Computing & Communication(ICICC).2022.
[34]MACEDONIA M.A bizarre virtual trainer outperforms a human trainer in foreign language word learning[J].International Journal of Computer Science and Artificial Intelligence,2014,4(2):24-34.
[35]FAN Y,LIN Z,SAITO J,et al.FaceFormer:speech-driven 3D facial animation with transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:18770-18780.
[36]ZENG J,HE X,LI S,et al.Virtual Face Animation Generation Based on Conditional Generative Adversarial Networks[C]//2022 International Conference on Image Processing,Computer Vision and Machine Learning(ICICML).IEEE,2022:580-583.