Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 210900094-5.doi: 10.11896/jsjkx.210900094

• Image Processing & Multimedia Technology •

Speech-driven Personal Style Gesture Generation Method Based on Spatio-temporal Graph Convolutional Networks

ZHANG Bin, LIU Chang-hong, ZENG Sheng, JIE An-quan   

  1. School of Computer & Information Engineering,Jiangxi Normal University,Nanchang 330022,China
  • Online:2022-11-10 Published:2022-11-21
  • About author:ZHANG Bin,born in 1997,postgraduate.His main research interests include cross-modal generation and computer vision.
    LIU Chang-hong,born in 1977,Ph.D,associate professor,is a member of China Computer Federation.Her main research interests include computer vision,cross-modal retrieval and hyper-spectral image processing.
  • Supported by:
    National Natural Science Foundation of China(62067004,61662030).

Abstract: People’s gestures while speaking often carry a unique personal style.Researchers have proposed speech-driven personal style gesture generation methods based on generative adversarial networks;however,the generated motions are unnatural due to temporal discontinuity.To solve this problem,this paper proposes a speech-driven personal style gesture generation method based on spatio-temporal graph convolutional networks,which adds a temporal dynamic discriminator built on a spatio-temporal graph convolutional network(STGCN).The spatial and temporal structural relationships between gesture joint points are first constructed;the STGCN then captures the spatial correlation of the joint points and extracts their dynamic characteristics over time,so that the generated gestures maintain temporal consistency and better match the behavior and structure of real gestures.The proposed method is verified on the speech and gesture dataset constructed by Ginosar et al.Compared with related methods,the percentage of correct keypoints improves by about 2%~5%,and the generated gestures are more natural.
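The discriminator's building block described above, a spatial graph convolution over the skeleton followed by a temporal aggregation across frames (after ST-GCN, reference [27]), can be sketched as follows. This is a minimal NumPy illustration: the 5-joint chain skeleton, channel sizes, and 3-frame window are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Toy skeleton: 5 joints in a chain 0-1-2-3-4 (e.g. arm keypoints).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
V = 5
A = np.eye(V)                                # self-loops
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A_hat = A / A.sum(axis=1, keepdims=True)     # row-normalised adjacency

T, C_in, C_out = 8, 2, 4                     # frames, input/output channels
rng = np.random.default_rng(0)
X = rng.standard_normal((T, V, C_in))        # a gesture sequence of 2D joints
W = rng.standard_normal((C_in, C_out))       # learnable weights (random here)

# Spatial graph convolution: each joint aggregates its neighbours per frame.
S = np.einsum('uv,tvc->tuc', A_hat, X) @ W   # shape (T, V, C_out)

# Temporal aggregation: a 3-frame moving average per joint, giving the
# discriminator a view of motion continuity across the sequence.
Y = np.stack([S[max(0, t - 1):t + 2].mean(axis=0) for t in range(T)])
print(Y.shape)  # (8, 5, 4)
```

In the full model this step would be stacked with nonlinearities and learned temporal filters; the sketch only shows how spatial and temporal structure enter one layer.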

Key words: Cross-modal generation, Gesture generation, Personal style learning, Spatio-temporal graph convolutional networks, Temporal dynamics
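The percentage-of-correct-keypoints (PCK) metric used for the abstract's 2%~5% comparison can be sketched as below. This is a common formulation; the tolerance `alpha` and the bounding-box scale reference are assumptions, as the paper's exact threshold is not stated here.

```python
import numpy as np

def pck(pred, gt, alpha=0.2):
    """Percentage of Correct Keypoints: a keypoint counts as correct when
    its predicted position lies within alpha * (largest side of the
    ground-truth bounding box) of the true position.
    pred, gt: arrays of shape (num_keypoints, 2)."""
    scale = max(np.ptp(gt[:, 0]), np.ptp(gt[:, 1]))
    dist = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dist <= alpha * scale))

# Toy example: 3 keypoints, threshold 0.2 * 10 = 2.0 units;
# the third prediction is 5.0 units off, so 2 of 3 are correct.
gt = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
pred = np.array([[0.5, 0.0], [10.0, 0.5], [15.0, 10.0]])
print(pck(pred, gt))
```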

CLC Number: 

  • TP391.1
[1]YAGHOUBZADEH R,KRAMER M,PITSCH K,et al.Virtual agents as daily assistants for elderly or cognitively impaired people[C]//International Workshop on Intelligent Virtual Agents.Berlin:Springer,2013:79-91.
[2]LI J,KIZILCEC R,BAILENSON J,et al.Social robots and virtual agents as lecturers for video instruction[J].Computers in Human Behavior,2016,55:1222-1230.
[3]PACELLA D,LÓPEZ-PÉREZ B.Assessing children’s interpersonal emotion regulation with virtual agents:The serious game Emodiscovery[J].Computers & Education,2018,123:1-12.
[4]TAN S M,LIEW T W.Designing embodied virtual agents as product specialists in a multi-product category E-commerce:The roles of source credibility and social presence[J].International Journal of Human-Computer Interaction,2020,36(12):1136-1149.
[5]YOON Y,KO W R,JANG M,et al.Robots learn social skills:End-to-end learning of co-speech gesture generation for humanoid robots[C]//2019 International Conference on Robotics and Automation(ICRA).IEEE,2019:4303-4309.
[6]VAN VUUREN S,CHERNEY L R.A virtual therapist for speech and language therapy[C]//International Conference on Intelligent Virtual Agents.Cham:Springer,2014:438-448.
[7]KANG S H,FENG A W,SEYMOUR M,et al.Smart Mobile Virtual Characters:Video Characters vs.Animated Characters[C]//Proceedings of the Fourth International Conference on Human Agent Interaction.2016:371-374.
[8]HOLLER J,LEVINSON S C.Multimodal language processing in human communication[J].Trends in Cognitive Sciences,2019,23(8):639-652.
[9]BAVELAS J,GERWING J,SUTTON C,et al.Gesturing on the telephone:Independent effects of dialogue and visibility[J].Journal of Memory and Language,2008,58(2):495-520.
[10]POUW W,HARRISON S J,DIXON J A.Gesture-speech physics:The biomechanical basis for the emergence of gesture-speech synchrony[J].Journal of Experimental Psychology:General,2020,149(2):391.
[11]BUTTERWORTH B,HADAR U.Gesture,speech,and computational stages:A reply to McNeill[J].Psychological Review,1989,96(1):168-174.
[12]GINOSAR S,BAR A,KOHAVI G,et al.Learning individual styles of conversational gesture[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2019:3497-3506.
[13]WANG X,MENG H H,JIANG X T,et al.Survey on Character Motion Synthesis Based on Neural Network[J].Computer Science,2019,46(9):22-27.
[14]XIN Q Q,CHEN Z X,FENG X X,et al.Movement Drive and Control Constraints of Virtual Hand Based on Multi-curve Spectrum[J].Computer Science,2014,41(1):126-129,151.
[15]MARSELLA S,XU Y,LHOMMET M,et al.Virtual character performance from speech[C]//Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation.2013:25-35.
[16]THIEBAUX M,MARSELLA S,MARSHALL A N,et al.Smartbody:Behavior realization for embodied conversational agents[C]//Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 1.2008:151-158.
[17]NEFF M,KIPP M,ALBRECHT I,et al.Gesture modeling and animation based on a probabilistic recreation of speaker style[J].ACM Transactions on Graphics(TOG),2008,27(1):1-24.
[18]SADOUGHI N,BUSSO C.Speech-driven animation with meaningful behaviors[J].Speech Communication,2019,110:90-100.
[19]ALEXANDERSON S,HENTER G E,KUCHERENKO T,et al.Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows[C]//Computer Graphics Forum.2020,39(2):487-496.
[20]GUO D,TANG S G,HONG R C,et al.Review of Sign Language Recognition,Translation and Generation[J].Computer Science,2021,48(3):60-70.
[21]KUCHERENKO T,NAGY R,JONELL P,et al.Speech2Properties2Gestures:Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech[J].arXiv:2106.14736,2021.
[22]HASEGAWA D,KANEKO N,SHIRAKAWA S,et al.Evaluation of speech-to-gesture generation using bi-directional LSTM network[C]//Proceedings of the 18th International Conference on Intelligent Virtual Agents.2018:79-86.
[23]KUCHERENKO T,HASEGAWA D,HENTER G E,et al.Analyzing input and output representations for speech-driven gesture generation[C]//Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents.2019:97-104.
[24]YUNUS F,CLAVEL C,PELACHAUD C.Sequence-to-Sequence Predictive Model:From Prosody To Communicative Gestures[C]//International Conference on Human-Computer Interaction.Cham:Springer,2021:355-374.
[25]REBOL M,GÜTL C,PIETROSZEK K.Passing a Non-verbal Turing Test:Evaluating Gesture Animations Generated from Speech[C]//2021 IEEE Virtual Reality and 3D User Interfaces(VR).IEEE,2021:573-581.
[26]HABIBIE I,XU W,MEHTA D,et al.Learning Speech-driven 3D Conversational Gestures from Video[J].arXiv:2102.06837,2021.
[27]YAN S,XIONG Y,LIN D.Spatial temporal graph convolutional networks for skeleton-based action recognition[C]//Thirty-second AAAI Conference on Artificial Intelligence.2018.
[28]REN X,LI H,HUANG Z,et al.Music-oriented dance video synthesis with pose perceptual loss[J].arXiv:1912.06606,2019.
[29]CAO Z,HIDALGO G,SIMON T,et al.OpenPose:realtime multi-person 2D pose estimation using Part Affinity Fields[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2019,43(1):172-186.
[30]ABADI M.TensorFlow:learning functions at scale[C]//Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming.2016.
[31]KINGMA D P,BA J.Adam:A method for stochastic optimization[J].arXiv:1412.6980,2014.
[32]YANG Y,RAMANAN D.Articulated human detection with flexible mixtures of parts[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2012,35(12):2878-2890.