Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 210900094-5. doi: 10.11896/jsjkx.210900094
ZHANG Bin, LIU Chang-hong, ZENG Sheng, JIE An-quan
Abstract: People's gestures while speaking often carry a distinctive personal style. Researchers have proposed speech-driven, personal-style gesture generation methods based on generative adversarial networks, but the generated motions are unnatural and suffer from temporal incoherence. To address this problem, this paper proposes a speech-driven personal-style gesture generation method based on a spatio-temporal graph convolutional network. A temporal-dynamics discriminator built on the spatio-temporal graph convolutional network is introduced to model the spatial and temporal structural relations among gesture key points. The network captures the spatial correlations between gesture joints and extracts temporal dynamics features, so that the generated gestures remain temporally coherent and better match the behavior and structure of real gestures. Experiments on the speech-gesture dataset built by Ginosar et al. show that, compared with related methods, the percentage of correct keypoints (PCK) improves by 2%~5%, and the generated gestures are more natural.
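The discriminator described in the abstract stacks a spatial graph convolution over the skeleton's joint adjacency with a temporal convolution along the frame axis, in the style of ST-GCN (Yan et al., reference [27]). The following is a minimal NumPy sketch of one such spatio-temporal layer; the frame count, joint count, toy chain skeleton, and weight shapes are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Illustrative sizes (assumptions): 64 frames, 49 upper-body/hand key points,
# 2D coordinates in, 16 feature channels out.
T, V, C_in, C_out = 64, 49, 2, 16

# Skeleton adjacency with self-loops, symmetrically normalized as
# D^{-1/2} (A + I) D^{-1/2}. A simple chain skeleton stands in for the
# real joint graph here.
A = np.zeros((V, V))
for i in range(V - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_hat = A + np.eye(V)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

rng = np.random.default_rng(0)
X = rng.standard_normal((T, V, C_in))          # joint coordinates per frame
W_spatial = rng.standard_normal((C_in, C_out)) * 0.1

# Spatial graph convolution: aggregate each joint's neighbours via the
# normalized adjacency, then project the channels.
H = np.einsum("uv,tvc->tuc", A_norm, X) @ W_spatial    # (T, V, C_out)

# Temporal convolution: kernel of size k slid along the frame axis, per joint,
# which is what lets the discriminator judge motion continuity over time.
k = 9
W_temporal = rng.standard_normal((k, C_out, C_out)) * 0.1
pad = k // 2
H_pad = np.pad(H, ((pad, pad), (0, 0), (0, 0)))
Y = np.zeros_like(H)
for t in range(T):
    window = H_pad[t:t + k]                            # (k, V, C_out)
    Y[t] = np.einsum("kvc,kcd->vd", window, W_temporal)

print(Y.shape)   # (64, 49, 16): per-joint temporal-dynamics features
```

A real discriminator would stack several such layers and pool the final features into a real/fake score; this sketch only shows the spatial-then-temporal factorization that gives the model its sensitivity to joint structure and motion continuity.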
References:
[1] YAGHOUBZADEH R, KRAMER M, PITSCH K, et al. Virtual agents as daily assistants for elderly or cognitively impaired people[C]//International Workshop on Intelligent Virtual Agents. Berlin: Springer, 2013: 79-91.
[2] LI J, KIZILCEC R, BAILENSON J, et al. Social robots and virtual agents as lecturers for video instruction[J]. Computers in Human Behavior, 2016, 55: 1222-1230.
[3] PACELLA D, LÓPEZ-PÉREZ B. Assessing children's interpersonal emotion regulation with virtual agents: The serious game Emodiscovery[J]. Computers & Education, 2018, 123: 1-12.
[4] TAN S M, LIEW T W. Designing embodied virtual agents as product specialists in a multi-product category E-commerce: The roles of source credibility and social presence[J]. International Journal of Human-Computer Interaction, 2020, 36(12): 1136-1149.
[5] YOON Y, KO W R, JANG M, et al. Robots learn social skills: End-to-end learning of co-speech gesture generation for humanoid robots[C]//2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019: 4303-4309.
[6] VAN VUUREN S, CHERNEY L R. A virtual therapist for speech and language therapy[C]//International Conference on Intelligent Virtual Agents. Cham: Springer, 2014: 438-448.
[7] KANG S H, FENG A W, SEYMOUR M, et al. Smart Mobile Virtual Characters: Video Characters vs. Animated Characters[C]//Proceedings of the Fourth International Conference on Human Agent Interaction. 2016: 371-374.
[8] HOLLER J, LEVINSON S C. Multimodal language processing in human communication[J]. Trends in Cognitive Sciences, 2019, 23(8): 639-652.
[9] BAVELAS J, GERWING J, SUTTON C, et al. Gesturing on the telephone: Independent effects of dialogue and visibility[J]. Journal of Memory and Language, 2008, 58(2): 495-520.
[10] POUW W, HARRISON S J, DIXON J A. Gesture-speech physics: The biomechanical basis for the emergence of gesture-speech synchrony[J]. Journal of Experimental Psychology: General, 2020, 149(2): 391.
[11] BUTTERWORTH B, HADAR U. Gesture, speech, and computational stages: A reply to McNeill[J]. Psychological Review, 1989, 96(1): 168-174.
[12] GINOSAR S, BAR A, KOHAVI G, et al. Learning individual styles of conversational gesture[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3497-3506.
[13] WANG X, MENG H H, JIANG X T, et al. Survey on Character Motion Synthesis Based on Neural Network[J]. Computer Science, 2019, 46(9): 22-27.
[14] XIN Q Q, CHEN Z X, FENG X X, et al. Movement Drive and Control Constraints of Virtual Hand Based on Multi-curve Spectrum[J]. Computer Science, 2014, 41(1): 126-129, 151.
[15] MARSELLA S, XU Y, LHOMMET M, et al. Virtual character performance from speech[C]//Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation. 2013: 25-35.
[16] THIEBAUX M, MARSELLA S, MARSHALL A N, et al. Smartbody: Behavior realization for embodied conversational agents[C]//Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 1. 2008: 151-158.
[17] NEFF M, KIPP M, ALBRECHT I, et al. Gesture modeling and animation based on a probabilistic recreation of speaker style[J]. ACM Transactions on Graphics (TOG), 2008, 27(1): 1-24.
[18] SADOUGHI N, BUSSO C. Speech-driven animation with meaningful behaviors[J]. Speech Communication, 2019, 110: 90-100.
[19] ALEXANDERSON S, HENTER G E, KUCHERENKO T, et al. Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows[C]//Computer Graphics Forum. 2020, 39(2): 487-496.
[20] GUO D, TANG S G, HONG R C, et al. Review of Sign Language Recognition, Translation and Generation[J]. Computer Science, 2021, 48(3): 60-70.
[21] KUCHERENKO T, NAGY R, JONELL P, et al. Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech[J]. arXiv:2106.14736, 2021.
[22] HASEGAWA D, KANEKO N, SHIRAKAWA S, et al. Evaluation of speech-to-gesture generation using bi-directional LSTM network[C]//Proceedings of the 18th International Conference on Intelligent Virtual Agents. 2018: 79-86.
[23] KUCHERENKO T, HASEGAWA D, HENTER G E, et al. Analyzing input and output representations for speech-driven gesture generation[C]//Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents. 2019: 97-104.
[24] YUNUS F, CLAVEL C, PELACHAUD C. Sequence-to-Sequence Predictive Model: From Prosody To Communicative Gestures[C]//International Conference on Human-Computer Interaction. Cham: Springer, 2021: 355-374.
[25] REBOL M, GÜTL C, PIETROSZEK K. Passing a Non-verbal Turing Test: Evaluating Gesture Animations Generated from Speech[C]//2021 IEEE Virtual Reality and 3D User Interfaces (VR). IEEE, 2021: 573-581.
[26] HABIBIE I, XU W, MEHTA D, et al. Learning Speech-driven 3D Conversational Gestures from Video[J]. arXiv:2102.06837, 2021.
[27] YAN S, XIONG Y, LIN D. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]//Thirty-second AAAI Conference on Artificial Intelligence. 2018.
[28] REN X, LI H, HUANG Z, et al. Music-oriented dance video synthesis with pose perceptual loss[J]. arXiv:1912.06606, 2019.
[29] CAO Z, HIDALGO G, SIMON T, et al. OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(1): 172-186.
[30] ABADI M. TensorFlow: learning functions at scale[C]//Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming. 2016.
[31] KINGMA D P, BA J. Adam: A method for stochastic optimization[J]. arXiv:1412.6980, 2014.
[32] YANG Y, RAMANAN D. Articulated human detection with flexible mixtures of parts[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(12): 2878-2890.