Computer Science ›› 2014, Vol. 41 ›› Issue (10): 12-18. doi: 10.11896/j.issn.1002-137X.2014.10.003

• 2013 Joint Conference on Harmonious Human-Machine Environment •

A Multimodal Human-Computer Dialog System for Natural Interaction

YANG Ming-hao, TAO Jian-hua, LI Hao, CHAO Lin-lin

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by the projects Research on Dialog-Management-Centered Bidirectional Multimodal Human-Computer Interaction (90820303), Visualization of Articulator Movement Based on Image and Speech Analysis (61273288), Neurophysiological Modeling and Control of the Speech Production Process (61233009), Research on Text-Independent Voice Conversion Methods (60873160), and Research on Dimensional-Model-Based Emotional Speech Modeling and Generation (61203258)

Natural Multimodal Human-Computer-Interaction Dialog System

YANG Ming-hao, TAO Jian-hua, LI Hao and CHAO Lin-lin

  • Online:2018-11-14 Published:2018-11-14

Abstract (translated): In dialogue, besides spoken interaction, people naturally use multimodal information such as facial expressions and posture to aid communication. This paper analyzes how such multimodal interaction can be effectively fused into a human-computer dialog model, and implements a multimodal human-computer dialog system oriented toward natural interaction. First, according to how different channels (e.g. emotion, head pose) influence spoken interaction, the channels are divided into three main modes: information complement, information fusion, and information independence, and input information is fused in a different way for each mode. After fusion, dialog management adopts a strategy combining finite state machines, slot filling and mixed initiative. For emotion handling, an emotion-state prediction network is proposed to record changes in the user's emotional state and to give timely feedback on those changes according to the dialog context at different turns; the resulting dialog model can flexibly handle the multimodal information a user presents during dialogue. On the output side, for behavior control of the digital virtual human commonly used in human-computer dialog, a simplified multimodal coordinated markup language is proposed, which synchronizes the virtual human's expression of emotion, gesture and speech and improves its expressiveness. Finally, based on these key techniques, a multimodal natural human-computer dialog system for urban traffic information query was implemented. Trials with multiple users show that, compared with a traditional speech-only human-computer dialog model, the multimodal natural dialog system effectively improves the naturalness of user interaction.

Keywords: Multimodal information fusion, Human-computer interaction, Dialog management
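The dialog-management combination described in the abstract (a finite state machine plus slot filling with mixed initiative) can be illustrated with a minimal sketch. This is not the authors' implementation; the task, slot names and return values below are hypothetical, chosen to match the traffic-query scenario:

```python
# Minimal sketch of finite-state + slot-filling + mixed-initiative dialog
# management for a hypothetical traffic-query task. Slot names are
# illustrative assumptions, not taken from the paper.

REQUIRED_SLOTS = ("road", "direction")   # hypothetical task slots

class SlotFillingDM:
    """Tracks slot values; the dialog state is derived from which slots are filled."""

    def __init__(self):
        self.slots = {name: None for name in REQUIRED_SLOTS}

    def update(self, parsed):
        """Mixed initiative: accept any slots the user volunteers,
        in any order, not only the one the system last asked about."""
        for name, value in parsed.items():
            if name in self.slots:
                self.slots[name] = value
        return self.next_action()

    def next_action(self):
        """Finite-state policy: request the first missing slot,
        or answer once every required slot is filled."""
        for name, value in self.slots.items():
            if value is None:
                return ("request", name)
        return ("answer", dict(self.slots))

dm = SlotFillingDM()
print(dm.update({"road": "3rd Ring Road"}))    # -> ('request', 'direction')
print(dm.update({"direction": "northbound"}))  # -> ('answer', {'road': ..., 'direction': ...})
```

A real system would sit this behind the multimodal fusion layer, so that `parsed` may come from speech, gesture or their combination.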

Abstract: During a dialogue, people naturally use multimodal information, e.g. facial expressions and gestures, in addition to spoken interaction, to support what they express. This paper proposes a framework for efficiently fusing such multimodal information into a human-computer dialog model, and uses it to build a multimodal human-computer dialog system. The fusion methods are classified into three modes, complementary, mixed and independent, according to the relation between the speech channel and the other channels. For the dialog framework, the paper proposes a multimodal dialog management model combining a finite state machine, slot filling and mixed initiative; the resulting module can flexibly process multimodal information during the dialogue. The paper also proposes a Multimodal Markup Language (MML) to control the actions of the virtual human in the dialog system; MML helps coordinate complicated actions across the virtual human's different channels. Finally, based on the above technologies, the paper builds a multimodal dialog system and applies it to a weather information retrieval service.

Key words: Multimodal fusion, Human-computer interaction, Dialog management
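What a simplified multimodal coordinated markup language for virtual-human control might look like can be sketched as below. The tag and attribute names are illustrative assumptions, not the paper's actual MML specification; the point is that one document synchronizes emotion, gesture and speech across channels:

```python
# Hedged sketch of an MML-like document and a parser that collects the
# channel actions to be rendered in sync. Tag/attribute names are
# hypothetical, not taken from the paper.
import xml.etree.ElementTree as ET

MML_SNIPPET = """
<mml>
  <utterance emotion="happy">
    <speech>Traffic on the ring road is light.</speech>
    <gesture type="nod" sync="speech-start"/>
    <expression type="smile" sync="speech-end"/>
  </utterance>
</mml>
"""

def parse_mml(text):
    """Return (channel, value) pairs for every action in the utterance."""
    root = ET.fromstring(text)
    actions = []
    for utt in root.iter("utterance"):
        actions.append(("emotion", utt.get("emotion")))
        for child in utt:
            # elements with a 'type' attribute are discrete actions;
            # the <speech> element carries its text content instead
            actions.append((child.tag, child.get("type") or child.text))
    return actions

print(parse_mml(MML_SNIPPET))
```

A renderer would then schedule each action against the speech timeline, e.g. triggering the nod at speech onset, which is the kind of cross-channel coordination the abstract attributes to MML.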

