A Multimodal Human-Computer Dialog System for Natural Interaction

doi:10.11896/j.issn.1002-137X.2014.10.003

Abstract

During dialogue, people naturally use multimodal information, e.g. facial expressions and gestures, in addition to speech to support what they are expressing. The paper proposes a framework for efficiently fusing multimodal information with a human-computer dialog model and uses it to build a multimodal human-computer dialog system. It classifies the fusion methods into three modes, complementary, mixed, and independent, according to the relation between the speech channel and the other channels. For the dialog framework, the paper proposes a multimodal dialog management model that combines a finite state machine, slot filling, and mixed initiative. This model can flexibly process multimodal information during the dialogue. The paper also proposes a Multimodal Markup Language (MML) to control the actions of the virtual human in the dialog system; MML helps coordinate complicated actions across the virtual human's different channels. Finally, based on these technologies, the paper builds a multimodal dialog system and applies it to a weather information retrieval service.
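
As an illustration only, the combination of a finite state machine, slot filling, and mixed initiative described in the abstract could be sketched as follows. This is not the paper's implementation: the class `DialogManager`, the slot names `city` and `date`, the state names `ASK` and `ANSWER`, and the fusion rule (gesture fills only what speech leaves out, as one possible reading of the complementary mode) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical slots for a weather query; not taken from the paper.
REQUIRED_SLOTS = ("city", "date")


@dataclass
class DialogManager:
    """Toy finite-state dialog manager driven by slot filling."""
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {name: None for name in REQUIRED_SLOTS})
    state: str = "ASK"  # assumed states: ASK (slots missing), ANSWER (all filled)

    def fuse(self, speech: Dict[str, str], gesture: Dict[str, str]) -> Dict[str, str]:
        # Assumed complementary-style fusion: the gesture channel supplies
        # values only for slots the speech channel did not mention.
        fused = dict(gesture)
        fused.update(speech)
        return fused

    def step(self, speech: Dict[str, str],
             gesture: Optional[Dict[str, str]] = None) -> str:
        # Mixed initiative: the user may fill any slot in any order,
        # and the system takes the initiative only for what is still missing.
        observed = self.fuse(speech, gesture or {})
        for name, value in observed.items():
            if name in self.slots:
                self.slots[name] = value
        missing = [n for n in REQUIRED_SLOTS if self.slots[n] is None]
        if missing:
            self.state = "ASK"
            return f"Which {missing[0]}?"
        self.state = "ANSWER"
        return (f"Looking up the weather for {self.slots['city']} "
                f"on {self.slots['date']}.")
```

In this sketch a user could say only the date in the first turn and then indicate a city by gesture (for example, pointing at a map) in the second; the manager stays in `ASK` until both slots are filled and then moves to `ANSWER`.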

Key words: Multimodal fusion, Human-computer interaction, Dialog management

YANG Ming-hao, TAO Jian-hua, LI Hao and CHAO Lin-lin. Nature Multimodal Human-Computer-Interaction Dialog System[J]. Computer Science, 2014, 41(10): 12-18.

