Computer Science ›› 2023, Vol. 50 ›› Issue (10): 176-183. doi: 10.11896/jsjkx.220900201

• Artificial Intelligence •

Bidirectional Inference Model with Multiple Latent Variables Based on Variational Auto-encoders

ZHAO Yanbin, SU Jindian   

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 541001, China
  • Received: 2022-09-21 Revised: 2022-12-09 Online: 2023-10-10 Published: 2023-10-10
  • Corresponding author: SU Jindian (sujd@scut.edu.cn)
  • About author: ZHAO Yanbin (zhaoyanbin98@foxmail.com), born in 1998, master. His main research interests include natural language processing and multi-turn dialogue. SU Jindian, born in 1980, Ph.D, associate professor. His main research interests include deep learning and natural language processing.
  • Supported by:
    Guangdong Provincial Special Funding Projects for Introducing Innovation Team and Industry University Research Cooperation, China (2019C002001), Guangdong Basic and Applied Basic Research Foundation, China (2019B151502057) and National Natural Science Foundation of China (61936003).

Abstract: One of the key tasks of an open-domain dialogue system is to generate diverse and coherent responses, but one-way inference from the preceding context alone cannot achieve this goal. To solve this problem, this paper proposes MLVBI (Multiple Latent Variables Bidirectional Inference), a bidirectional inference model based on multiple latent variables. First, a variational auto-encoder is incorporated into the language model and one-way inference is extended to two-way inference: after the corpus is divided into context, query and response, forward inference infers the response from the query to learn word-order information, while backward inference infers the query from the response to learn additional topic information; the two directions are then fused into bidirectional inference, which lets the model generate more coherent responses. Second, to address the limited explanatory power of a single latent variable during bidirectional inference, multiple latent variables are introduced to further improve the diversity of the generated dialogue. Experimental results show that MLVBI achieves the best accuracy and diversity on two open-domain datasets, DailyDialog and PersonaChat, and ablation studies confirm the effectiveness of bidirectional inference and multiple latent variables.
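
To make the training scheme described above concrete, below is a minimal, illustrative PyTorch sketch of bidirectional CVAE-style training with multiple latent variables. All module names, dimensions, and the simplified bag-of-words reconstruction head are assumptions made here for brevity; the actual MLVBI model uses LSTM-based sequence decoders and also conditions on the dialogue context, both of which this sketch omits.

# Illustrative sketch only: hypothetical names and dimensions,
# not the authors' released MLVBI implementation.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encode a token sequence into a fixed vector with an LSTM."""
    def __init__(self, vocab, emb=128, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)

    def forward(self, ids):                          # ids: (batch, seq_len)
        _, (h, _) = self.lstm(self.embed(ids))
        return h[-1]                                 # (batch, hid)

class LatentBlock(nn.Module):
    """k independent Gaussian latent variables with prior and posterior heads."""
    def __init__(self, hid=256, z_dim=32, k=4):
        super().__init__()
        self.prior = nn.Linear(hid, 2 * z_dim * k)      # p(z | condition)
        self.post = nn.Linear(2 * hid, 2 * z_dim * k)   # q(z | condition, target)

    def forward(self, cond, target=None):
        stats = self.prior(cond) if target is None else \
                self.post(torch.cat([cond, target], dim=-1))
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z, mu, logvar

def gauss_kl(mu_q, lv_q, mu_p, lv_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dims."""
    return 0.5 * ((lv_p - lv_q)
                  + (lv_q.exp() + (mu_q - mu_p) ** 2) / lv_p.exp() - 1).sum(-1)

class BidirectionalCVAE(nn.Module):
    def __init__(self, vocab=10000, hid=256, z_dim=32, k=4):
        super().__init__()
        self.enc = Encoder(vocab, hid=hid)
        self.fwd_latent = LatentBlock(hid, z_dim, k)       # query -> response
        self.bwd_latent = LatentBlock(hid, z_dim, k)       # response -> query
        self.fwd_head = nn.Linear(hid + z_dim * k, vocab)  # toy bag-of-words decoder
        self.bwd_head = nn.Linear(hid + z_dim * k, vocab)

    def elbo(self, cond, tgt, tgt_ids, latent, head):
        z, mu_q, lv_q = latent(cond, tgt)               # sample from the posterior
        _, mu_p, lv_p = latent(cond)                    # prior statistics
        logits = head(torch.cat([cond, z], dim=-1))     # (batch, vocab)
        # Bag-of-words reconstruction: score every target token independently.
        rep = logits.unsqueeze(1).expand(-1, tgt_ids.size(1), -1)
        rec = nn.functional.cross_entropy(rep.reshape(-1, rep.size(-1)),
                                          tgt_ids.reshape(-1))
        return rec + gauss_kl(mu_q, lv_q, mu_p, lv_p).mean()

    def forward(self, query_ids, response_ids):
        q, r = self.enc(query_ids), self.enc(response_ids)
        fwd = self.elbo(q, r, response_ids, self.fwd_latent, self.fwd_head)  # word order
        bwd = self.elbo(r, q, query_ids, self.bwd_latent, self.bwd_head)     # topic info
        return fwd + bwd                                # fused bidirectional objective

model = BidirectionalCVAE()
query = torch.randint(0, 10000, (8, 12))        # 8 toy queries, 12 tokens each
response = torch.randint(0, 10000, (8, 15))
loss = model(query, response)
loss.backward()

In a sketch like this, generation would sample z from the prior head alone (the response is unknown at test time) and decode with the forward branch; the backward branch acts as a training-time regularizer that pushes the learned representations to retain the query's topic.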

Key words: Dialogue generation, Variational auto-encoder, Latent variable, Bidirectional inference, Long short-term memory

CLC Number: TP391