Computer Science ›› 2023, Vol. 50 ›› Issue (10): 176-183.doi: 10.11896/jsjkx.220900201

• Artificial Intelligence •

Bidirectional Inference Model with Multiple Latent Variables Based on Variational Auto-encoders

ZHAO Yanbin, SU Jindian   

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 541001, China
  • Received: 2022-09-21 Revised: 2022-12-09 Online: 2023-10-10 Published: 2023-10-10
  • About author: ZHAO Yanbin, born in 1998, master. His main research interests include natural language processing and multi-turn dialogue. SU Jindian, born in 1980, Ph.D, associate professor. His main research interests include deep learning and natural language processing.
  • Supported by:
    Guangdong Provincial Special Funding Projects for Introducing Innovation Team and Industry University Research Cooperation, China (2019C002001), Research and Development Program in Key Areas of Guangdong Province, China (2019B151502057) and National Natural Science Foundation of China (61936003).

Abstract: One of the key tasks of an open-domain dialogue system is to generate diverse and coherent responses. However, one-way inference from the preceding context alone cannot achieve this goal. To solve this problem, this paper proposes MLVBI (multiple latent variables bidirectional inference), a bidirectional inference model based on multiple latent variables. First, a variational auto-encoder is incorporated into the language model and one-way inference is extended to two-way inference: after the corpus is divided into context, query and response, forward inference infers the response from the query to learn word-order information, while reverse inference infers the query from the response to learn additional topic information. The model then integrates both inference directions to generate more coherent responses. Second, to address the insufficient explanatory power of a single latent variable during two-way inference, multiple latent variables are introduced into the inference process, further improving the diversity of generated responses. Experimental results show that MLVBI achieves the best accuracy and diversity on two open-domain datasets, DailyDialog and PersonaChat, and ablation experiments confirm the effectiveness of both two-way inference and multiple latent variables.
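The variational machinery the abstract relies on (sampling several latent variables via the reparameterization trick and regularizing them with a KL term) can be sketched as below. This is an illustrative numpy fragment under assumed shapes: the encoder/decoder networks and the forward/reverse inference paths of MLVBI are omitted, and `K`, `D`, and all values are made up, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps, the reparameterization trick of Kingma & Welling,
    # which lets gradients flow through the sampling step during training.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), the regularizer in the VAE objective.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Hypothetical setup: K independent latent variables of dimension D stand in
# for the model's "multiple latent variables"; in the real model each (mu,
# log_var) pair would come from an inference network over the dialogue.
K, D = 4, 16
mus = [rng.standard_normal(D) * 0.1 for _ in range(K)]
log_vars = [rng.standard_normal(D) * 0.1 for _ in range(K)]

zs = [reparameterize(mu, lv, rng) for mu, lv in zip(mus, log_vars)]
z = np.concatenate(zs)  # combined latent code fed to the response decoder
total_kl = sum(kl_divergence(mu, lv) for mu, lv in zip(mus, log_vars))
print(z.shape, total_kl)
```

Using several latent variables instead of one simply means sampling each with its own posterior parameters and summing their KL penalties, which is the intuition behind the diversity gains the abstract reports.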

Key words: Dialogue generation, Variational auto-encoder, Latent variable, Bidirectional inference, Long short-term memory

CLC Number: TP391