Computer Science ›› 2023, Vol. 50 ›› Issue (5): 270-276. doi: 10.11896/jsjkx.220400275

• Artificial Intelligence •

Answer Selection Model Based on MLP and Semantic Matrix

LUO Liang1, CHENG Chunling1, LIU Qian1, GUI Yaocheng2   

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    2. School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2022-04-27  Revised: 2022-09-10  Online: 2023-05-15  Published: 2023-05-06
  • Corresponding author: CHENG Chunling (chengcl@njupt.edu.cn)
  • About author: LUO Liang, born in 1998, postgraduate (1220045114@njupt.edu.cn). His main research interests include deep learning and natural language processing.
    CHENG Chunling, born in 1972, professor, is a member of China Computer Federation. Her main research interests include data mining and data management.
  • Supported by:
    Jiangsu Provincial Double-Innovation Doctor Program (JSSCBS20210507) and NUPTSF (NY220176).

Abstract: Answer selection is a key subtask in the field of question answering systems, and its performance underpins the development of such systems. Dynamic word vectors generated by a parameter-frozen BERT model suffer from problems such as a lack of sentence-level semantic features and missing word-level interactions between the question and the answer. Multilayer perceptrons have several advantages: they can perform deep feature mining at a low computational cost. Building on dynamic text vectors, this paper proposes an answer selection model based on multilayer perceptrons and semantic matrices, in which the multilayer perceptron reconstructs the sentence-level semantic dimensions of the text vectors, while semantic matrices generated by different calculation methods mine different kinds of textual features. A multilayer perceptron combined with a semantic understanding matrix generated by a linear model forms a semantic understanding module, which mines the sentence-level semantic features of the question sentence and the answer sentence separately; a multilayer perceptron combined with a semantic interaction matrix generated by a bidirectional attention mechanism forms a semantic interaction module, which builds the word-level interactions between question-answer pairs. Experimental results show that the proposed model achieves a MAP of 0.789 and an MRR of 0.806 on the WikiQA dataset, a consistent improvement over the baseline models, and a MAP of 0.903 and an MRR of 0.911 on the SelQA dataset, which also represents good performance.
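To make the described pipeline concrete, the following is a minimal PyTorch sketch of how a frozen-BERT answer selector built from a linear semantic understanding matrix and a bidirectional-attention semantic interaction matrix might be wired together. It is an illustration under stated assumptions, not the authors' implementation: the class names, layer widths, two-layer MLP layout, mean pooling, and the concatenation-plus-sigmoid scoring head are all hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticUnderstanding(nn.Module):
    # MLP on top of a linear "semantic understanding matrix" for a single sentence.
    def __init__(self, hidden=768, proj=256):
        super().__init__()
        self.linear = nn.Linear(hidden, hidden)            # assumed linear model producing the matrix
        self.mlp = nn.Sequential(nn.Linear(hidden, proj),  # assumed sentence-level dimension
                                 nn.ReLU(),                # reconstruction as a two-layer MLP
                                 nn.Linear(proj, hidden))

    def forward(self, x):                                  # x: (batch, seq_len, hidden)
        return self.mlp(self.linear(x))

class SemanticInteraction(nn.Module):
    # Bidirectional attention between question and answer, followed by an MLP.
    def __init__(self, hidden=768, proj=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(hidden, proj),
                                 nn.ReLU(),
                                 nn.Linear(proj, hidden))

    def forward(self, q, a):                               # q: (B, Lq, H), a: (B, La, H)
        sim = torch.bmm(q, a.transpose(1, 2))              # word-level similarity, (B, Lq, La)
        q2a = torch.bmm(F.softmax(sim, dim=-1), a)         # answer-aware question tokens
        a2q = torch.bmm(F.softmax(sim, dim=1).transpose(1, 2), q)  # question-aware answer tokens
        return self.mlp(q2a), self.mlp(a2q)

class AnswerSelector(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        self.understand = SemanticUnderstanding(hidden)
        self.interact = SemanticInteraction(hidden)
        self.score = nn.Linear(4 * hidden, 1)              # hypothetical scoring head

    def forward(self, q_vec, a_vec):                       # frozen BERT outputs for question/answer
        uq, ua = self.understand(q_vec), self.understand(a_vec)
        iq, ia = self.interact(q_vec, a_vec)
        feats = torch.cat([uq.mean(1), ua.mean(1),         # mean pooling is an assumption
                           iq.mean(1), ia.mean(1)], dim=-1)
        return torch.sigmoid(self.score(feats)).squeeze(-1)  # relevance score per QA pair

Here q_vec and a_vec are assumed to be the token-level outputs of a parameter-frozen BERT encoder for a question and a candidate answer; ranking the candidate answers by the returned score is what metrics such as the MAP/MRR figures above would be computed over.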

Key words: Answer selection, BERT model, Dynamic word vector, Multilayer perceptron, Semantic matrix

CLC Number: TP391.1