Computer Science, 2017, Vol. 44, Issue (9): 256-260. doi: 10.11896/j.issn.1002-137X.2017.09.048

• Artificial Intelligence •

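Research on Sentence Semantic Similarity Calculation Based on Word2vec

LI Xiao, XIE Hui and LI Li-jie

  1. School of Computer and Information Engineering, Anyang Normal University, Anyang 455002; Department of Computer Science and Technology, Tsinghua University, Beijing 100084; School of Software, Beijing Institute of Technology, Beijing 100081
  • Online: 2018-11-13 Published: 2018-11-13
  • Funding:
    This work was supported by the National Natural Science Foundation of China project "Entity Discovery and Semantic Relation Mining for an Oracle Bone Studies Knowledge Graph" (U1504612) and the Key Scientific Research Project Plan of Higher Education Institutions in Henan Province project "Research on Chinese Text Similarity Calculation Based on the Semantic Vector Space Model" (16A520037).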

Abstract: Drawing on ideas from deep learning, word2vec can automatically learn the essential information in large-scale text data. Building on this, and using the LTP platform of Harbin Institute of Technology, this paper uses the word2vec model to reduce sentence processing to vector operations in a vector space, and takes similarity in that vector space as the semantic similarity of sentences. In addition, sentence structure information is incorporated into the similarity calculation, the algorithm is improved for special sentence patterns, and the syntactic relations between words are taken into account. Experimental results show that the method reveals the semantic relations between sentences more accurately, and that extracting syntactic structure and improving the algorithm solve the similarity calculation problem for complex sentence patterns and improve the accuracy of similarity calculation.

Key words: Sentence similarity, word2vec, Word vector (distributed representation), Semantics, Syntactic structure
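The abstract outlines the method without giving formulas. The sketch below illustrates the two ideas it names, under stated assumptions: a sentence vector is formed by averaging word2vec word vectors and two sentences are compared by cosine similarity, and a structural term compares the subject, predicate, and object of the two sentences separately. The averaging step, the blending weight `alpha`, the role weights, the role names, the helper names, and the toy three-dimensional vectors are all hypothetical choices for illustration, not the paper's published algorithm; the paper's handling of special sentence patterns is not reproduced. In a real pipeline the vectors would come from a trained word2vec model and the roles from a dependency parse (for example, one produced with LTP).

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical toy vectors for illustration only; a real system would look
# these up in a word2vec model trained on a large corpus.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "dog": [1.0, 0.0, 0.0],
    "bites": [0.0, 1.0, 0.0],
    "man": [0.0, 0.0, 1.0],
}


def sentence_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens; unknown tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("none of the tokens has a word vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def structured_similarity(
    tokens1: Sequence[str],
    roles1: Dict[str, List[str]],
    tokens2: Sequence[str],
    roles2: Dict[str, List[str]],
    embeddings: Dict[str, Vector],
    alpha: float = 0.5,
    role_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Blend whole-sentence similarity with similarity of matching roles.

    roles1/roles2 map a hypothetical role name ("subject", "predicate",
    "object") to the tokens filling that role, e.g. read off a dependency parse.
    """
    if role_weights is None:
        role_weights = {"subject": 0.3, "predicate": 0.4, "object": 0.3}
    overall = cosine(sentence_vector(tokens1, embeddings),
                     sentence_vector(tokens2, embeddings))
    score, total = 0.0, 0.0
    for role, weight in role_weights.items():
        a = [t for t in roles1.get(role, []) if t in embeddings]
        b = [t for t in roles2.get(role, []) if t in embeddings]
        if a and b:  # compare a role only when both sentences fill it
            score += weight * cosine(sentence_vector(a, embeddings),
                                     sentence_vector(b, embeddings))
            total += weight
    if total == 0.0:
        return overall  # no shared structure: fall back to the plain score
    return alpha * overall + (1 - alpha) * score / total


if __name__ == "__main__":
    s1 = ["dog", "bites", "man"]
    r1 = {"subject": ["dog"], "predicate": ["bites"], "object": ["man"]}
    s2 = ["man", "bites", "dog"]
    r2 = {"subject": ["man"], "predicate": ["bites"], "object": ["dog"]}
    plain = cosine(sentence_vector(s1, TOY_EMBEDDINGS),
                   sentence_vector(s2, TOY_EMBEDDINGS))
    print(f"averaged vectors only: {plain:.2f}")  # prints 1.00: same word set
    blended = structured_similarity(s1, r1, s2, r2, TOY_EMBEDDINGS)
    print(f"with structure:        {blended:.2f}")  # prints 0.70
```

With the toy vectors, "dog bites man" and "man bites dog" score 1.00 when only averaged word vectors are compared, because the two sentences contain the same words; comparing the roles separately lowers the blended score to 0.70. This illustrates why adding syntactic structure can help distinguish sentences that a bag-of-vectors representation treats as identical.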

