Computer Science, 2017, Vol. 44, Issue (9): 256-260. doi: 10.11896/j.issn.1002-137X.2017.09.048

Research on Sentence Semantic Similarity Calculation Based on Word2vec

LI Xiao, XIE Hui and LI Li-jie   

  • Online: 2018-11-13  Published: 2018-11-13

Abstract: Drawing on ideas from deep learning, word2vec automatically learns the essential information in large-scale text data. Using the LTP platform of Harbin Institute of Technology together with a word2vec model, this work reduces sentence processing to vector operations in a vector space, so that similarity in the vector space represents the semantic similarity of sentences. In addition, sentence structure information is incorporated into the similarity calculation, the algorithm is improved for special sentence patterns, and the syntactic relationships between words are taken into account. The experimental results show that the method reveals the semantic relations between sentences more accurately; the syntactic structure and the improved extraction algorithm also address the problem of computing similarity for complex sentences, which ultimately improves the accuracy of the similarity calculation.
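
The vector-space idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how word vectors are combined into a sentence vector, so simple averaging is assumed here, and the syntactic-structure refinements are not reproduced. The names `TOY_VECTORS`, `sentence_vector`, and `sentence_similarity` are hypothetical, the three-dimensional embeddings are made up to stand in for trained word2vec vectors, and the input is assumed to be already segmented into tokens (for example by a tool such as LTP).

```python
import math

# Hypothetical toy embeddings standing in for trained word2vec vectors.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "sat": [0.1, 0.9, 0.1],
    "ran": [0.2, 0.8, 0.2],
    "stock": [0.0, 0.1, 0.9],
    "fell": [0.1, 0.3, 0.8],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of known tokens (an assumed composition rule)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no token has an embedding
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_similarity(tokens_a, tokens_b, vectors=TOY_VECTORS):
    """Similarity of two tokenized sentences as the cosine of their mean vectors."""
    vec_a = sentence_vector(tokens_a, vectors)
    vec_b = sentence_vector(tokens_b, vectors)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    print(sentence_similarity(["cat", "sat"], ["dog", "ran"]))
    print(sentence_similarity(["cat", "sat"], ["stock", "fell"]))
```

With these toy vectors the first pair scores higher than the second. Averaging ignores word order and syntax, which is the kind of limitation the structural information described in the abstract is meant to address.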

Key words: Sentence similarity, Word2vec, Distributed representation, Semantic, Syntactic structure
