Computer Science ›› 2016, Vol. 43 ›› Issue (5): 188-192. doi: 10.11896/j.issn.1002-137X.2016.05.034

• Artificial Intelligence •

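Study on Domain-corpus Driven Calculation Method of Sentence Relevance

LI Feng, HUANG Jin-zhu, LI Zhou-jun and YANG Wei-ming

  1. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191; Logistics Science Research Institute of the PLA, Beijing 100166
     Department of Language Engineering, PLA University of Foreign Languages, Luoyang 471003
     State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191
     Logistics Science Research Institute of the PLA, Beijing 100166
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61170189, 61370126), the Specialized Research Fund for the Doctoral Program of Higher Education (20111102130003), and a self-selected project of the State Key Laboratory of Software Development Environment (SKLSDE-2013ZX-19).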

Abstract: Sentence relevance calculation plays an important role in many practical NLP applications, such as public opinion monitoring, information retrieval and statistical machine translation (SMT). After clarifying the relationship between similarity and relevance, this paper designs a domain-corpus-driven method for calculating sentence relevance. The method uses a corpus from a single domain to construct a three-level “sentence-paragraph-article” domain semantic space, and measures the topic relevance between words from factors such as their co-occurrence probability at each level, their average co-occurrence distance, and sentence length. Comparative experiments against methods based on literal features, HowNet and Tongyici Cilin show that the method has good practical value.

Key words: Sentence relevance, Corpus driven, Topic relevance, Calculation model
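
The abstract gives no formulas, so the following is only a minimal sketch of the general idea of a three-level co-occurrence space, not the authors' model. It assumes a pre-tokenized corpus (article, paragraph, sentence, token), scores a word pair by a weighted sum of its co-occurrence probabilities at the sentence, paragraph and article levels, and averages best-match word scores to compare two sentences. The weights in `LEVEL_WEIGHTS`, the function names and the toy corpus are all hypothetical, and the average-distance and sentence-length factors mentioned in the abstract are left out.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights for the (sentence, paragraph, article) levels.
# These values are illustrative assumptions, not taken from the paper.
LEVEL_WEIGHTS = (0.5, 0.3, 0.2)


def build_cooccurrence(corpus):
    """Count word-pair co-occurrences at three levels.

    corpus: list of articles; an article is a list of paragraphs; a paragraph
    is a list of sentences; a sentence is a list of tokens.
    Returns (pair_counts, unit_counts), each indexed by level 0..2.
    """
    pair_counts = [Counter(), Counter(), Counter()]
    unit_counts = [0, 0, 0]

    def record(level, tokens):
        # One unit (sentence, paragraph or article) seen at this level.
        unit_counts[level] += 1
        for pair in combinations(sorted(set(tokens)), 2):
            pair_counts[level][pair] += 1

    for article in corpus:
        article_tokens = []
        for paragraph in article:
            paragraph_tokens = []
            for sentence in paragraph:
                record(0, sentence)
                paragraph_tokens.extend(sentence)
            record(1, paragraph_tokens)
            article_tokens.extend(paragraph_tokens)
        record(2, article_tokens)
    return pair_counts, unit_counts


def word_relatedness(w1, w2, model):
    """Weighted sum of per-level co-occurrence probabilities (assumed form)."""
    if w1 == w2:
        return 1.0
    pair_counts, unit_counts = model
    key = tuple(sorted((w1, w2)))
    score = 0.0
    for weight, counts, units in zip(LEVEL_WEIGHTS, pair_counts, unit_counts):
        if units:
            score += weight * counts[key] / units
    return score


def sentence_relevance(s1, s2, model):
    """Symmetric average of each word's best match in the other sentence."""
    if not s1 or not s2:
        return 0.0

    def directed(src, dst):
        best = (max(word_relatedness(a, b, model) for b in dst) for a in src)
        return sum(best) / len(src)

    return (directed(s1, s2) + directed(s2, s1)) / 2


# Toy domain corpus (hypothetical data, for illustration only).
corpus = [
    [[["navy", "ship", "drill"], ["ship", "missile"]], [["army", "tank"]]],
    [[["navy", "ship"]]],
]
model = build_cooccurrence(corpus)
print(word_relatedness("navy", "ship", model))
print(sentence_relevance(["navy", "drill"], ["ship", "missile"], model))
```

In this sketch, words that share sentences score higher than words that only share an article, which is the intuition behind weighting the finer levels more heavily; a real system would need to tune or learn those weights on a domain corpus.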

