Computer Science, 2016, Vol. 43, Issue (5): 188-192, 208. doi: 10.11896/j.issn.1002-137X.2016.05.034

Study on Domain-corpus Driven Calculation Method of Sentence Relevance

LI Feng, HUANG Jin-zhu, LI Zhou-jun and YANG Wei-ming   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: Sentence relevance calculation plays an important role in many areas of NLP, such as public opinion monitoring, information retrieval and statistical machine translation (SMT). After clearly defining the relationship between similarity and relevance, this paper designs a domain-specific, corpus-driven model for calculating sentence relevance. The model uses linguistic data from a single domain to construct a three-level "sentence-paragraph-article" domain semantic space. The topic relevance of words is computed from factors at each level, such as co-occurrence probability, average co-occurrence distance and sentence length. Comparative experiments against methods based on literal features, HowNet and Tongyici Cilin show that the model has great practical value.

Key words: Sentence relevance, Corpus driven, Topic relevance, Calculation model
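
The abstract names the factors behind word topic relevance (co-occurrence probability, average co-occurrence distance, sentence length) but does not give the formula. The following is a minimal sketch of how such factors might be combined, covering the sentence level only. The scoring rule, the function names `word_relevance` and `sentence_relevance`, and the toy corpus are all assumptions made for illustration, not the authors' model.

```python
from collections import defaultdict
from itertools import combinations


def word_relevance(sentences):
    """Toy sentence-level word relevance from tokenized sentences.

    Assumed scoring rule (NOT the paper's formula):
      score(a, b) = P(a and b share a sentence)
                    * mean(1 - distance / sentence_length)
    where the mean runs over the sentences in which both words occur.
    """
    cooc_count = defaultdict(int)       # sentences containing both words
    closeness_sum = defaultdict(float)  # summed length-normalised closeness
    for tokens in sentences:
        n = len(tokens)
        first_pos = {}
        for i, tok in enumerate(tokens):
            first_pos.setdefault(tok, i)  # keep first occurrence only
        for a, b in combinations(sorted(first_pos), 2):
            dist = abs(first_pos[a] - first_pos[b])
            cooc_count[(a, b)] += 1
            closeness_sum[(a, b)] += 1.0 - dist / n
    total = len(sentences)
    return {
        pair: (cooc_count[pair] / total) * (closeness_sum[pair] / cooc_count[pair])
        for pair in cooc_count
    }


def sentence_relevance(s1, s2, rel):
    """Average, over words of s1, of the best word relevance found in s2."""
    def pair_score(a, b):
        if a == b:
            return 1.0  # assumption: identical words are fully relevant
        key = (a, b) if a < b else (b, a)  # pairs are stored in sorted order
        return rel.get(key, 0.0)

    if not s1 or not s2:
        return 0.0
    best = [max(pair_score(a, b) for b in s2) for a in s1]
    return sum(best) / len(best)


# Hypothetical domain corpus, already tokenized.
corpus = [["missile", "launch", "test"], ["missile", "test"], ["tank", "drill"]]
rel = word_relevance(corpus)
print(sentence_relevance(["missile"], ["test"], rel))
print(sentence_relevance(["missile"], ["tank"], rel))
```

In this sketch, words that never share a sentence in the domain corpus score zero, so relevance depends on the corpus rather than on a general-purpose lexicon. A fuller version in the spirit of the abstract would repeat the counting at the paragraph and article levels and combine the three scores.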

