Computer Science ›› 2017, Vol. 44 ›› Issue (5): 280-284. doi: 10.11896/j.issn.1002-137X.2017.05.051

• Artificial Intelligence •

Word Semantic Relevancy Computation and Categories Identification Based on Chinese Compound Sentences

YANG Jin-cai, CHEN Zhong-zhong, SHEN Xian-jun and HU Jin-zhu

  1. School of Computer, Central China Normal University, Wuhan 430079 (all authors)
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Social Science Foundation of China (14BYY093)

Abstract: Semantic relevancy computation is a key technique in Chinese information processing and plays an important role in information retrieval, word sense disambiguation, and text classification. Drawing on the syntactic theory of Chinese compound sentences and the collocation theory of their relation markers, and using a corpus of Chinese compound sentences together with compound sentences retrieved through a search engine as data, this paper proposes a method for computing semantic relevancy based on Chinese compound sentences (SRCCS). The method not only computes the relevancy between words but also indicates the nature and category of that relevancy. Compared with methods that compute relevancy over short texts, it works on a narrower set of objects, so its results are more accurate and its computational complexity is lower. A comparison with the search-engine-based method (Google Distance) on the same test set demonstrates the effectiveness and superiority of the proposed method.

Key words: Compound sentences, Semantic relevancy, Relation markers, Relation categories
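
For readers unfamiliar with the search-engine baseline named in the abstract, the following is a minimal sketch of the Normalized Google Distance as it is commonly defined from hit counts (Cilibrasi and Vitányi). It is not the SRCCS method, whose formulas are not given on this page. The function name, parameter names, and the counts used in the usage note are illustrative assumptions, not data from the paper.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance computed from hit counts.

    fx, fy : number of pages containing word x / word y (hypothetical inputs)
    fxy    : number of pages containing both words
    n      : total number of indexed pages, assumed larger than fx and fy
    Smaller values mean the two words are more closely related.
    """
    # If either word never occurs, or the two never co-occur,
    # treat the words as unrelated (infinite distance).
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator
```

With made-up counts, two words that always co-occur, such as `normalized_google_distance(100, 100, 100, 10000)`, get a distance of 0, while words that never co-occur get `math.inf`. A sentence-level method such as SRCCS would replace page-level hit counts with evidence drawn from compound sentences, which is what allows it to report a relation category as well as a score.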

