Computer Science ›› 2013, Vol. 40 ›› Issue (5): 168-172.

• Software and Database Technology •

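Ontology Concept Learning Method for Compound Terms

LI Jiang-hua, SHI Peng and HU Chang-jun

  1. National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083; National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083; School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083
  • Online: 2018-11-16 Published: 2018-11-16
  • Supported by:
    This work was supported by the National Science and Technology Support Program of the 12th Five-Year Plan (2011BAK08B04) and the Fundamental Research Funds for the Central Universities.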

Abstract: Term extraction plays an important role in text-based ontology concept learning. Because Chinese text has no explicit boundaries between words, domain terms, and compound terms in particular, are difficult to extract. Traditional extraction methods lack semantic support, are computationally expensive, and have low accuracy. To address these shortcomings, this paper proposes an ontology concept learning method suited to compound term extraction. First, natural language processing techniques filter out components unrelated to terms, and sentences are cut at punctuation marks and at the removed components. This yields a complete set of candidates in which candidate compound terms are not split incorrectly. Next, the candidate domain terms are filtered with multiple strategies based on term frequency and information entropy, according to the statistical and distributional characteristics of terms in the domain. Finally, synonymous terms are recognized and merged to obtain the domain concept set. Experiments show that the method extracts both single-word and compound domain terms from domain text with relatively high precision.

Key words: Term extraction, Term filtering, Compound terms, Ontology concept learning
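
The cutting and filtering steps described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, filter lists, or thresholds, so everything below is an assumption made for illustration: the `STOP_TOKENS` list, the thresholds `min_freq` and `min_entropy`, and the choice to measure information entropy over a term's distribution across documents. Input is assumed to be already tokenized (for Chinese, by an upstream word segmenter), and English tokens are used here only for readability. Synonym recognition and merging are not covered.

```python
import math
from collections import Counter, defaultdict

# Hypothetical filter lists; the abstract does not specify the paper's own.
STOP_TOKENS = {"the", "of", "is", "in", "and", "for"}
PUNCTUATION = set(",.;:!?")


def candidate_terms(tokens):
    """Cut a token sequence at punctuation and stop tokens.

    Each uninterrupted run of remaining tokens becomes one candidate,
    so a multi-token (compound) term is kept whole.
    """
    run = []
    for tok in tokens:
        if tok in PUNCTUATION or tok in STOP_TOKENS:
            if run:
                yield " ".join(run)
                run = []
        else:
            run.append(tok)
    if run:
        yield " ".join(run)


def distribution_entropy(counts):
    """Shannon entropy (bits) of a term's occurrence counts across documents."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def filter_terms(docs, min_freq=2, min_entropy=0.5):
    """Keep candidates that are frequent and spread across the documents.

    min_freq and min_entropy are illustrative thresholds, not the paper's.
    """
    per_doc = defaultdict(Counter)  # term -> {document index: count}
    for i, tokens in enumerate(docs):
        for term in candidate_terms(tokens):
            per_doc[term][i] += 1
    kept = {}
    for term, by_doc in per_doc.items():
        freq = sum(by_doc.values())
        entropy = distribution_entropy(by_doc.values())
        if freq >= min_freq and entropy >= min_entropy:
            kept[term] = (freq, entropy)
    return kept


docs = [
    ["material", "service", "safety", "is", "important"],
    ["the", "material", "service", "safety", "of", "alloys"],
]
print(filter_terms(docs))
```

In this toy corpus only the compound candidate "material service safety" occurs in both documents, so it is the only one that passes both thresholds; candidates seen once are dropped by the frequency test.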

