Computer Science, 2015, Vol. 42, Issue 8: 7-12.


Survey of Automatic Terminology Extraction Methodologies

YUAN Jin-song (袁劲松), ZHANG Xiao-ming (张小明) and LI Zhou-jun (李舟军)

  1. Beihang University, Beijing 100191
  • Published: 2018-11-14 Online: 2018-11-14
  • Funding:
    Supported by the National Natural Science Foundation of China (61170189, 6, 61202239) and the Doctoral Program Fund of the Ministry of Education (20111102130003).

Abstract: Terminology extraction is a fundamental task in text processing. A good automatic terminology extraction method can improve the quality of ontology construction and the accuracy of semantic retrieval. First, the definition and characteristics of terminology and the methods for evaluating terminology extraction are briefly introduced. Second, based on an analysis and summary of the literature on automatic terminology extraction from roughly the past twenty years, the various automatic terminology extraction methods are surveyed in detail: their research progress is reviewed, their advantages and disadvantages are analyzed, and several classical algorithms are described in detail. Finally, trends in future research on automatic terminology extraction are discussed.

Key words: Terminology extraction, Text processing, Evaluation measures, Automatic extraction methodologies

