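Abstract: Term extraction is a fundamental research task in text processing, and effective automatic term extraction methods can improve both the quality of ontology construction and the precision of semantic retrieval. This paper first outlines the definition and characteristics of terms and the methods used to evaluate term extraction performance. Then, drawing on an analysis and summary of the literature on automatic term extraction from the past 20 years, it reviews the various automatic term extraction methods in detail: it presents their research progress, analyzes their strengths and weaknesses, and describes several classic algorithms in detail. Finally, it discusses future research trends in automatic term extraction.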