Computer Science, 2015, Vol. 42, Issue (8): 7-12.

Survey of Automatic Terminology Extraction Methodologies

YUAN Jin-song, ZHANG Xiao-ming and LI Zhou-jun   

Online: 2018-11-14  Published: 2018-11-14

Abstract: Terminology extraction is a fundamental research task in text processing. A better automatic terminology extraction method can improve the quality of ontologies and the accuracy of semantic retrieval. First, the definition and characteristics of terminology, as well as the evaluation of terminology extraction, were briefly introduced. Second, through a thorough analysis and summary of the literature on automatic terminology extraction over the past twenty years, a comprehensive survey of state-of-the-art automatic terminology extraction methodologies was conducted. It covers current domestic and international research, the advantages and disadvantages of each approach, and detailed descriptions of several classical algorithms. Finally, trends for future research were discussed.

Key words: Terminology extraction, Text processing, Evaluation measures, Automatic extraction methodologies
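The abstract mentions the evaluation of terminology extraction without detailing it. The following is a minimal sketch of one common setup: precision, recall and F-measure of an extracted term list against a manually built gold-standard list. It is an assumption for illustration only, and the survey's own evaluation definitions may differ. The function name `evaluate_terms` and the exact-match normalization (lowercasing and collapsing whitespace) are hypothetical choices. Real evaluations may also use partial matching or ranked measures.

```python
def evaluate_terms(extracted, gold):
    """Return (precision, recall, F1) of extracted terms against a gold list.

    Hypothetical helper: terms count as matching only if they are identical
    after lowercasing and whitespace normalization (an assumed convention).
    """
    def norm(term):
        # Lowercase and collapse internal whitespace so trivial variants match.
        return " ".join(term.lower().split())

    extracted_set = {norm(t) for t in extracted}
    gold_set = {norm(t) for t in gold}
    true_pos = len(extracted_set & gold_set)  # correctly extracted terms

    precision = true_pos / len(extracted_set) if extracted_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0 if both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["conditional random field", "hidden Markov model", "term extraction"]
    extracted = ["Term Extraction", "hidden markov model", "text document"]
    p, r, f = evaluate_terms(extracted, gold)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints "P=0.67 R=0.67 F1=0.67"
```

In the usage example, two of the three extracted candidates appear in the gold list, so precision and recall are both 2/3. Precision penalizes noisy candidates, while recall penalizes missed domain terms. This is why extraction methods are usually compared on both measures rather than on one alone.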
