Computer Science ›› 2018, Vol. 45 ›› Issue (1): 167-172. doi: 10.11896/j.issn.1002-137X.2018.01.029
李佳,郭剑毅,刘艳超,余正涛,线岩团,阮氏青娥
LI Jia, GUO Jian-yi, LIU Yan-chao, YU Zheng-tao, XIAN Yan-tuan and NGUYỄN Qing'e
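Abstract: Resolving combinational ambiguity is one of the key problems in word segmentation and directly affects segmentation accuracy. To address the effect of Vietnamese combinational ambiguity on word segmentation, and drawing on the characteristics of Vietnamese combinational words, this paper proposes an ensemble-learning-based method for resolving Vietnamese combinational ambiguity. First, Vietnamese combinational ambiguous words are selected manually to build a library of Vietnamese combinational ambiguity fields, and a Vietnamese corpus is matched against a dictionary of Vietnamese combinational words to extract the ambiguous fields. Second, three types of classifiers that use Vietnamese word-frequency features and context information are used to build three disambiguation models, each of which produces its own disambiguation result. Finally, a weight is computed for each classifier, and the final classification of each combinational ambiguity is decided by a threshold. Experiments show that the proposed method reaches an accuracy of 83.32%, which is 5.81% higher than that of the best-performing single classifier.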
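The final step (per-classifier weights followed by a threshold decision) can be illustrated with a minimal sketch. The abstract does not say how the weights are computed or what threshold is used, so the code below makes assumptions: weights are taken as normalized held-out accuracies, the threshold is 0.5, label 1 means "keep the field as one word" and label 0 means "split it". The classifier names and accuracy values are hypothetical placeholders, not results from the paper.

```python
from typing import Dict


def classifier_weights(accuracies: Dict[str, float]) -> Dict[str, float]:
    """Normalize held-out accuracies so the weights sum to 1.

    This weighting scheme is an assumption; the abstract only says that
    a weight is computed for each classifier.
    """
    total = sum(accuracies.values())
    if total <= 0:
        raise ValueError("at least one classifier must have positive accuracy")
    return {name: acc / total for name, acc in accuracies.items()}


def weighted_vote(votes: Dict[str, int],
                  weights: Dict[str, float],
                  threshold: float = 0.5) -> int:
    """Return 1 if the weighted share of 1-votes reaches the threshold, else 0."""
    score = sum(weights[name] for name, label in votes.items() if label == 1)
    return 1 if score >= threshold else 0


# Hypothetical held-out accuracies for three classifiers (illustrative only).
accuracies = {"clf_a": 0.78, "clf_b": 0.75, "clf_c": 0.70}
weights = classifier_weights(accuracies)

# Each classifier's vote on one ambiguous field: 1 = one word, 0 = split.
votes = {"clf_a": 0, "clf_b": 1, "clf_c": 1}
decision = weighted_vote(votes, weights)
```

With these placeholder numbers, the two classifiers that vote 1 together hold more than half of the total weight, so the ensemble labels the field as a single word even though the individually strongest classifier disagrees.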