Computer Science (计算机科学), 2023, Vol. 50, Issue (2): 285-291. doi: 10.11896/jsjkx.211200066
曾楠, 谢志鹏
ZENG Nan, XIE Zhipeng
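Abstract: Hypernymy detection, that is, deciding whether a hypernym-hyponym relation holds between two words, is a fundamental and challenging task in natural language processing. Traditional supervised methods typically use a single model to represent all hypernym-hyponym pairs globally over the entire semantic space, and have achieved some success. However, the distributional semantic representation of hypernymy is quite complex and tends to behave differently in different regions of the semantic space, which makes a global model hard to learn. To address this problem, this paper proposes a hypernymy detection method based on a mixture of experts. Following a divide-and-conquer strategy, the model partitions the semantic space into multiple subspaces, each associated with a local expert (model) that focuses on its own subspace, and uses a gating mechanism to determine both the partition of the space and the mixing of the experts. Experimental results show that this mixture-of-experts model outperforms traditional global models on public datasets.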
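The gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture. It assumes that each word is a fixed-length embedding, that the gate and every expert are plain linear scorers over the concatenated pair, and that the experts are mixed by a softmax gate. The class name `MixtureOfExperts`, its methods, and the toy vectors are all hypothetical, and the weights are random rather than trained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MixtureOfExperts:
    """Hypothetical MoE scorer for a (hyponym, hypernym) embedding pair."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        in_dim = 2 * dim  # assumption: the input is the concatenated pair
        # One weight vector per expert for the gate and for the expert itself.
        self.gate = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                     for _ in range(num_experts)]
        self.experts = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(num_experts)]

    def gate_weights(self, x, y):
        """Soft assignment of the pair to the experts (sums to 1)."""
        pair = x + y  # list concatenation
        return softmax([dot(g, pair) for g in self.gate])

    def score(self, x, y):
        """Gate-weighted mixture of the experts' probabilities."""
        pair = x + y
        weights = self.gate_weights(x, y)
        expert_probs = [sigmoid(dot(w, pair)) for w in self.experts]
        return sum(g * p for g, p in zip(weights, expert_probs))


# Toy usage with made-up 4-dimensional embeddings (untrained weights).
model = MixtureOfExperts(dim=4, num_experts=3)
dog = [0.2, -0.1, 0.5, 0.3]
animal = [0.1, 0.0, 0.4, 0.6]
weights = model.gate_weights(dog, animal)
prob = model.score(dog, animal)
```

Because the gate output is a probability distribution over experts, the final score is a convex combination of the experts' outputs. In a trained model, a pair lying in one region of the semantic space would be handled mostly by the expert responsible for that region.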