Computer Science, 2023, Vol. 50, Issue (2): 285-291. doi: 10.11896/jsjkx.211200066

• Artificial Intelligence •

Mixture-of-Experts Model for Hypernymy Discrimination

ZENG Nan, XIE Zhipeng

  1. School of Computer Science, Fudan University, Shanghai 200438, China
  • Received: 2021-12-05  Revised: 2022-05-01  Online: 2023-02-15  Published: 2023-02-22
  • Corresponding author: XIE Zhipeng (xiezp@fudan.edu.cn)
  • About the author: (nzeng19@fudan.edu.cn)
  • Supported by:
    National Key Research and Development Program of China (2018YFB1005100) and National Natural Science Foundation of China (62076072)

Abstract: Hypernymy discrimination is a fundamental and challenging task in natural language processing (NLP). Traditional supervised methods usually use a single model to globally model all hypernymy pairs over the entire semantic space, and they have achieved fair performance. However, the distributed semantic representation of hypernymy is rather complex, and its manifestations may differ significantly across regions of the semantic space, which makes a single global model difficult to learn. To address this problem, this paper proposes a hypernymy discrimination method based on the mixture-of-experts framework. Following a divide-and-conquer strategy, the model divides the semantic space into multiple subspaces, each corresponding to a local expert (model). Each local expert focuses on its own subspace to learn its specialty, and a gating mechanism determines both the partitioning of the space and the aggregation of the experts. Experimental results show that the mixture-of-experts model outperforms traditional global models on public datasets.

Key words: Hypernymy discrimination, Mixture-of-experts, Local model
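
The gated combination of local experts described in the abstract can be illustrated with a minimal sketch. This is a generic dense mixture-of-experts forward pass, not the authors' architecture: the concatenated pair representation, the linear gate, the logistic experts, and all names (`moe_score`, `gate_weights`, `expert_weights`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def moe_score(hypo, hyper, gate_weights, expert_weights):
    """Return (probability, gate) for the candidate pair (hypo, hyper).

    gate_weights[k] and expert_weights[k] are hypothetical linear
    parameters for expert k, each of length 2 * dim.
    """
    # Pair representation: plain concatenation (an assumption).
    x = list(hypo) + list(hyper)
    # The gate softly assigns the pair to regions of the semantic space.
    gate = softmax([dot(w, x) for w in gate_weights])
    # Each local expert gives its own judgment for the pair.
    local = [sigmoid(dot(w, x)) for w in expert_weights]
    # The final score is the gate-weighted mixture of expert outputs.
    prob = sum(g * p for g, p in zip(gate, local))
    return prob, gate


def rand_vec(n):
    """Random vector standing in for a word embedding or weight row."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


random.seed(0)
dim, num_experts = 4, 3
gate_w = [rand_vec(2 * dim) for _ in range(num_experts)]
expert_w = [rand_vec(2 * dim) for _ in range(num_experts)]
prob, gate = moe_score(rand_vec(dim), rand_vec(dim), gate_w, expert_w)
```

Because the gate weights are non-negative and sum to one, the mixture output stays a valid probability, and a pair lying deep inside one region is scored almost entirely by that region's expert.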

CLC Number:

  • TP391