Computer Science ›› 2023, Vol. 50 ›› Issue (2): 285-291. doi: 10.11896/jsjkx.211200066

• Artificial Intelligence •

Mixture-of-Experts Model for Hypernymy Discrimination

ZENG Nan, XIE Zhipeng   

  1. School of Computer Science, Fudan University, Shanghai 200438, China
  • Received: 2021-12-05 Revised: 2022-05-01 Online: 2023-02-15 Published: 2023-02-22
  • Supported by:
    National Key Research and Development Program of China (2018YFB1005100) and National Natural Science Foundation of China (62076072)

Abstract: Hypernymy discrimination is an essential and challenging task in NLP. Traditional supervised methods usually model all hypernymy relations in a single global semantic space and have achieved fair performance. However, the distributed semantic representation of hypernymy is rather complex, and its manifestations may differ significantly across areas of the semantic space, which makes a global model difficult to learn. This paper employs the mixture-of-experts framework as a solution. It follows a divide-and-conquer strategy that divides the semantic space into multiple subspaces, each corresponding to a local expert (model). The localized experts focus on their own domains (subspaces) to learn their specialties, and a gating mechanism determines the space partitioning and the expert aggregation. Experimental results show that the mixture-of-experts model outperforms traditional global models on public datasets.

Key words: Hypernymy discrimination, Mixture-of-Experts, Local model

CLC Number: 

  • TP391
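
The divide-and-conquer scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MixtureOfExperts`, the bilinear form of each expert, the softmax gate over the concatenated word pair, and all dimensions are assumptions chosen for illustration. The sketch only shows how a gate can softly partition the embedding space and aggregate the scores of several local experts.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MixtureOfExperts:
    """Toy scorer for (hyponym, hypernym) embedding pairs (hypothetical design)."""

    def __init__(self, dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Gate (assumed form): one logit per expert from the concatenated pair.
        self.gate_w = rng.normal(scale=0.1, size=(2 * dim, num_experts))
        # Experts (assumed form): bilinear scorers x^T M_k y, one matrix each.
        self.expert_m = rng.normal(scale=0.1, size=(num_experts, dim, dim))

    def gate(self, x, y):
        # Soft assignment of each pair to the experts; each row sums to 1.
        return softmax(np.concatenate([x, y], axis=-1) @ self.gate_w)

    def expert_scores(self, x, y):
        # Score of every expert for every pair, shape (batch, num_experts).
        return np.einsum("bi,kij,bj->bk", x, self.expert_m, y)

    def predict_proba(self, x, y):
        # Gate-weighted aggregation of the local expert scores.
        g = self.gate(x, y)
        s = self.expert_scores(x, y)
        return sigmoid((g * s).sum(axis=-1))

# Usage with random stand-ins for word embeddings (no trained weights).
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))   # candidate hyponyms
y = rng.normal(size=(4, 8))   # candidate hypernyms
model = MixtureOfExperts(dim=8, num_experts=3)
weights = model.gate(x, y)
probs = model.predict_proba(x, y)
```

Because the gate weights depend on the input pair, different regions of the embedding space can be dominated by different experts. In a trained model, the gate and the experts would be learned jointly from labeled word pairs.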