Computer Science ›› 2024, Vol. 51 ›› Issue (8): 256-262. doi: 10.11896/jsjkx.230600204

• Artificial Intelligence •


Contrastive Learning-based Prompt Generation Method for Large-scale Language Model Reverse Dictionary Task

TIAN Sicheng1, HUANG Shaobin1, WANG Rui1, LI Rongsheng1, DU Zhijuan2,3   

  1. College of Computer Science and Technology,Harbin Engineering University,Harbin 150001,China
    2. Engineering Research Center of Ecological Big Data,Ministry of Education,Hohhot,Inner Mongolia 010021,China
    3. College of Computer,Inner Mongolia University,Hohhot,Inner Mongolia 010021,China
  • Received:2023-06-26 Revised:2023-11-14 Online:2024-08-15 Published:2024-08-13
  • Corresponding author:HUANG Shaobin(huangshaobin@hrbeu.edu.cn)
  • About author:TIAN Sicheng(standby@hrbeu.edu.cn),born in 1997,Ph.D.His main research interests include natural language processing and smart healthcare.
    HUANG Shaobin,born in 1965,Ph.D,professor,Ph.D supervisor.His main research interests include machine learning and natural language processing.
  • Supported by:
    Open Project of Engineering Research Center of Ecological Big Data,Ministry of Education.

Abstract: The reverse dictionary task is an emerging task that aims to find the corresponding word for a given definition. Large-scale language models offer new possibilities for this task, but the quality of the prompt sentences affects the performance of the large models. To this end, this paper proposes a contrastive learning-based prompt generation method. The method extracts definition semantics at multiple semantic levels, and it enhances the model's generalization ability by introducing negative examples during training through contrastive learning. With this method, the target word can be narrowed down to a small candidate range, and a large model then selects from this range the word that best matches the definition semantics. Experimental results show that the proposed method effectively improves the performance of large-scale language models on the reverse dictionary task. The prompt generation model produces a range containing the target word with a probability of 94.7%; the large-scale language model directly selects the target word with a probability of 58.03%, and includes the target word among five candidate words with a probability of 74.55%.
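As a concrete illustration of the training objective sketched above, the following is a minimal PyTorch example, assuming a toy two-level definition encoder and an InfoNCE-style contrastive loss; the names (DefinitionEncoder, info_nce_loss), the dimensions, and the random toy data are illustrative assumptions, not the authors' implementation. Each definition embedding is pulled toward its target word embedding while the other words in the batch act as negative examples.

```python
# Minimal, self-contained sketch of a contrastive prompt-generation objective
# (illustrative only; the encoder, sizes, and loss form are assumptions,
# not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DefinitionEncoder(nn.Module):
    """Encodes a definition at two semantic levels (token average + LSTM state)."""
    def __init__(self, vocab_size=10000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)           # fuse the two levels

    def forward(self, token_ids):                     # token_ids: (B, T)
        tok = self.emb(token_ids)                     # (B, T, dim)
        word_level = tok.mean(dim=1)                  # shallow, word-level view
        _, (h, _) = self.lstm(tok)
        sent_level = h[-1]                            # deeper, sentence-level view
        return self.proj(torch.cat([word_level, sent_level], dim=-1))

def info_nce_loss(def_vecs, word_vecs, temperature=0.07):
    """Pull each definition toward its target word embedding; the other
    words in the batch serve as in-batch negatives."""
    def_vecs = F.normalize(def_vecs, dim=-1)
    word_vecs = F.normalize(word_vecs, dim=-1)
    logits = def_vecs @ word_vecs.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(def_vecs.size(0))           # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# Toy usage: random "definitions" and their target word embeddings.
encoder = DefinitionEncoder()
word_table = nn.Embedding(10000, 128)                 # candidate word embeddings
defs = torch.randint(1, 10000, (8, 12))               # 8 definitions, 12 tokens each
targets = torch.randint(0, 10000, (8,))               # their target word ids
loss = info_nce_loss(encoder(defs), word_table(targets))
loss.backward()
print(float(loss))
```

At inference time, the definition vector would be compared against all word embeddings and the top-k nearest words would form the small candidate range that is handed to the large language model.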

Key words: Reverse dictionary, Large-scale language model, Contrastive learning, Multiple semantic scales, Contrastive loss
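For the selection step, in which the large-scale language model picks the target word from the narrowed-down range, a prompt could be assembled roughly as follows; the build_selection_prompt helper and the template wording are hypothetical, since the paper's exact prompt is not reproduced here.

```python
# Hypothetical prompt construction for the candidate-selection step.
def build_selection_prompt(definition: str, candidates: list, top_n: int = 5) -> str:
    """Ask a large language model to pick the word(s) matching the definition
    from the narrowed-down candidate range."""
    options = ", ".join(candidates)
    return (
        f"Definition: {definition}\n"
        f"Candidate words: {options}\n"
        f"From the candidates above, list the {top_n} words that best match "
        f"the definition, ordered from most to least likely."
    )

prompt = build_selection_prompt(
    "a feeling of deep sadness caused by loss",
    ["grief", "sorrow", "joy", "melancholy", "anger", "mourning"],
)
print(prompt)   # this text would then be sent to the chosen large language model
```

Asking for the five best-matching candidates mirrors the evaluation reported above, where the target word appears among five candidates 74.55% of the time.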

CLC Number: TP391