Computer Science ›› 2023, Vol. 50 ›› Issue (10): 223-229. doi: 10.11896/jsjkx.220900108

• Artificial Intelligence •

Biomedical Relationship Extraction Method Based on Prompt Learning

WEN Kunjian, CHEN Yanping, HUANG Ruizhang, QIN Yongbin

  1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
    College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  • Received: 2022-09-12 Revised: 2022-12-07 Online: 2023-10-10 Published: 2023-10-10
  • Corresponding author: CHEN Yanping (ypench@gmail.com)
  • About author: WEN Kunjian (gs.kjwen21@gzu.edu.cn), born in 1998, postgraduate. His main research interests include biological information extraction. CHEN Yanping, born in 1980, Ph.D., associate professor. His main research interests include artificial intelligence and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62166007).

Abstract: Extracting relations between entities from unstructured biomedical text is of great significance to the development of biomedical informatics, and it is an active research topic in natural language processing. At present, correctly extracting relations between entities from biomedical data faces two difficulties: 1) entity mentions in biomedical data mostly consist of compound words and unknown words, so it is hard for a model to learn the semantic features inside an entity; 2) annotated biomedical data are scarce while neural networks have a large number of parameters, so the networks overfit easily. This paper therefore proposes a biomedical relation extraction method based on prompt learning. It adds an annotation label for entities, which prompts the entities in order to enhance their semantics and connect them with contextual information. In addition, building on traditional prompt tuning, the paper uses continuous templates to alleviate the performance bias introduced by manually designed templates, and combines them with the deep prompting ability of deep prefixes that control attention, so that the model still achieves good results when little data is available.

Key words: Relation extraction, Biological information extraction, Prompt-tuning
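As a rough illustration of the idea summarized in the abstract, the sketch below wraps two entities in annotation markers and appends prompt slots plus a `[MASK]` position for the relation label. It is not the authors' implementation: the marker format (`<e1:protein>`), the slot names (`[P0]`, ...), the function names, and the example sentence are all assumptions made for this example. In an actual continuous-template setup the prompt slots would be backed by trainable embeddings rather than vocabulary words, and the deep prefix applied to attention layers is not shown here.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def annotate_entities(tokens: List[str], e1: Span, e2: Span,
                      tag1: str = "protein", tag2: str = "protein") -> List[str]:
    """Wrap two non-overlapping entity spans in hypothetical annotation markers."""
    if not (e1[1] <= e2[0] or e2[1] <= e1[0]):
        raise ValueError("entity spans must not overlap")
    marked = list(tokens)
    # Handle the right-most span first so the indices of the other span stay valid.
    for (start, end), idx, tag in sorted([(e1, 1, tag1), (e2, 2, tag2)], reverse=True):
        marked[start:end] = [f"<e{idx}:{tag}>", *marked[start:end], f"</e{idx}>"]
    return marked


def build_prompt(tokens: List[str], e1: Span, e2: Span,
                 n_prompt_tokens: int = 3) -> List[str]:
    """Append placeholder prompt slots and a [MASK] slot for the relation label."""
    marked = annotate_entities(tokens, e1, e2)
    # Placeholders only: a continuous template would learn embeddings for these slots.
    soft_slots = [f"[P{i}]" for i in range(n_prompt_tokens)]
    return marked + ["[SEP]"] + soft_slots + ["[MASK]"]


if __name__ == "__main__":
    sentence = "IL-2 binds to IL-2R alpha".split()
    print(" ".join(build_prompt(sentence, e1=(0, 1), e2=(3, 5))))
```

A masked language model would then be asked to fill the `[MASK]` position, and the predicted word would be mapped to a relation class; that mapping is likewise left out of this sketch.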

CLC Number:

  • TP391