Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230900030-6. doi: 10.11896/jsjkx.230900030

• Artificial Intelligence •

TCM Named Entity Recognition Model Combining BERT Model and Lexical Enhancement

LI Minzhe, YIN Jibin

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  • Published: 2024-06-06
  • Corresponding author: YIN Jibin (41868028@qq.com)
  • About author: LI Minzhe, born in 1997, postgraduate (597085899@qq.com). His main research interests include deep learning and natural language processing.
    YIN Jibin, born in 1976, Ph.D, associate professor. His main research interests include human-computer interaction and artificial intelligence.

Abstract: Research on named entity recognition (NER) for traditional Chinese medicine (TCM) is scarce; most existing work targets Chinese clinical records and performs poorly on case texts written in the traditional TCM style. To address the dense named entities and fuzzy, hard-to-delimit entity boundaries in TCM cases, this paper proposes LEBERT-BILSTM-CRF, a TCM NER method that combines lexical enhancement with a pre-trained model. The method is optimized by fusing lexicon information with the pre-trained model: lexicon features are fed into the BERT model for feature learning, so as to delimit word-class boundaries and distinguish word-class attributes, thereby improving NER accuracy on TCM medical cases. Experiments on the TCM case dataset constructed in this paper show that, when recognizing ten entity types, the proposed LEBERT-BILSTM-CRF model achieves an overall precision, recall, and F1 of 88.69%, 87.4%, and 88.1%, respectively, outperforming common NER models such as BERT-CRF and LEBERT-CRF.

Key words: Natural language processing, TCM case, Lexical enhancement, BERT, BiLSTM-CRF
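The lexical-enhancement idea summarized above (injecting lexicon-matched words into character-level representations) can be sketched in a minimal, framework-free form. The snippet below only illustrates the matching step common to LEBERT-style models: for each character, collect the lexicon words that begin at (B), continue through (M), end at (E), or exactly equal (S) that position; these word sets are the extra features later fused into BERT's character embeddings. The toy lexicon and sentence are illustrative assumptions, not the paper's actual dictionary or data, and the full model (BERT adapter fusion, BiLSTM-CRF decoding) is not reproduced here.

```python
def char_word_features(sentence, lexicon):
    """For each character, return the BMES sets of lexicon words matched at that position."""
    feats = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, n + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                feats[i]["S"].add(word)        # single-character word
            else:
                feats[i]["B"].add(word)        # word begins here
                feats[j - 1]["E"].add(word)    # word ends here
                for k in range(i + 1, j - 1):
                    feats[k]["M"].add(word)    # word covers this position

    return feats

# Toy TCM-style lexicon (assumption, for illustration only)
lexicon = {"头痛", "头", "痛"}
feats = char_word_features("头痛", lexicon)
```

In the full model, each character's B/M/E/S word sets are embedded and attended over, then added to the corresponding BERT layer inputs, which is what gives the model its word-boundary signal.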

CLC Number: TP391