Computer Science ›› 2020, Vol. 47 ›› Issue (3): 211-216. doi: 10.11896/jsjkx.190200259

Clinical Electronic Medical Record Named Entity Recognition Incorporating Language Model and Attention Mechanism

TANG Guo-qiang, GAO Da-qi, RUAN Tong, YE Qi, WANG Qi

  1. (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
  • Received: 2019-02-01 Online: 2020-03-15 Published: 2020-03-30
  • About author: TANG Guo-qiang, born in 1993, master. His main research interests include natural language processing. GAO Da-qi, born in 1977, Ph.D., professor. His main research interests include pattern recognition and machine learning.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFC0910500).

Abstract: Clinical named entity recognition (CNER) aims to identify and classify named entities such as diseases, symptoms, and exams in electronic health records, and is a fundamental task for clinical and translational research. The task is generally treated as a sequence labeling problem. In recent years, deep neural network methods have achieved significant success in named entity recognition. However, most of these algorithms do not take full advantage of the large amount of unlabeled data and ignore further features that can be derived from the text. This paper proposes a model that combines a language model with multi-head attention. First, character embeddings and a language model are trained on unlabeled clinical texts. Then, the labeling model is trained on labeled clinical texts. Specifically, the vector representation of a sentence is fed to a BiGRU and to the pre-trained language model, and the BiGRU output is concatenated with the language model features. The concatenated representation is then fed to a second BiGRU and a multi-head attention module. Finally, a CRF layer predicts the label sequence. Experimental results show that the proposed method, which exploits language model features from the text together with the multi-head attention mechanism, achieves an F1-score of 91.34% on the CCKS-2017 Task 2 benchmark dataset.
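
To illustrate the final step of the pipeline, in which a CRF layer turns per-character scores into a label sequence, the following is a minimal sketch of Viterbi decoding over a linear-chain score. It is not the authors' implementation: the function name `viterbi_decode`, the three-label tag set, and all emission and transition scores are assumptions made up for illustration. In the described model the emission scores would come from the second BiGRU and the multi-head attention module, and the transition scores would be learned by the CRF.

```python
from typing import Dict, List

# Hypothetical tag set: outside, beginning of a disease mention, inside one.
LABELS = ["O", "B-DIS", "I-DIS"]

# Made-up transition scores; O -> I-DIS is heavily penalised because an
# entity cannot continue without having begun.
TRANSITIONS = {
    "O":     {"O": 0.0, "B-DIS": 0.0, "I-DIS": -100.0},
    "B-DIS": {"O": 0.0, "B-DIS": 0.0, "I-DIS": 1.0},
    "I-DIS": {"O": 0.0, "B-DIS": 0.0, "I-DIS": 1.0},
}

# Made-up emission scores for a four-character sentence.
EMISSIONS = [
    {"O": 2.0, "B-DIS": 0.5, "I-DIS": 0.0},
    {"O": 0.5, "B-DIS": 1.0, "I-DIS": 1.2},
    {"O": 0.2, "B-DIS": 0.1, "I-DIS": 1.5},
    {"O": 2.0, "B-DIS": 0.0, "I-DIS": 0.3},
]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the label sequence with the highest total score."""
    if not emissions:
        return []
    # best[t][y]: score of the best path that ends in label y at position t.
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Best previous label for reaching y at position t.
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(viterbi_decode(EMISSIONS, TRANSITIONS, LABELS))
    # prints ['O', 'B-DIS', 'I-DIS', 'O']
```

With these toy scores, picking the highest-scoring label at each position independently would give `O, I-DIS, I-DIS, O`, which starts an entity with an inside tag. Decoding over the whole sequence with transition scores avoids that, which is the usual motivation for placing a CRF layer on top of the encoder.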

Key words: Clinical named entity recognition, Deep neural network, GRU, Language model, Multi-head attention

CLC Number: 

  • TP391
