Computer Science, 2023, Vol. 50, Issue (6): 243-250. doi: 10.11896/jsjkx.220400115
黄健格1, 贾真1,2, 张凡1,2, 李天瑞1,2,3
HUANG Jiange1, JIA Zhen1,2, ZHANG Fan1,2, LI Tianrui1,2,3
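Abstract: Character-based Chinese medical named entity recognition models rely on a single kind of embedding and lack word-boundary and structural information. To address these problems, this paper proposes a medical named entity recognition model that fuses multiple feature embeddings. First, each character is mapped to a fixed-length embedding. Second, external resources are introduced to build lexical features, which supplement each character with latent word information. Then, based on the pictographic nature of Chinese characters and the sequential nature of text, character-structure features and sequence-structure features are introduced, and convolutional neural networks encode the two kinds of structural features to obtain radical-level and sentence-level word embeddings. Finally, the resulting feature embeddings are concatenated and fed into a long short-term memory (LSTM) network for encoding, and a conditional random field (CRF) outputs the entity predictions. Experiments were conducted on a self-built Chinese medical dataset and on the medical data provided by the CHIP_2020 task. The results show that, compared with the baseline models, the proposed model, which fuses both lexical features and text-structure features, can effectively recognize medical named entities.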
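The feature-fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the random weights, the toy `RADICALS` table, the B/M/E/S lexicon indicator, the three-character context window used for the sentence-level feature, and every function name are assumptions made for illustration only. The sketch maps each character to a fixed-length vector, adds a lexicon-match feature, encodes radical and context sequences with a 1-D convolution plus max pooling, and concatenates the results.

```python
import random

random.seed(0)
CHAR_DIM, RAD_DIM, N_FILTERS, KERNEL = 4, 3, 2, 2  # assumed toy sizes


def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]


class Embedding:
    """Lazily filled lookup table: symbol -> fixed-length vector."""

    def __init__(self, dim):
        self.dim, self.table = dim, {}

    def __call__(self, key):
        if key not in self.table:
            self.table[key] = rand_vec(self.dim)
        return self.table[key]


def lexicon_feature(sentence, i, lexicon, max_len=4):
    """Indicator [B, M, E, S]: whether character i begins, is inside,
    ends, or alone forms a lexicon word (an assumed feature design)."""
    b = m = e = s = 0.0
    for start in range(max(0, i - max_len + 1), i + 1):
        for end in range(i + 1, min(len(sentence), start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                s = 1.0
            elif start == i:
                b = 1.0
            elif end == i + 1:
                e = 1.0
            else:
                m = 1.0
    return [b, m, e, s]


def conv_max_pool(vectors, filters, kernel):
    """1-D convolution over a vector sequence, then max-over-time pooling."""
    dim = len(vectors[0])
    # Right-pad with zeros so a length-1 sequence still yields one window.
    seq = vectors + [[0.0] * dim] * (kernel - 1)
    pooled = []
    for w in filters:  # each filter is a flat list of kernel * dim weights
        scores = []
        for t in range(len(seq) - kernel + 1):
            window = [x for v in seq[t:t + kernel] for x in v]
            scores.append(sum(a * c for a, c in zip(w, window)))
        pooled.append(max(scores))
    return pooled


# Toy radical decomposition, illustrative only; unknown characters
# fall back to themselves.
RADICALS = {"肝": ["月", "干"], "炎": ["火", "火"], "患": ["串", "心"]}

char_emb, rad_emb = Embedding(CHAR_DIM), Embedding(RAD_DIM)
rad_filters = [rand_vec(KERNEL * RAD_DIM) for _ in range(N_FILTERS)]
sent_filters = [rand_vec(KERNEL * CHAR_DIM) for _ in range(N_FILTERS)]


def fuse(sentence, lexicon):
    """Return one concatenated feature vector per character."""
    chars = [char_emb(c) for c in sentence]
    fused = []
    for i, c in enumerate(sentence):
        rad_vecs = [rad_emb(r) for r in RADICALS.get(c, [c])]
        radical_level = conv_max_pool(rad_vecs, rad_filters, KERNEL)
        # Assumed context: the character and its immediate neighbours.
        lo, hi = max(0, i - 1), min(len(sentence), i + 2)
        sentence_level = conv_max_pool(chars[lo:hi], sent_filters, KERNEL)
        fused.append(chars[i] + lexicon_feature(sentence, i, lexicon)
                     + radical_level + sentence_level)
    return fused


features = fuse("患者有肝炎", {"患者", "肝炎"})
print(len(features), len(features[0]))
```

In the model the abstract describes, vectors like these would then be passed to the LSTM encoder, with the CRF layer producing the final entity labels; those two stages are omitted here to keep the sketch short.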