Computer Science ›› 2023, Vol. 50 ›› Issue (6): 243-250.doi: 10.11896/jsjkx.220400115

• Artificial Intelligence •

Chinese Medical Named Entity Recognition Based on Multi-feature Embedding

HUANG Jiange1, JIA Zhen1,2, ZHANG Fan1,2, LI Tianrui1,2,3   

  1 School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China
  2 Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu 611756, China
  3 National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Chengdu 611756, China
  • Received:2022-04-11 Revised:2022-09-15 Online:2023-06-15 Published:2023-06-06
  • About author: HUANG Jiange, born in 1996, postgraduate, is a member of China Computer Federation. His main research interests include named entity recognition and natural language processing. LI Tianrui, born in 1969, Ph.D., professor, Ph.D. supervisor, is a distinguished member of China Computer Federation. His main research interests include big data intelligence, rough sets and granular computing.
  • Supported by:
    National Natural Science Foundation of China(62176221).

Abstract: Chinese medical named entity recognition (NER) models based on character representations rely on a single kind of embedding and lack word boundary and text structure information. To address these problems, this paper presents a medical NER model that integrates multi-feature embedding. First, the characters are mapped to fixed-length embedding representations. Second, external resources are introduced to construct lexical features, which supplement each character with information about the potential phrases it belongs to. Third, based on the pictographic nature of Chinese characters and the properties of text sequences, character structure features and sequence structure features are introduced, respectively. Convolutional neural networks encode the two structural features to obtain radical-level and sentence-level word embeddings. Finally, the resulting feature embeddings are concatenated and fed into a long short-term memory (LSTM) network for encoding, and a conditional random field (CRF) layer outputs the entity labels. Experiments on a self-built Chinese medical dataset and the CHIP_2020 dataset show that, compared with the benchmark models, the proposed model, which integrates both lexical and text structure features, can effectively identify named entities in the medical field.
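
The abstract does not say how the lexical features are built or how the embeddings are joined, so the following is only a minimal sketch under stated assumptions. It assumes a B/M/E/S matching scheme (a common choice in lexicon-enhanced Chinese NER): each character collects the lexicon words that contain it, grouped by whether the character begins, sits inside, ends, or alone forms the word. The function names `match_lexicon` and `concat_features`, the `max_len` parameter, and the toy lexicon are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set


def match_lexicon(sentence: str, lexicon: Set[str],
                  max_len: int = 4) -> List[Dict[str, Set[str]]]:
    """For each character, collect the lexicon words that cover it.

    Words are grouped by the character's position inside the word:
    B (begin), M (middle), E (end), S (single-character word).
    This grouping is an assumption, not the paper's stated method.
    """
    features = [{"B": set(), "M": set(), "E": set(), "S": set()}
                for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring of up to max_len characters starting here.
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                features[start]["S"].add(word)
                continue
            features[start]["B"].add(word)
            features[end - 1]["E"].add(word)
            for mid in range(start + 1, end - 1):
                features[mid]["M"].add(word)
    return features


def concat_features(*vectors: List[float]) -> List[float]:
    """Concatenate per-character feature vectors into one input vector."""
    merged: List[float] = []
    for vector in vectors:
        merged.extend(vector)
    return merged


# Toy example (hypothetical lexicon).
toy_lexicon = {"糖尿病", "病患", "患者"}
toy_features = match_lexicon("糖尿病患者", toy_lexicon)
```

In this toy sentence the character 病 ends one lexicon word (糖尿病) and begins another (病患), which is the kind of word boundary information a purely character-based model does not see. In a full model, each word set would be turned into an embedding and joined with the character, radical-level, and sentence-level embeddings (as in `concat_features`) before the LSTM encoder.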

Key words: Named entity recognition, Chinese medical text, Lexical information, Text structure features, Deep learning

CLC Number: 

  • TP391