Computer Science, 2020, Vol. 47, Issue 11A: 416-420. doi: 10.11896/jsjkx.200200020
DU Lin (杜琳)1, CAO Dong (曹东)1, LIN Shu-yuan (林树元)2, QU Yi-qian (瞿溢谦)2, YE Hui (叶辉)1
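Abstract: Traditional Chinese medicine (TCM) is attracting growing attention, and TCM medical record texts contain a large amount of valuable medical information. Mining and using these texts, however, has long been held back by low utilization of the records and by the difficulty of extracting useful information and classifying the resulting text. A method for extracting and automatically classifying TCM medical record text therefore has considerable clinical value. This paper proposes a short-text classification model for medical records that fuses BERT, Bi-LSTM, and Attention. BERT is used to obtain short-text vectors as the model input. The paper compares BERT against word2vec as the pre-trained representation, and compares Bi-LSTM+Attention against a plain LSTM. Experimental results show that the fused BERT+Bi-LSTM+Attention model achieves the highest Average F1 (89.52%) on extraction and classification of TCM medical record text. The comparisons show that BERT gives a significant improvement over word2vec, and that Bi-LSTM+Attention gives a significant improvement over LSTM, so the proposed fused model has some medical value for medical record text extraction and classification.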
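The abstract does not spell out how the Attention layer combines the Bi-LSTM outputs, so the following is only a minimal sketch of one common choice: dot-product attention pooling, which scores each time step against a query vector and returns the weighted sum as the sentence vector. The function names `softmax` and `attention_pool`, the `query` vector, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse per-token vectors into a single sentence vector.

    hidden_states: list of T vectors of length d, standing in for the
                   concatenated forward/backward LSTM states.
    query:         hypothetical learned vector of length d (an assumption).
    """
    # Score each time step by its dot product with the query vector.
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the hidden states, one dimension at a time.
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]
    return pooled, weights


# Toy input: three token vectors of dimension 2 (made-up numbers).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
pooled, weights = attention_pool(states, query)
```

In a full model the pooled vector would typically feed a linear layer and a softmax over the record categories. Here the third token gets the largest weight because it has the highest dot product with the query.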