Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 230900157-7. doi: 10.11896/jsjkx.230900157

• Intelligent Computing •

Construction of Fine-grained Medical Knowledge Graph Based on Deep Learning

WANG Yuhan1, MA Fuyuan2, WANG Ying3

  1. 1 College of Software, Jilin University, Changchun 130012, China
    2 College of Artificial Intelligence, Jilin University, Changchun 130012, China
    3 Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
  • Publication date: 2024-11-16 Release date: 2024-11-13
  • Corresponding author: WANG Ying (wangying2010@jlu.edu.cn)
  • About author: WANG Yuhan (yuhanw23@mails.jlu.edu.cn), born in 2001, postgraduate. Her main research interests include machine learning and deep learning.
    WANG Ying, born in 1981, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 183695). Her main research interests include machine learning, social networks, data mining, and search engines.
  • Supported by:
    National Natural Science Foundation of China (62272191) and Science and Technology Development Program of Jilin Province (20220201153GX).

Abstract: As a powerful tool for integrating massive amounts of medical information, medical knowledge graphs are widely used in public-facing platforms such as clinical decision support systems and medical question answering systems. Large-scale medical knowledge graphs are emerging one after another, but most of them focus on increasing the number of entities and neglect making the entity types fine-grained. Medical terminology is lengthy and difficult to understand, so building a fine-grained knowledge graph can greatly improve the practicality of knowledge-graph-based public service systems and give question answering systems more targeted diagnostic explanations. This paper targets a large-scale medical knowledge base crawled from vertical websites, with the goal of making long medical texts fine-grained. BiLSTM is used to model the complete context of each word from both directions of a long sentence. The pre-trained model BERT is introduced to strengthen the modeling of contextual word semantics, and a CRF model learns a state transition matrix that keeps the label sequence consistent. The combined model efficiently identifies entities in long sentences, and a fine-grained medical knowledge graph is then built through entity alignment and attribute filling. Comparative experiments on the fine-grained medical entity task show that the BERT+BiLSTM+CRF model outperforms the other models, and the visualization results also illustrate the effectiveness of the proposed method for fine-grained entity extraction.

Key words: Knowledge graph, BiLSTM, CRF, Fine-grained

CLC Number:

  • TP391
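
The abstract states that the CRF layer learns a state transition matrix to keep the label sequence consistent. The following is a minimal sketch of that idea, not the authors' implementation: it runs Viterbi decoding over per-token emission scores plus a transition matrix, then collects the tagged entity spans. The tag set (`O`, `B-SYM`, `I-SYM`), the example tokens, the emission scores (stand-ins for what a BERT+BiLSTM encoder would output), and the hand-set transition values are all hypothetical assumptions chosen for illustration; in a trained model the transitions would be learned.

```python
from typing import List, Tuple

# Hypothetical BIO tag set; the paper's actual label scheme is not given in the abstract.
TAGS = ["O", "B-SYM", "I-SYM"]
NEG = -1e4  # large penalty standing in for a learned "forbidden" transition

# TRANSITIONS[i][j]: score of moving from TAGS[i] to TAGS[j] (hand-set, not learned).
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O: O -> I-SYM would be an inconsistent sequence
    [0.0, 0.0, 0.5],  # from B-SYM
    [0.0, 0.0, 0.5],  # from I-SYM
]
START = [0.0, 0.0, NEG]  # a sentence should not start with I-SYM


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]],
            start: List[float]) -> List[int]:
    """Return the highest-scoring tag index sequence."""
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at this position.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from a BIO tag sequence."""
    spans, start, label = [], None, None
    for idx, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I-"):
            spans.append((" ".join(tokens[start:idx]), label))
            start = None
        if tag.startswith("B-"):
            start, label = idx, tag[2:]
    return spans


# Hypothetical sentence and emission scores (columns follow TAGS order).
tokens = ["patient", "has", "chest", "pain"]
emissions = [
    [2.0, 0.1, 0.0],
    [2.0, 0.2, 0.1],
    [0.5, 0.6, 1.0],  # per-token argmax here is I-SYM, which is invalid after O
    [0.2, 0.3, 1.5],
]

greedy_tags = [TAGS[max(range(len(TAGS)), key=lambda j: e[j])] for e in emissions]
crf_tags = [TAGS[i] for i in viterbi(emissions, TRANSITIONS, START)]
entities = extract_spans(tokens, crf_tags)
print(greedy_tags, crf_tags, entities)
```

In this toy input, choosing each tag independently yields an `I-SYM` that follows `O`, whereas decoding with the transition scores produces a well-formed span. This is the sense in which a transition matrix maintains label consistency.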