Computer Science, 2022, Vol. 49, Issue (6A): 32-38. doi: 10.11896/jsjkx.210400198

• Smart Healthcare •

New Text Retrieval Model of Chinese Electronic Medical Records

YU Jia-qi¹˒², KANG Xiao-dong¹, BAI Cheng-cheng¹, LIU Han-qing¹

  1 School of Medical Image, Tianjin Medical University, Tianjin 300202, China
    2 Clinical Medical College of Tianjin Medical University, Tianjin 300270, China
  • Online: 2022-06-10  Published: 2022-06-08
  • About author: YU Jia-qi, born in 1989, postgraduate. Her main research interest is medical information processing.
    KANG Xiao-dong, born in 1964, Ph.D., professor, postgraduate supervisor, and member of the China Computer Federation. His main research interests include medical image processing and medical information system integration.

Abstract: The growing volume of electronic medical records (EMRs) forms the basis of user health big data, which can improve the quality of medical services and reduce medical costs. Rapid and effective case retrieval therefore has practical significance in clinical medicine. EMR text is highly specialized and has distinctive textual characteristics, yet traditional text retrieval methods suffer from imprecise semantic representation of text entities and low retrieval accuracy. To address these problems, this paper proposes a model that fuses BERT and BiLSTM to fully express the semantic information of EMR text and improve retrieval accuracy. The study is based on public data. First, the open, standard Chinese EMR data are preprocessed by expanding the retrieval keywords with correlated terms according to clinical diagnostic rules. Second, the BERT model dynamically generates a word-granularity vector matrix from the context of the medical record text, and the resulting vectors are fed into a bidirectional long short-term memory network (BiLSTM) to extract global semantic features of the contextual information. Finally, the feature vector of the query document is mapped into Euclidean space, and the medical record texts closest to it are returned, realizing text retrieval over unstructured clinical data. Simulation results show that the method can mine multi-level, multi-angle semantic features from medical record text; it achieves an F1 score of 0.94 on the EMR dataset, substantially improving the accuracy of semantic text retrieval.
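The encode-then-retrieve pipeline described in the abstract can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: a hypothetical fixed pseudo-random character embedding (`embed_char`) stands in for BERT, the BiLSTM uses random, untrained weights with hypothetical sizes `EMB_DIM` and `HIDDEN`, and the keyword-expansion step is omitted. Each record is encoded by concatenating the final forward and backward hidden states, and records are ranked by Euclidean distance to the query vector.

```python
import math
import random

# Hypothetical toy sizes; the paper's actual dimensions are not given here.
EMB_DIM = 8
HIDDEN = 4
GATES = "ifog"  # input, forget, output gates and candidate cell state


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def embed_char(ch):
    """Hypothetical stand-in for BERT: a fixed pseudo-random vector per character."""
    rng = random.Random(ord(ch))
    return [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]


class LSTM:
    """Single-direction LSTM with randomly initialised, untrained weights."""

    def __init__(self, in_dim, hidden, rng):
        self.hidden = hidden
        # One weight matrix per gate, applied to the concatenation [x; h_prev].
        self.w = {k: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim + hidden)]
                      for _ in range(hidden)] for k in GATES}
        self.b = {k: [0.0] * hidden for k in GATES}

    def final_state(self, xs):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in xs:
            z = x + h  # list concatenation [x; h_prev]
            pre = {k: [sum(wi * zi for wi, zi in zip(row, z)) + bias
                       for row, bias in zip(self.w[k], self.b[k])] for k in GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            g = [math.tanh(v) for v in pre["g"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


class BiLSTMEncoder:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTM(EMB_DIM, HIDDEN, rng)
        self.bwd = LSTM(EMB_DIM, HIDDEN, rng)

    def encode(self, text):
        xs = [embed_char(ch) for ch in text]
        # Forward LSTM reads left-to-right, backward LSTM right-to-left;
        # their final hidden states are concatenated into a 2*HIDDEN vector.
        return self.fwd.final_state(xs) + self.bwd.final_state(xs[::-1])


def retrieve(query, records, encoder, k=1):
    """Rank records by Euclidean distance between their vectors and the query's."""
    q = encoder.encode(query)
    scored = sorted((math.dist(q, encoder.encode(r)), r) for r in records)
    return scored[:k]


if __name__ == "__main__":
    # Toy records: "chest pain with dyspnea", "fever and cough for three days",
    # "headache, nausea and vomiting".
    records = ["胸痛伴呼吸困难", "发热咳嗽三天", "头痛恶心呕吐"]
    enc = BiLSTMEncoder(seed=42)
    for dist, rec in retrieve("发热咳嗽", records, enc, k=2):
        print(f"{dist:.3f}  {rec}")
```

Because the weights are untrained, the ranking only demonstrates the data flow: an exact match always comes back at distance 0.0, but other orderings carry no clinical meaning. In the setting the abstract describes, the embeddings would come from BERT and the BiLSTM would be trained, and record vectors would normally be precomputed and indexed rather than re-encoded for every query as done here for brevity.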

Key words: BERT model, Bidirectional long short-term memory network model, Electronic medical record, Extended search keywords, Text retrieval

CLC Number: TP391