Research on Structured Method for Chinese Pathological Text

doi:10.11896/j.issn.1002-137X.2016.10.051

Computer Science, 2016, Vol. 43, Issue (10): 272-276

CHEN De-hua (陈德华), FENG Jie-ying (冯洁莹), LE Jia-jin (乐嘉锦), PAN Qiao (潘乔)

School of Computer Science and Technology, Donghua University, Shanghai 200051, China (all authors)

Publication date: 2018-12-01; Release date: 2018-12-01
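Funding: This work was supported by the Science and Technology Innovation Action Plan of the Science and Technology Commission of Shanghai Municipality, under the project "Industry Application of Multi-Disease, Multi-Center Clinical Big Data Based on 'Internet+' Technology" (15511106900).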

Abstract

Pathological text, an important class of unstructured clinical document, is essential to clinical diagnosis. For Chinese pathological text specifically, this paper proposes a simple and effective structuring method. First, historical Chinese pathological text is preprocessed through data cleaning, short-clause splitting, and trunk extraction, which yields the text information corresponding to each sample. Next, a description template for each sample is extracted through short-clause clustering and filtering by statistical parameters. Finally, the templates are used to structure pathological text on the fly, producing the final structured output. Experiments show that the method achieves good structuring results on texts of the same type, and the extracted templates are periodically re-optimized to meet the latest structuring requirements.
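
The preprocessing step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the delimiter set, the helper names `clean`, `split_clauses`, and `skeleton`, and the sample report are all hypothetical. Trunk extraction is not reproduced; masking numeric values is a simple stand-in (an assumption) that lets clauses differing only in measurements map to the same pattern.

```python
import re
import unicodedata

# Hypothetical delimiter set. After NFKC, full-width ，；！？ become , ; ! ?
# while the ideographic full stop 。 is unchanged, so it is listed explicitly.
# ASCII "." is deliberately excluded so decimals such as 2.5 stay intact.
CLAUSE_DELIMS = re.compile(r"[,;。!?\n]+")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def clean(text: str) -> str:
    """Fold full-width characters to ASCII and remove spaces, keeping line breaks."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"[^\S\n]+", "", text)


def split_clauses(text: str) -> list[str]:
    """Clean a report and cut it into non-empty short clauses."""
    return [c for c in CLAUSE_DELIMS.split(clean(text)) if c]


def skeleton(clause: str) -> str:
    """Replace numeric values with <NUM> (a stand-in, not real trunk extraction)."""
    return NUMBER.sub("<NUM>", clause)


if __name__ == "__main__":
    report = "肿块大小 3×2.5cm，切缘阴性；\n淋巴结（2/15）见癌转移。"
    clauses = split_clauses(report)
    print(clauses)
    # ['肿块大小3×2.5cm', '切缘阴性', '淋巴结(2/15)见癌转移']
    print([skeleton(c) for c in clauses])
    # ['肿块大小<NUM>×<NUM>cm', '切缘阴性', '淋巴结(<NUM>/<NUM>)见癌转移']
```

NFKC normalization is used here because a single call folds full-width punctuation, digits, parentheses, and the ideographic space into their ASCII forms.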

Keywords: Chinese pathological text, Structuring, Short-clause clustering, Template extraction
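
Clause clustering and template extraction can also be illustrated with a small sketch. The abstract does not name the clustering algorithm or the statistical parameters, so this code swaps in single-pass leader clustering on character-set Jaccard similarity and uses cluster size (`min_support`) and shared-prefix length (`min_prefix`) as hypothetical filters. All function names, thresholds, and sample clauses are assumptions. The common prefix of each surviving cluster serves as an attribute template, and new clauses are structured by matching the longest template.

```python
import os


def char_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the character sets of two clauses."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def cluster_clauses(clauses: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Leader clustering (assumed stand-in): a clause joins the first cluster whose
    leader (first member) is at least `threshold` similar, else starts a new one."""
    clusters: list[list[str]] = []
    for clause in clauses:
        for members in clusters:
            if char_jaccard(clause, members[0]) >= threshold:
                members.append(clause)
                break
        else:
            clusters.append([clause])
    return clusters


def extract_templates(clauses: list[str], threshold: float = 0.5,
                      min_support: int = 2, min_prefix: int = 2) -> list[str]:
    """Keep the shared prefix of each frequent-enough cluster as a template."""
    templates: set[str] = set()
    for members in cluster_clauses(clauses, threshold):
        prefix = os.path.commonprefix(members)  # character-by-character prefix
        if len(members) >= min_support and len(prefix) >= min_prefix:
            templates.add(prefix)
    # Longest templates first, so the most specific one wins during matching.
    return sorted(templates, key=lambda t: (-len(t), t))


def structure(clauses: list[str], templates: list[str]) -> tuple[dict[str, str], list[str]]:
    """Map each clause to {template: remainder}; return unmatched clauses separately."""
    record: dict[str, str] = {}
    unmatched: list[str] = []
    for clause in clauses:
        for t in templates:
            if clause.startswith(t):
                record[t] = clause[len(t):]
                break
        else:
            unmatched.append(clause)
    return record, unmatched


if __name__ == "__main__":
    history = ["肿块大小3cm", "肿块大小5cm", "肿块大小2.5cm",
               "切缘阴性", "切缘阳性", "脉管内见癌栓"]
    templates = extract_templates(history)
    print(templates)  # ['肿块大小', '切缘']
    print(structure(["肿块大小4cm", "切缘阴性", "脉管内见癌栓"], templates))
    # ({'肿块大小': '4cm', '切缘': '阴性'}, ['脉管内见癌栓'])
```

Leader clustering is order-dependent and is chosen only for brevity. Clauses that match no template are returned separately; where templates are periodically re-optimized, as the abstract describes, such clauses would be natural input for the next round of template extraction.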

Cite this article: CHEN De-hua, FENG Jie-ying, LE Jia-jin and PAN Qiao. Research on Structured Method for Chinese Pathological Text[J]. Computer Science, 2016, 43(10): 272-276. https://doi.org/10.11896/j.issn.1002-137X.2016.10.051
