Computer Science ›› 2025, Vol. 52 ›› Issue (2): 253-260.doi: 10.11896/jsjkx.231200054

• Artificial Intelligence •

Dependency Parsing for Chinese Electronic Medical Record Enhanced by Dual-scale Collaboration of Large and Small Language Models

XU Siyao1, ZENG Jianjun2, ZHANG Weiyan2, YE Qi2, ZHU Yan1   

  1. School of Mathematics,East China University of Science and Technology,Shanghai 200237,China
    2. School of Information Science and Engineering,East China University of Science and Technology,Shanghai 200237,China
  • Received:2023-12-07 Revised:2024-04-28 Online:2025-02-15 Published:2025-02-17
  • About author:XU Siyao,born in 2000,postgraduate.Her main research interests include natural language processing and dependency parsing.
    ZHU Yan,born in 1984,Ph.D,associate professor.Her main research interest is graph theory and its applications.
  • Supported by:
    Shanghai Municipal Special Fund for Promoting High-quality Development of Industries (2021-GZL-RGZN-01018).

Abstract: Dependency parsing is a crucial task in natural language processing,aiming to identify the syntactic dependencies between the words of a sentence.However,existing research on dependency parsing for Chinese electronic medical records faces the following problem:current general-purpose parsers cannot analyze sentences accurately when components indicating grammatical structure are missing and modifiers appear in varied positions.To address these issues,this paper proposes a dependency parsing method for Chinese electronic medical records based on dual-scale collaborative enhancement of large and small language models.Specifically,we first analyze the linguistic features of Chinese electronic medical records and propose component completion to make the special grammatical structures of medical texts explicit.We then apply a generic parser for dependency parsing,and for the resulting syntactic graph we employ the prior grammatical knowledge of a large language model to revise it automatically.In addition,since our approach focuses on narrowing the feature-distribution gap between medical and generic texts,it is not constrained by the lack of annotated data in the medical domain.This study annotates 444 samples for dependency parsing of Chinese electronic medical records to validate the method.Experimental results demonstrate the effectiveness of our approach in parsing Chinese electronic medical records,achieving LAS and UAS scores of 92.42 and 94.60 in a low-data scenario.The proposed method also performs well across various clinical departments.
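The LAS and UAS figures reported above are the standard attachment scores for dependency parsing. As a minimal illustration (the paper's exact evaluation script is not shown here, so this is a generic sketch, not the authors' code), the two metrics can be computed as follows:

```python
def attachment_scores(gold, pred):
    """Compute (LAS, UAS) as percentages over the tokens of a sentence.

    gold, pred: lists of (head_index, relation_label) pairs, one per token.
    UAS counts tokens whose predicted head matches the gold head;
    LAS additionally requires the dependency label to match.
    """
    assert len(gold) == len(pred) and gold
    uas_hits = sum(gh == ph for (gh, _), (ph, _) in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * las_hits / n, 100.0 * uas_hits / n

# Toy 3-token sentence with one label error: UAS 100.0, LAS ~66.67
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
las, uas = attachment_scores(gold, pred)
```

Because LAS requires both the head and the relation label to be correct, it is always at most the UAS, which is consistent with the 92.42/94.60 pair reported in the abstract.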

Key words: Natural language processing, Dependency parsing, Chinese electronic medical records, Large language model, Collaborative enhancement

CLC Number: TP391
[1]EISNER J.Bilexical Grammars and their Cubic-Time Parsing Algorithms[J].Springer Netherlands,2000,10(7):29-61.
[2]CHEN W L,ZHANG M,ZHANG Y.Semi-supervised Feature Transformation for Dependency Parsing[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.2013:1303-1313.
[3]DOZAT T,MANNING C D.Deep Biaffine Attention for Neural Dependency Parsing[C]//Proceedings of the 2017 International Conference on Learning Representations.2017:1-8.
[4]CHEN D Q,MANNING C D.A fast and accurate dependency parser using neural networks[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.2014:740-750.
[5]WEISS D,ALBERTI C,COLLINS M.Structured training for neural network transition-based parsing[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics.2015.
[6]DYER C,BALLESTEROS M,WANG L,et al.Transition-based dependency parsing with stack long short-term memory[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.2015.
[7]KIPERWASSER E,GOLDBERG Y.Simple and accurate dependency parsing using bidirectional LSTM feature representations[J].Transactions of the Association for Computational Linguistics,2016,4:313-327.
[8]DOZAT T,MANNING C D.Deep biaffine attention for neural dependency parsing[J].arXiv:1611.01734,2016.
[9]MRINI K,DERNONCOURT F.Rethinking self-attention:Towards interpretability in neural parsing[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing.2020:731-742.
[10]BADA M,PYYSALO S,CIOSICI M,et al.CRAFT Shared Tasks 2019 Overview-Integrated Structure,Semantics,and Coreference[C]//Proceedings of the 5th Workshop on BioNLP Open Shared Tasks.2019:174-184.
[11]NGO T M,KANERVA J,GINTER F,et al.Neural Dependency Parsing of Biomedical Text:TurkuNLP entry in the CRAFT structural annotation task[C]//Proceedings of the 5th Workshop on BioNLP Open Shared Tasks.2019:206-215.
[12]JANG Z P,GUAN Y.A Fusion Model for Chinese Electronic Medical Record Parsing[J].Acta Automatica Sinica,2019,45(2):276-288.
[13]KÖPF A,KILCHER Y,RÜTTE D,et al.OpenAssistant Conversations-Democratizing Large Language Model Alignment[C]//Proceedings of the 2023 Conference and Workshop on Neural Information Processing Systems.2023:1-13.
[14]WEI J,TAY Y,BOMMASANI R,et al.Emergent Abilities of Large Language Models[J].arXiv:2206.07682,2022.
[15]SUN X F,DONG L F.Pushing the Limits of ChatGPT on NLP Tasks[J].arXiv:2306.09719,2023.
[16]SCHICK T,SCHÜTZE H.Exploiting Cloze-questions for Few-shot Text Classification and Natural Language Inference[C]//Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics:Main Volume.2021:255-269.
[17]GUNTER T D,TERRY N P.The Emergence of National Electronic Health Record Architectures in the United States and Australia:Models,Costs,and Questions[J].Journal of Medical Internet Research,2005,7(1):e3.
[18]YEH C L,CHEN Y C.Zero Anaphora Resolution in Chinese with Shallow Parsing[J].Journal of Chinese Language and Computing,2007,17(1):41-56.
[19]JIANG M,HUANG Y,FAN J W,et al.Parsing Clinical Text:How Good Are the State-of-the-art Parsers?[J].BMC Medical Informatics and Decision Making,2015,15(S1):1-6.
[20]SHI J L,LUO X Y.Construction of a Treebank of Learner Chinese[J].Journal of Chinese Information Processing,2022,36(1):39-46.
[21]CHE W,FENG Y,QIN L,et al.N-LTP:An Open-source Neural Language Technology Platform for Chinese[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.2021:42-49.
[22]PLANK B,ALONSO H M,AGIĆ Ž,et al.Do dependency parsing metrics correlate with human judgments?[C]//Proceedings of the 19th Conference on Computational Natural Language Learning.2015:315-320.
[23]HE H,CHOI J D.The Stem Cell Hypothesis:Dilemma behind Multi-Task Learning with Transformer Encoders[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.2021:5555-5577.
[24]ZHANG S,WANG L,SUN K,et al.A practical Chinese dependency parser based on a large-scale dataset[J].arXiv:2009.00901,2020.
[25]ZHANG Y,CUI L.Siren's Song in the AI Ocean:A Survey on Hallucination in Large Language Models[J].arXiv:2309.01219,2023.