计算机科学 ›› 2025, Vol. 52 ›› Issue (6A): 240700182-10. doi: 10.11896/jsjkx.240700182
张乐1, 车超1,2, 梁艳3
ZHANG Le1, CHE Chao1,2, LIANG Yan3
Abstract: Diabetes treatment is a long-term and highly individualized undertaking that places a heavy burden on patients' daily lives. Consulting a medical large language model (LLM) for diabetes-related questions can effectively reduce this burden, but LLMs are especially prone to hallucination when handling text in specialized domains such as medicine, i.e., outputs that are erroneous, nonsensical, or inconsistent with the input. Moreover, existing hallucination-mitigation techniques achieve unsatisfactory accuracy in the medical domain, which severely degrades the overall accuracy of LLMs. To address this problem, this paper proposes a hallucination self-check and active mitigation method that combines instruction fine-tuning with retrieval-augmented generation: before generation, it constructs supplementary knowledge from the user's question; after generation, it judges whether a hallucination has occurred through similarity comparison. Experiments are conducted on multiple medical datasets. On a large-scale multi-turn diabetes dialogue dataset, the method achieves an F1 score of 0.79, a BLEU-4 score of 2.38, and a Rouge-L score of 9.26, outperforming existing LLM hallucination-mitigation techniques in both accuracy and generation efficiency.
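To make the post-generation self-check concrete, the sketch below illustrates the general idea of flagging a hallucination by comparing the model's answer against retrieved reference knowledge. It is a minimal illustration, not the paper's implementation: TF-IDF cosine similarity stands in for whatever similarity measure the authors actually use, and the function name, threshold of 0.3, and example texts are assumptions introduced here for demonstration.

# A minimal sketch (assumed, not the paper's method) of a post-generation
# hallucination self-check: compare the generated answer against retrieved
# reference knowledge and flag a hallucination when no reference supports it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def check_hallucination(answer: str, references: list[str], threshold: float = 0.3) -> bool:
    """Return True if the answer is poorly supported by every reference passage."""
    vectorizer = TfidfVectorizer()
    # Fit on the answer plus references so both sides share one vocabulary.
    matrix = vectorizer.fit_transform([answer] + references)
    # Similarity of the answer (row 0) against each reference passage.
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    # The 0.3 threshold is an illustrative assumption, not a value from the paper.
    return sims.max() < threshold

# Toy usage: an answer consistent with the retrieved knowledge passes,
# while an unsupported answer is flagged as a possible hallucination.
refs = ["Metformin is a common first-line oral medication for type 2 diabetes."]
print(check_hallucination("Metformin is often the first-line drug for type 2 diabetes.", refs))  # False
print(check_hallucination("Patients should stop all medication immediately.", refs))            # True

In practice a semantic embedding model would replace TF-IDF, since lexical overlap alone misses paraphrases, but the control flow of retrieve, generate, then compare is the same.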