Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240700182-10. DOI: 10.11896/jsjkx.240700182

• Large Language Model Technology and Its Application •

Hallucinations Proactive Relief in Diabetes Q&A LLM

ZHANG Le1, CHE Chao1,2, LIANG Yan3   

  1 Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), Ministry of Education, Dalian, Liaoning 116622, China
    2 School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
    3 College of Mechanical and Electronic Engineering, Shanghai Jianqiao University, Shanghai 201306, China
  • Online: 2025-06-16  Published: 2025-06-12
  • About author: ZHANG Le, born in 2000, postgraduate, is a member of CCF (No. T9208G). His main research interests include large language models and natural language processing.
    LIANG Yan, born in 1982, master. Her main research interests include digital signal processing.
  • Supported by:
    National Natural Science Foundation of China (62076045), Liaoning Provincial Department of Education Service Local Program (LJKFZ20220290) and Dalian University Interdisciplinary Program (DLUXK-2023-YB-003).

Abstract: Diabetes treatment is a long-term, highly personalized endeavor that places a significant burden on patients' daily lives. Diabetes consultation through medical large language models (LLMs) can effectively relieve this healthcare burden. However, LLMs are prone to hallucinations, i.e., outputs that are incorrect, meaningless, or inconsistent with the input, when processing texts in specialized domains such as medicine, and the accuracy of existing hallucination relief techniques in the medical field remains unsatisfactory, which substantially limits the reliability of such models. To address this problem, this paper proposes a hallucination self-inspection and proactive relief method that combines instruction fine-tuning with retrieval-augmented generation: additional knowledge about the user's question is formed before the generation process, and after generation a similarity comparison determines whether a hallucination has occurred. Experiments are conducted on several medical datasets; on a large-scale diabetes multi-turn conversation dataset, the method achieves an F1 value of 0.79, a BLEU-4 value of 2.38, and a Rouge-L value of 9.26, outperforming existing hallucination relief techniques for LLMs in terms of accuracy and generation efficiency.
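The following Python sketch illustrates, under stated assumptions, the pre-/post-generation pipeline described above. The functions retrieve_knowledge and generate_answer are hypothetical stand-ins for the paper's retrieval-augmented, instruction-fine-tuned LLM components, and a plain bag-of-words cosine similarity stands in for the post-generation self-inspection; this is not the authors' implementation.

import math
from collections import Counter

def retrieve_knowledge(question):
    # Hypothetical retrieval step: fetch diabetes-related passages for the question
    # before generation (the "additional knowledge" described in the abstract).
    return ["Metformin is a common first-line drug for type 2 diabetes."]

def generate_answer(question, knowledge):
    # Hypothetical generation step: call the instruction-fine-tuned LLM with the
    # question plus the retrieved knowledge.
    return "Metformin is often used as a first-line treatment for type 2 diabetes."

def cosine_similarity(a, b):
    # Bag-of-words cosine similarity, used here only to illustrate the
    # post-generation similarity comparison.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def answer_with_self_inspection(question, threshold=0.3):
    # Pre-generation: form extra knowledge; post-generation: flag a suspected
    # hallucination when the answer is dissimilar from every retrieved passage.
    knowledge = retrieve_knowledge(question)
    answer = generate_answer(question, knowledge)
    best = max(cosine_similarity(answer, k) for k in knowledge)
    return answer, best < threshold  # True means "suspected hallucination"

if __name__ == "__main__":
    ans, suspected = answer_with_self_inspection("What drug is first-line for type 2 diabetes?")
    print(ans, "| suspected hallucination:", suspected)

In practice, an embedding-based similarity measure and a threshold tuned on the diabetes data would replace the toy metric used here.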

Key words: Large language model, Retrieval augmented generation, Hallucination relief, Diabetes, Question and answer system

CLC Number: F416