计算机科学 ›› 2024, Vol. 51 ›› Issue (1): 68-71.doi: 10.11896/jsjkx.231100066

• 创刊五十周年特别专题 • 上一篇    下一篇

大语言模型安全现状与挑战

赵月1, 何锦雯1,2, 朱申辰1,2, 李聪仪1,2, 张英杰1,2, 陈恺1,2   

  1. 1 中国科学院信息工程研究所 北京100085
    2 中国科学院大学网络安全学院 北京101408
  • 收稿日期:2023-11-10 修回日期:2023-12-20 出版日期:2024-01-15 发布日期:2024-01-12
  • 通讯作者: 陈恺(chenkai@iie.ac.cn)
  • 作者简介:(zhaoyue@iie.ac.cn)

Security of Large Language Models:Current Status and Challenges

ZHAO Yue1, HE Jinwen1,2, ZHU Shenchen1,2, LI Congyi1,2 , ZHANG Yingjie1,2, CHEN Kai1,2   

  1. 1 Institute of Information Engineering,Chinese Academy of Sciences,Beijing 100085,China
    2 School of Cyber Security,University of Chinese Academy of Sciences,Beijing 101408,China
  • Received:2023-11-10 Revised:2023-12-20 Online:2024-01-15 Published:2024-01-12
  • About author:ZHAO Yue,born in 1992,Ph.D,research assistant,is a member of CCF(No.K7521M).Her main research interest is AI security.
    CHEN Kai,born in 1982,Ph.D,professor,Ph.D supervisor,is a member of CCF(No.76085D).His main research interests include software analysis and testing,AI security and privacy.

摘要: 大语言模型因其出色的文本理解和生成能力,被广泛应用于自然语言处理领域并取得了显著成果,为社会各界带来了巨大的便利。然而,大语言模型自身仍存在明显的安全问题,严重影响其应用的可信性与可靠性,是安全学者需广泛关注的问题。文中针对大语言模型自身的安全问题,首先从基于大语言模型的恶意应用问题切入,阐述提示注入攻击及其相应的防御方法;其次,介绍大语言模型幻觉带来的可信问题,对幻觉问题的量化评估、幻觉来源和缓解技术是当前研究的重点;然后,大语言模型隐私安全问题强调了个人及企业数据的保护问题,一旦在进行人机交互时泄露商业秘密和个人敏感信息,将可能引发严重的安全风险,当前研究主要通过可信执行环境和隐私计算技术来进行风险规避;最后,提示泄露问题关注攻击者如何窃取有价值的提示词进行获利或通过个性化提示词泄露个人隐私。提升大语言模型的安全性需要综合考虑模型隐私保护、可解释性研究以及模型分布的稳定性与鲁棒性等问题。

关键词: 大语言模型, 人工智能安全, 恶意应用, 模型幻觉, 隐私安全, 提示泄露

Abstract: Large language models have revolutionized natural language processing,offering exceptional text understanding and generation capabilities that benefit society significantly.However,they also pose notable security challenges,demanding the attention of security researchers.This paper introduces these concerns,including malicious applications with prompt injection attacks,reliable issues arising from model hallucinations,privacy risks tied to data protection,and the problem of prompt leakage.To enhance model security,a comprehensive approach is required,focusing on privacy preservation,interpretability research,and model distribution stability and robustness.

Key words: Large language models, AI security, Malicious applications, Model hallucinations, Privacy security, Prompt leakage

中图分类号: 

  • TP389
[1]BRANCH H J,CEFALU J R,MCHUGH J,et al.Evaluatingthe susceptibility of pre-trained language models via handcrafted adversarial examples[J].arXiv:2209.02128,2022.
[2]KEVIN L.The entire prompt of Microsoft Bing Chat![EB/OL][2023-02-09].https://twitter.com/kliu128/status/1623472922374574080.
[3]FÁBIO P,RIBEIRO I.Ignore Previous Prompt:Attack Techniques For Language Models[J].arXiv:2211.09527,2022.
[4]KAI G,SAHAR A,SHAILESH M,et al.More than you’ve asked for:A Comprehensive Analys is of Novel Prompt Injection Threats to Application-Integrated Large Language Models[J].arXiv:2302.12173,2023.
[5]Trigaten.Learn Prompting:Indirect Injection[EB/OL].[2023-05-29].https://learnprompting.org/docs/prompthacking/offensivemeasures/indirectinjection.
[6]LIU Y,JIA Y,GENG R,et al.Prompt Injection Attacks and Defenses in LLM-Integrated Applications[J].arXiv:2310.12815,2023.
[7]LIU X,CHENG H,HE P,et al.Adversarial training for large neural language models[J].arXiv:2004.08994,2020.
[8]MicroSoft.Content filtering[EB/OL].[2023-06-09].https://learn.microsoft.com/en-us/azure/cognitiveservices/openai/con-cepts/content-filter.
[9]Google.Generative AI for Developers:ContentFilter[EB/OL].[2023-05-06].https://developers.generativeai.google/api/python/google/ai/generativelanguage/ContentFilter.
[10]JI Z W,NAYEON L,RITA F,et al.Survey of Hallucination in Natural Language Generation[J].ACM Computing Surveys 2023,55(12):1-38.
[11]ZHANG Y,LI Y F,CUI L Y,et al.Siren’s Song in the AIOcean:A Survey on Hallucination in Large Language Models[J].arXiv:2309.01219,2023.
[12]SEWON M,KRISHNA K,LYU X X,et al.FActScore:Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation[J].arXiv:2305.14251,2023.
[13]LI J Y,CHENG X X,ZHAO W X,et al.HaluEval:A Large-Scale Hallucination Evaluation Benchmark for Large Language Models[J].arXiv:2305.11747,2023.
[14]GARDENT C,ANASTASIA S,SHASHI N,et al.CreatingTraining Corpora for NLG Micro-Planners[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers).2017.
[15]DONG Y,LU W J,ZHENG Y C,et al.PUMA:Secure Inference of LLaMA-7B in Five Minutes[J].arXiv:2307.12533,2023.
[16]OUYANG L,JEFF W,XU J,et al.Training language models to follow instructions with human feedback[J].arXiv:2203.02155,2022.
[17]LEE N,WEI P,PENG X,et al.Factuality Enhanced Language Models for Open-Ended Text Generation[J].arXiv:2206.04624,2022.
[18]LI H Y,SU Y X,CAI D,et al.A Survey on Retrieval-Augmented Text Generation[J].arXiv:2202.01110,2022.
[19]MA J M,ZHENG Y C,FENG J,et al.SecretFlow-SPU:APerformant and User-Friendly Framework for Privacy-Preserving Machine Learning[C]//USENIX Annual Technical Conference.2023.
[20]KNOTT B,VENKATARAMAN S,HANNUN A Y,et al.CrypTen:Secure Multi-Party Computation Meets Machine Learning[J].arXiv:2109.00984,2021.
[21]JIA Y K,LIU S,WANG W H,et al.HyperEnclave:An Open and Cross-platform Trusted Execution Environment[J].arXiv:2212.04197,2022.
[22]YU W,LI Q Q,HE D,et al.TEE based Cross-silo Trustworthy Federated Learning Infrastructure[EB/OL].https://federated-learning.org/fl-ijcai-2022/Papers/FL-IJCAI-22_paper_8.pdf.
[23]WEI J,WANG X Z,SCHUURMANS D.et al.Chain-of-Thought Prompting Elicits Reasoning in Large Language Mo-dels[C]//NeurIPS.2022.
[24]QIN Y J,HU S D,LIN Y K,et al.Tool Learning with Foundation Models[J].arXiv:2304.08354,2023.
[25]FÁBIO P,IAN R.Ignore Previous Prompt:Attack Techniques For Language Models[J].arXiv:2211.09527,2022.
[26]ZHANG Y M,IPPOLITO D.Prompts Should not be Seen as Secrets:Systematically Measuring Prompt Extraction Attack Success[J].arXiv:2307.06865,2023.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!