计算机科学 ›› 2024, Vol. 51 ›› Issue (1): 68-71.doi: 10.11896/jsjkx.231100066
赵月1, 何锦雯1,2, 朱申辰1,2, 李聪仪1,2, 张英杰1,2, 陈恺1,2
ZHAO Yue1, HE Jinwen1,2, ZHU Shenchen1,2, LI Congyi1,2 , ZHANG Yingjie1,2, CHEN Kai1,2
摘要: 大语言模型因其出色的文本理解和生成能力,被广泛应用于自然语言处理领域并取得了显著成果,为社会各界带来了巨大的便利。然而,大语言模型自身仍存在明显的安全问题,严重影响其应用的可信性与可靠性,是安全学者需广泛关注的问题。文中针对大语言模型自身的安全问题,首先从基于大语言模型的恶意应用问题切入,阐述提示注入攻击及其相应的防御方法;其次,介绍大语言模型幻觉带来的可信问题,对幻觉问题的量化评估、幻觉来源和缓解技术是当前研究的重点;然后,大语言模型隐私安全问题强调了个人及企业数据的保护问题,一旦在进行人机交互时泄露商业秘密和个人敏感信息,将可能引发严重的安全风险,当前研究主要通过可信执行环境和隐私计算技术来进行风险规避;最后,提示泄露问题关注攻击者如何窃取有价值的提示词进行获利或通过个性化提示词泄露个人隐私。提升大语言模型的安全性需要综合考虑模型隐私保护、可解释性研究以及模型分布的稳定性与鲁棒性等问题。
中图分类号:
[1]BRANCH H J,CEFALU J R,MCHUGH J,et al.Evaluatingthe susceptibility of pre-trained language models via handcrafted adversarial examples[J].arXiv:2209.02128,2022. [2]KEVIN L.The entire prompt of Microsoft Bing Chat![EB/OL][2023-02-09].https://twitter.com/kliu128/status/1623472922374574080. [3]FÁBIO P,RIBEIRO I.Ignore Previous Prompt:Attack Techniques For Language Models[J].arXiv:2211.09527,2022. [4]KAI G,SAHAR A,SHAILESH M,et al.More than you’ve asked for:A Comprehensive Analys is of Novel Prompt Injection Threats to Application-Integrated Large Language Models[J].arXiv:2302.12173,2023. [5]Trigaten.Learn Prompting:Indirect Injection[EB/OL].[2023-05-29].https://learnprompting.org/docs/prompthacking/offensivemeasures/indirectinjection. [6]LIU Y,JIA Y,GENG R,et al.Prompt Injection Attacks and Defenses in LLM-Integrated Applications[J].arXiv:2310.12815,2023. [7]LIU X,CHENG H,HE P,et al.Adversarial training for large neural language models[J].arXiv:2004.08994,2020. [8]MicroSoft.Content filtering[EB/OL].[2023-06-09].https://learn.microsoft.com/en-us/azure/cognitiveservices/openai/con-cepts/content-filter. [9]Google.Generative AI for Developers:ContentFilter[EB/OL].[2023-05-06].https://developers.generativeai.google/api/python/google/ai/generativelanguage/ContentFilter. [10]JI Z W,NAYEON L,RITA F,et al.Survey of Hallucination in Natural Language Generation[J].ACM Computing Surveys 2023,55(12):1-38. [11]ZHANG Y,LI Y F,CUI L Y,et al.Siren’s Song in the AIOcean:A Survey on Hallucination in Large Language Models[J].arXiv:2309.01219,2023. [12]SEWON M,KRISHNA K,LYU X X,et al.FActScore:Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation[J].arXiv:2305.14251,2023. [13]LI J Y,CHENG X X,ZHAO W X,et al.HaluEval:A Large-Scale Hallucination Evaluation Benchmark for Large Language Models[J].arXiv:2305.11747,2023. [14]GARDENT C,ANASTASIA S,SHASHI N,et al.CreatingTraining Corpora for NLG Micro-Planners[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers).2017. [15]DONG Y,LU W J,ZHENG Y C,et al.PUMA:Secure Inference of LLaMA-7B in Five Minutes[J].arXiv:2307.12533,2023. [16]OUYANG L,JEFF W,XU J,et al.Training language models to follow instructions with human feedback[J].arXiv:2203.02155,2022. [17]LEE N,WEI P,PENG X,et al.Factuality Enhanced Language Models for Open-Ended Text Generation[J].arXiv:2206.04624,2022. [18]LI H Y,SU Y X,CAI D,et al.A Survey on Retrieval-Augmented Text Generation[J].arXiv:2202.01110,2022. [19]MA J M,ZHENG Y C,FENG J,et al.SecretFlow-SPU:APerformant and User-Friendly Framework for Privacy-Preserving Machine Learning[C]//USENIX Annual Technical Conference.2023. [20]KNOTT B,VENKATARAMAN S,HANNUN A Y,et al.CrypTen:Secure Multi-Party Computation Meets Machine Learning[J].arXiv:2109.00984,2021. [21]JIA Y K,LIU S,WANG W H,et al.HyperEnclave:An Open and Cross-platform Trusted Execution Environment[J].arXiv:2212.04197,2022. [22]YU W,LI Q Q,HE D,et al.TEE based Cross-silo Trustworthy Federated Learning Infrastructure[EB/OL].https://federated-learning.org/fl-ijcai-2022/Papers/FL-IJCAI-22_paper_8.pdf. [23]WEI J,WANG X Z,SCHUURMANS D.et al.Chain-of-Thought Prompting Elicits Reasoning in Large Language Mo-dels[C]//NeurIPS.2022. [24]QIN Y J,HU S D,LIN Y K,et al.Tool Learning with Foundation Models[J].arXiv:2304.08354,2023. [25]FÁBIO P,IAN R.Ignore Previous Prompt:Attack Techniques For Language Models[J].arXiv:2211.09527,2022. [26]ZHANG Y M,IPPOLITO D.Prompts Should not be Seen as Secrets:Systematically Measuring Prompt Extraction Attack Success[J].arXiv:2307.06865,2023. |
|