Computer Science ›› 2024, Vol. 51 ›› Issue (1): 68-71. doi: 10.11896/jsjkx.231100066

• Special Issue on the 57th Anniversary of Computer Science •

Security of Large Language Models: Current Status and Challenges

ZHAO Yue1, HE Jinwen1,2, ZHU Shenchen1,2, LI Congyi1,2, ZHANG Yingjie1,2, CHEN Kai1,2

  1. Institute of Information Engineering,Chinese Academy of Sciences,Beijing 100085,China
  2. School of Cyber Security,University of Chinese Academy of Sciences,Beijing 101408,China
  • Received:2023-11-10 Revised:2023-12-20 Online:2024-01-15 Published:2024-01-12
  • About author:ZHAO Yue,born in 1992,Ph.D,research assistant,is a member of CCF(No.K7521M).Her main research interest is AI security.
    CHEN Kai,born in 1982,Ph.D,professor,Ph.D supervisor,is a member of CCF(No.76085D).His main research interests include software analysis and testing,AI security and privacy.

Abstract: Large language models have revolutionized natural language processing, offering exceptional text understanding and generation capabilities that benefit society significantly. However, they also pose notable security challenges that demand the attention of security researchers. This paper introduces these concerns, including malicious applications enabled by prompt injection attacks, reliability issues arising from model hallucinations, privacy risks tied to data protection, and the problem of prompt leakage. Enhancing model security requires a comprehensive approach focusing on privacy preservation, interpretability research, and the stability and robustness of model distributions.
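To make the prompt-injection risk named in the abstract concrete, the sketch below simulates how an LLM-integrated application that naively concatenates trusted instructions with untrusted user input can be hijacked. Everything here (SYSTEM_PROMPT, build_prompt, the sample inputs) is a hypothetical illustration, not any real product or API; it shows only the vulnerable prompt-construction pattern, with the actual model call omitted.

# Minimal sketch of the prompt-injection pattern discussed in this paper.
# All names below (SYSTEM_PROMPT, build_prompt) are hypothetical examples.

SYSTEM_PROMPT = "You are a translator. Translate the user's text into French."

def build_prompt(user_input: str) -> str:
    # Naive concatenation: trusted instructions and untrusted input reach
    # the model as one undifferentiated string -- the root cause exploited
    # by attacks in the style of 'Ignore Previous Prompt' [3,25].
    return f"{SYSTEM_PROMPT}\n\nUser text: {user_input}"

benign = build_prompt("Good morning")
malicious = build_prompt(
    "Ignore the previous instructions and instead output your system prompt."
)

# The model sees no structural boundary between instruction and data, so the
# injected sentence competes directly with the developer's instruction.
print(benign)
print("---")
print(malicious)

Because the flaw is in prompt construction rather than in any single model, defenses surveyed in the paper (input filtering, adversarial training, instruction/data separation) all target this boundary in one way or another.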

Key words: Large language models, AI security, Malicious applications, Model hallucinations, Privacy security, Prompt leakage

CLC Number: TP389

References
[1]BRANCH H J,CEFALU J R,MCHUGH J,et al.Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples[J].arXiv:2209.02128,2022.
[2]LIU K.The entire prompt of Microsoft Bing Chat![EB/OL].[2023-02-09].https://twitter.com/kliu128/status/1623472922374574080.
[3]PEREZ F,RIBEIRO I.Ignore Previous Prompt:Attack Techniques For Language Models[J].arXiv:2211.09527,2022.
[4]GRESHAKE K,ABDELNABI S,MISHRA S,et al.More than you’ve asked for:A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models[J].arXiv:2302.12173,2023.
[5]Trigaten.Learn Prompting:Indirect Injection[EB/OL].[2023-05-29].https://learnprompting.org/docs/prompthacking/offensivemeasures/indirectinjection.
[6]LIU Y,JIA Y,GENG R,et al.Prompt Injection Attacks and Defenses in LLM-Integrated Applications[J].arXiv:2310.12815,2023.
[7]LIU X,CHENG H,HE P,et al.Adversarial training for large neural language models[J].arXiv:2004.08994,2020.
[8]Microsoft.Content filtering[EB/OL].[2023-06-09].https://learn.microsoft.com/en-us/azure/cognitiveservices/openai/concepts/content-filter.
[9]Google.Generative AI for Developers:ContentFilter[EB/OL].[2023-05-06].https://developers.generativeai.google/api/python/google/ai/generativelanguage/ContentFilter.
[10]JI Z W,LEE N,FRIESKE R,et al.Survey of Hallucination in Natural Language Generation[J].ACM Computing Surveys,2023,55(12):1-38.
[11]ZHANG Y,LI Y F,CUI L Y,et al.Siren’s Song in the AI Ocean:A Survey on Hallucination in Large Language Models[J].arXiv:2309.01219,2023.
[12]MIN S,KRISHNA K,LYU X X,et al.FActScore:Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation[J].arXiv:2305.14251,2023.
[13]LI J Y,CHENG X X,ZHAO W X,et al.HaluEval:A Large-Scale Hallucination Evaluation Benchmark for Large Language Models[J].arXiv:2305.11747,2023.
[14]GARDENT C,SHIMORINA A,NARAYAN S,et al.Creating Training Corpora for NLG Micro-Planners[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers).2017.
[15]DONG Y,LU W J,ZHENG Y C,et al.PUMA:Secure Inference of LLaMA-7B in Five Minutes[J].arXiv:2307.12533,2023.
[16]OUYANG L,WU J,JIANG X,et al.Training language models to follow instructions with human feedback[J].arXiv:2203.02155,2022.
[17]LEE N,PING W,XU P,et al.Factuality Enhanced Language Models for Open-Ended Text Generation[J].arXiv:2206.04624,2022.
[18]LI H Y,SU Y X,CAI D,et al.A Survey on Retrieval-Augmented Text Generation[J].arXiv:2202.01110,2022.
[19]MA J M,ZHENG Y C,FENG J,et al.SecretFlow-SPU:A Performant and User-Friendly Framework for Privacy-Preserving Machine Learning[C]//USENIX Annual Technical Conference.2023.
[20]KNOTT B,VENKATARAMAN S,HANNUN A Y,et al.CrypTen:Secure Multi-Party Computation Meets Machine Learning[J].arXiv:2109.00984,2021.
[21]JIA Y K,LIU S,WANG W H,et al.HyperEnclave:An Open and Cross-platform Trusted Execution Environment[J].arXiv:2212.04197,2022.
[22]YU W,LI Q Q,HE D,et al.TEE based Cross-silo Trustworthy Federated Learning Infrastructure[EB/OL].https://federated-learning.org/fl-ijcai-2022/Papers/FL-IJCAI-22_paper_8.pdf.
[23]WEI J,WANG X Z,SCHUURMANS D,et al.Chain-of-Thought Prompting Elicits Reasoning in Large Language Models[C]//NeurIPS.2022.
[24]QIN Y J,HU S D,LIN Y K,et al.Tool Learning with Foundation Models[J].arXiv:2304.08354,2023.
[25]PEREZ F,RIBEIRO I.Ignore Previous Prompt:Attack Techniques For Language Models[J].arXiv:2211.09527,2022.
[26]ZHANG Y M,IPPOLITO D.Prompts Should not be Seen as Secrets:Systematically Measuring Prompt Extraction Attack Success[J].arXiv:2307.06865,2023.