Computer Science ›› 2025, Vol. 52 ›› Issue (4): 240-248. doi: 10.11896/jsjkx.240900008
朱述承1, 霍虹颖2, 王伟康3, 刘颖1, 刘鹏远2,4
ZHU Shucheng1, HUO Hongying2, WANG Weikang3, LIU Ying1, LIU Pengyuan2,4
Abstract: With the rapid development of large language models, model fairness has drawn increasing attention, and existing research focuses mainly on bias in generated text and in downstream tasks. To generate fairer text, the fairness of the prompts themselves must be carefully designed and examined. To this end, four Chinese large language models are employed as optimizers to automatically and iteratively generate fair prompts describing advantaged and disadvantaged groups. The study further examines how variables such as model temperature, initial prompt type, and optimization direction affect the optimization process, and evaluates the fairness of prompt styles such as chain-of-thought and role-playing. The results show that large language models can effectively generate either less biased or more biased prompts, and that prompts for advantaged groups are optimized more effectively at low temperature. Generating biased prompts is comparatively difficult, and the models respond with counter-adversarial strategies. Using questions as initial prompts produces more random but higher-quality outputs. Different models exhibit different optimization strategies, and prompts in chain-of-thought and debiasing styles yield fairer generated text. Prompts are therefore crucial to model fairness, and their own fairness warrants further study.
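The optimization procedure summarized above can be pictured as a loop in which the model rewrites its own prompt under a fairness signal. The following Python fragment is a minimal illustrative sketch of such an LLM-as-optimizer loop, not the authors' implementation: call_llm, fairness_score, and optimize_prompt are hypothetical placeholder names, the scorer is a stub, and both would have to be replaced by a real Chinese LLM API and a real fairness evaluator. The direction and temperature arguments merely mirror the two optimization directions and the temperature variable discussed in the abstract.

```python
# Minimal illustrative sketch of an LLM-as-optimizer prompt-rewriting loop.
# NOT the authors' code: call_llm and fairness_score are hypothetical placeholders
# standing in for a real Chinese LLM API and a real fairness evaluator.

import random


def call_llm(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for a call to one of the Chinese LLMs used as optimizer/generator."""
    return f"[model output at T={temperature} for: {prompt[:30]}...]"


def fairness_score(text: str) -> float:
    """Placeholder fairness evaluator; higher means the generated text is fairer."""
    return random.random()


def optimize_prompt(initial_prompt: str,
                    direction: str = "unbiased",
                    temperature: float = 0.7,
                    n_iters: int = 5) -> str:
    """Iteratively rewrite a group-describing prompt, using the LLM itself as optimizer."""
    current_prompt = initial_prompt
    best_prompt = current_prompt
    best_score = fairness_score(call_llm(current_prompt, temperature))

    for _ in range(n_iters):
        # Meta-prompt: show the model the current prompt and its fairness score,
        # and ask for a revision pushed in the chosen optimization direction.
        goal = "fairer and less biased" if direction == "unbiased" else "more biased"
        meta_prompt = (
            f"Current prompt: {current_prompt}\n"
            f"Fairness score of its generated text: {best_score:.2f}\n"
            f"Rewrite this prompt so that the text it generates about the target group "
            f"is {goal}. Output only the new prompt."
        )
        candidate = call_llm(meta_prompt, temperature)
        score = fairness_score(call_llm(candidate, temperature))

        # Keep the candidate that best matches the optimization direction.
        improved = score > best_score if direction == "unbiased" else score < best_score
        if improved:
            best_prompt, best_score = candidate, score
        current_prompt = candidate

    return best_prompt


if __name__ == "__main__":
    print(optimize_prompt("Describe the daily life of a disadvantaged group.", n_iters=3))
```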
CLC Number: