Computer Science ›› 2025, Vol. 52 ›› Issue (4): 240-248. doi: 10.11896/jsjkx.240900008

• Artificial Intelligence •

Automatic Optimization and Evaluation of Prompt Fairness Based on Large Language Model Itself

ZHU Shucheng1, HUO Hongying2, WANG Weikang3, LIU Ying1, LIU Pengyuan2,4   

  1 School of Humanities, Tsinghua University, Beijing 100084, China
  2 College of Information Science, Beijing Language and Culture University, Beijing 100083, China
  3 School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
  4 Language Resources Monitoring and Research Center Print Media Language Branch, Beijing Language and Culture University, Beijing 100083, China
  • Received: 2024-08-31  Revised: 2025-02-05  Online: 2025-04-15  Published: 2025-04-14
  • Corresponding author: LIU Ying (yingliu@tsinghua.edu.cn)
  • About author: ZHU Shucheng (zhu_shucheng@126.com), born in 1994, Ph.D. candidate, is a member of CCF (No. H9600G). His main research interests include computational linguistics and sociolinguistics.
    LIU Ying, born in 1969, Ph.D., professor, Ph.D. supervisor. Her main research interests include computational linguistics.
  • Supported by:
    2018 National Major Program of Philosophy and Social Science Fund (18ZDA238) and CCF-Baidu Open Fund (CCF-BAIDU202323).

Abstract: With the rapid development of large language models, model fairness has garnered increasing attention, with current research focusing primarily on biases in generated text and downstream tasks. To produce fairer text, the fairness of prompts must be carefully designed and examined. This study employs four Chinese large language models as optimizers to automatically and iteratively generate fair prompts that describe both advantaged and disadvantaged groups. It also investigates how variables such as model temperature, initial prompt type, and optimization direction affect the optimization process, and evaluates the fairness of various prompt styles, including chain-of-thought and persona. The results indicate that large language models can effectively generate prompts that are either less biased or more biased, and that prompts for advantaged groups optimize better at lower temperature settings. Generating biased prompts is relatively more difficult, and the models employ anti-adversarial strategies to tackle this task. Using questions as initial prompts yields outputs that are more random yet of higher quality. Different models exhibit distinct optimization strategies, with chain-of-thought and debiasing prompt styles producing fairer text. Prompts play a crucial role in model fairness, and their fairness warrants further investigation.
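As a rough illustration of the self-optimization loop the abstract describes, the sketch below shows one way an LLM-as-optimizer iteration could be structured, with each round conditioning on the scored history of earlier candidate prompts. This is not the authors' code: the llm and fairness_score callables, the meta-prompt wording, and all parameter defaults are hypothetical placeholders.

```python
from typing import Callable, List, Tuple

def optimize_prompt(
    llm: Callable[[str, float], str],        # hypothetical: (meta_prompt, temperature) -> generated text
    fairness_score: Callable[[str], float],  # hypothetical: higher score = fairer output
    initial_prompt: str,                     # e.g. a statement or a question about the target group
    direction: str = "less biased",          # or "more biased"
    temperature: float = 0.7,
    iterations: int = 10,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Iteratively ask the optimizer model for a new candidate prompt,
    feeding back the scored trajectory so each round can improve on it."""
    trajectory: List[Tuple[str, float]] = [(initial_prompt, fairness_score(initial_prompt))]
    for _ in range(iterations):
        # show the optimizer model all previous candidates with their scores
        history = "\n".join(f"prompt: {p} | score: {s:.3f}" for p, s in trajectory)
        meta_prompt = (
            "Here are previous prompts describing the target group, "
            f"each with a fairness score:\n{history}\n"
            f"Write one new prompt that is {direction} than those above."
        )
        candidate = llm(meta_prompt, temperature)
        trajectory.append((candidate, fairness_score(candidate)))
    # select the best candidate for the requested optimization direction
    pick = max if direction == "less biased" else min
    best = pick(trajectory, key=lambda pair: pair[1])[0]
    return best, trajectory
```

The variables studied in the paper, such as temperature, the initial prompt type (statement vs. question), and the optimization direction, map directly onto the parameters of this loop.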

Key words: Large language model, Prompt, Fairness, Automatic evaluation, Self-optimization

CLC number: TP391