Computer Science, 2025, Vol. 52, Issue (10): 217-230. doi: 10.11896/jsjkx.241200055
CHEN Yuyan1, JIA Jiyuan2, CHANG Jingwen1, ZUO Kaiwen3, XIAO Yanghua1
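Abstract: In recent years, large language models (LLMs) have shown remarkable ability in emotional dialogue and have also proven effective at achieving goals. However, existing research focuses mainly on offering comfort through empathetic responses, rather than on using such responses to achieve specific real-world goals. To fill this gap, this paper proposes SPEAKSMART, a benchmark covering 5 scenarios that evaluates the ability of LLMs to achieve real-world goals in dialogue through highly empathetic responses. It then introduces a two-dimensional evaluation framework based on provider satisfaction and requester satisfaction. A variety of LLMs are evaluated on SPEAKSMART, and a baseline method is designed to strengthen their ability to generate persuasive, empathetic responses in dialogue. Experimental results show that Claude3 and LLaMA3-70B perform best across the different scenarios, while the other LLMs leave room for improvement. These findings lay a foundation for improving the ability of LLMs to handle real-world tasks in which goals must be achieved through highly empathetic responses.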