Computer Science ›› 2026, Vol. 53 ›› Issue (3): 321-330. doi: 10.11896/jsjkx.250600010

• Artificial Intelligence •

Few-shot Continual Toxicity Detection Based on Large Language Model Augmentation

LI Wenli1, FENG Xiaonian2, QIAN Tieyun1

  1 School of Computer Science, Wuhan University, Wuhan 430072, China
    2 China Power Finance Company, Limited, Beijing 100005, China
  • Received: 2025-06-03 Revised: 2025-09-04 Published: 2026-03-12
  • Corresponding author: QIAN Tieyun (qty@whu.edu.cn)
  • About author: (1146208171@qq.com)
  • Supported by:
    National Natural Science Foundation of China (62576256, 62276193), Key Laboratory of Computing Power Network and Information Security, Ministry of Education (2024ZD027) and Fundamental Research Funds for the Central Universities (2042022dx0001).

Few-shot Continual Toxicity Detection Based on Large Language Model Augmentation

LI Wenli1, FENG Xiaonian2, QIAN Tieyun1   

  1 School of Computer Science, Wuhan University, Wuhan 430072, China
    2 China Power Finance Company, Limited, Beijing 100005, China
  • Received: 2025-06-03 Revised: 2025-09-04 Online: 2026-03-12
  • About author: LI Wenli, born in 2002, postgraduate. Her main research interest is LLM safety.
    QIAN Tieyun, born in 1970, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 13483M). Her main research interests include Web mining and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62576256, 62276193), Key Laboratory of Computing Power Network and Information Security, Ministry of Education (2024ZD027) and Fundamental Research Funds for the Central Universities, China (2042022dx0001).

Abstract: Toxic speech detection is a challenging problem that plagues online social media. Although existing methods can effectively identify common toxic content or toxic content produced through specific perturbation patterns, they face two major challenges: 1) owing to the diversity of toxicity types and linguistic expressions, the training set cannot cover all possible samples, so toxicity detection techniques suffer from a shortage of toxic text data; 2) malicious users in the real world tend to invent new perturbation patterns to fool text toxicity detectors, and transferring a model's detection ability from old perturbation patterns to new ones has become a pressing problem. To address these issues, this paper proposes a few-shot continual toxicity detection model based on large language model augmentation. The basic idea is to use a large language model to augment the examples in the training set and then to combine continual learning with toxicity detection, so that the detector can keep detecting toxicity in text efficiently over time. In this way, the model not only captures the characteristics of different perturbation patterns more precisely, but also gains adaptability and robustness on the few-shot continual toxicity detection task. Experiments on the latest DynEscape dataset show that the proposed model outperforms existing baselines and achieves the best performance.

Key words: Toxicity detection, Continual learning, Few-shot learning, Contrastive learning, Large language models

Abstract: Toxic speech detection is a challenging problem plaguing online social media. While existing methods can effectively identify common toxic content or toxic content generated through specific perturbation patterns, they face two major challenges: 1) due to the diversity of toxicity types and linguistic expressions, the training set cannot cover all samples, leaving detection techniques short of toxic text data; 2) malicious users in real-world scenarios tend to create new perturbation patterns to deceive text toxicity detectors, and transferring the model's detection capability from old perturbation patterns to new ones has become an urgent issue. To address these issues, this paper proposes a few-shot continual toxicity detection model based on large language model augmentation. The core idea is to use a large language model to augment the examples in the training set, and then to combine continual learning with toxicity detection so that the model can continuously and efficiently detect toxicity in text. In this way, the model not only understands the features of different perturbation patterns more precisely, but also becomes more adaptable and robust on the few-shot continual toxicity detection task. Experiments on the latest DynEscape dataset demonstrate that the proposed model outperforms existing baselines and achieves the best performance.
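
To make the augmentation step concrete, the following is a minimal sketch (not the authors' implementation) of how an LLM might be used to expand a few-shot toxicity training set while preserving each seed example's label and perturbation pattern. The Example fields, the llm_generate callable and the prompt wording are illustrative assumptions, since the abstract does not specify the prompting scheme.

```python
# Minimal sketch of LLM-based augmentation for a few-shot toxicity
# training set.  llm_generate, the Example fields and the prompt text are
# illustrative assumptions, not the paper's actual implementation.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    text: str      # the (possibly perturbed) message
    label: int     # 1 = toxic, 0 = non-toxic
    pattern: str   # perturbation pattern the example belongs to


def augment_with_llm(seed_examples: List[Example],
                     llm_generate: Callable[[str], str],
                     n_per_seed: int = 4) -> List[Example]:
    """Paraphrase each few-shot seed with an LLM while keeping its toxicity
    label and perturbation style, enlarging the training set."""
    augmented: List[Example] = []
    for ex in seed_examples:
        for _ in range(n_per_seed):
            prompt = ("Rewrite the following message so that it keeps the same "
                      f"meaning and the same '{ex.pattern}' obfuscation style, "
                      "but uses different wording:\n" + ex.text)
            new_text = llm_generate(prompt).strip()
            if new_text and new_text != ex.text:
                augmented.append(Example(new_text, ex.label, ex.pattern))
    return seed_examples + augmented
```

In this reading, augment_with_llm would be applied to the handful of labelled examples available for each newly observed perturbation pattern before the detector is trained on that pattern.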

Key words: Toxicity detection, Continual learning, Few-shot learning, Contrastive learning, Large language models
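
The key words point to continual learning and contrastive learning; below is a minimal, hypothetical sketch of the general recipe the abstract describes: training on perturbation patterns one after another with a small episodic replay memory, using cross-entropy plus a supervised contrastive term. The toy encoder, memory size and loss weight are assumptions standing in for whatever architecture and hyperparameters the paper actually uses.

```python
# Hypothetical sketch of replay-based continual training with a supervised
# contrastive term.  Assumes fixed-length, padded long-typed token-id tensors;
# the encoder, memory size and loss weight are illustrative assumptions.
import random
import torch
import torch.nn.functional as F
from torch import nn


class ToxicityDetector(nn.Module):
    def __init__(self, vocab_size: int = 30522, dim: int = 128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # toy stand-in encoder
        self.head = nn.Linear(dim, 2)                  # toxic / non-toxic

    def forward(self, token_ids: torch.Tensor):
        z = self.embed(token_ids)                      # (batch, dim) sentence embedding
        return self.head(z), F.normalize(z, dim=-1)    # logits, unit-norm features


def supcon_loss(feats: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """Pull same-label examples together and push different labels apart."""
    eye = torch.eye(feats.size(0), dtype=torch.bool, device=feats.device)
    sim = (feats @ feats.T / tau).masked_fill(eye, -1e9)
    same = (labels[:, None] == labels[None, :]).float().masked_fill(eye, 0.0)
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    return -((same * log_prob).sum(1) / same.sum(1).clamp(min=1.0)).mean()


def train_on_pattern(model, optimizer, task_batches, memory,
                     mem_size: int = 200, lam: float = 0.5):
    """Train on one new perturbation pattern while replaying a small memory
    of earlier patterns, then store a few current examples for the future."""
    for token_ids, labels in task_batches:
        if memory:  # mix replayed old-pattern samples into the batch
            m_ids, m_labels = zip(*random.sample(memory, min(16, len(memory))))
            token_ids = torch.cat([token_ids, torch.stack(m_ids)])
            labels = torch.cat([labels, torch.stack(m_labels)])
        logits, feats = model(token_ids)
        loss = F.cross_entropy(logits, labels) + lam * supcon_loss(feats, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    for token_ids, labels in task_batches:  # refresh the replay memory
        for ids, y in zip(token_ids, labels):
            if len(memory) < mem_size:
                memory.append((ids, y))
```

Under these assumptions, train_on_pattern would be called once per newly observed perturbation pattern, with the LLM-augmented examples from the sketch above feeding task_batches.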

CLC Number: TP391