Computer Science ›› 2026, Vol. 53 ›› Issue (3): 321-330. doi: 10.11896/jsjkx.250600010

• Artificial Intelligence •

Few-shot Continuous Toxicity Detection Based on Large Language Model Augmentation

LI Wenli1, FENG Xiaonian2, QIAN Tieyun1   

  1. School of Computer Science, Wuhan University, Wuhan 430072, China
    2. China Power Finance Company, Limited, Beijing 100005, China
  • Received: 2025-06-03 Revised: 2025-09-04 Published: 2026-03-12
  • About author: LI Wenli, born in 2002, postgraduate. Her main research interest is LLM safety.
    QIAN Tieyun, born in 1970, Ph.D, professor, Ph.D supervisor, is a member of CCF (No.13483M). Her main research interests include Web mining and natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62576256, 62276193), Key Laboratory of Computing Power Network and Information Security, Ministry of Education (2024ZD027) and Fundamental Research Funds for the Central Universities, China (2042022dx0001).

Abstract: Toxic speech detection is a challenging problem plaguing online social media. Although existing methods can effectively identify common toxic content, as well as toxic content generated through specific perturbation patterns, they face two major challenges: 1) owing to the diversity of toxicity types and linguistic expressions, training data cannot cover all samples, leaving detection techniques short of toxic text data; 2) malicious users in real-world scenarios keep devising new perturbation patterns to deceive text toxicity detectors, so transferring a model's detection capability from old perturbation patterns to new ones has become an urgent issue. To address these issues, this paper proposes a few-shot continuous toxicity detection model based on large language model augmentation. The core idea is to use large language models to augment the examples in the training set, and then combine continual learning with toxicity detection techniques so that the detector can continuously and efficiently detect toxicity in text. In addition, the model not only achieves a more precise understanding of the features associated with different perturbation patterns, but also improves its adaptability and robustness on the few-shot continuous toxicity detection task. The model is evaluated on the recent DynEscape dataset, and the results demonstrate that it outperforms existing baseline models, achieving the best performance.
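To make the abstract's core idea concrete, the following Python sketch illustrates how LLM-based example augmentation could feed a small rehearsal buffer for continual adaptation to new perturbation patterns. It is a minimal illustration under assumed interfaces: augment_with_llm, ReplayBuffer, continual_train, the llm callable, and detector.fit are all hypothetical names, not the implementation reported in the paper.

    # Hypothetical sketch: LLM augmentation + rehearsal-based continual fine-tuning
    # for toxicity detection. All names are illustrative, not the paper's actual API.
    import random
    from dataclasses import dataclass

    @dataclass
    class Example:
        text: str
        label: int          # 1 = toxic, 0 = non-toxic
        pattern_id: int     # index of the perturbation pattern (task)

    def augment_with_llm(llm, examples, n_variants=3):
        """Ask an LLM to paraphrase each few-shot example while preserving its label.

        `llm` is any callable that maps a prompt string to generated text; the
        prompt below is a placeholder, not the prompt used in the paper.
        """
        augmented = []
        for ex in examples:
            for _ in range(n_variants):
                prompt = (
                    "Rewrite the following sentence with the same meaning and the "
                    f"same toxicity, using a different surface form:\n{ex.text}"
                )
                augmented.append(Example(llm(prompt), ex.label, ex.pattern_id))
        return examples + augmented

    class ReplayBuffer:
        """Tiny episodic memory in the spirit of rehearsal-based continual learning."""
        def __init__(self, capacity=200):
            self.capacity = capacity
            self.memory = []

        def add(self, examples):
            self.memory.extend(examples)
            if len(self.memory) > self.capacity:
                self.memory = random.sample(self.memory, self.capacity)

        def sample(self, k):
            return random.sample(self.memory, min(k, len(self.memory)))

    def continual_train(detector, llm, task_streams, buffer, replay_k=32):
        """Sequentially adapt the detector to new perturbation patterns.

        `detector.fit(batch)` stands in for one fine-tuning pass; a real system
        would run gradient updates on a transformer classifier.
        """
        for few_shot_examples in task_streams:          # one task per new pattern
            batch = augment_with_llm(llm, few_shot_examples)
            batch += buffer.sample(replay_k)            # rehearse old patterns
            detector.fit(batch)
            buffer.add(few_shot_examples)               # remember the new pattern
        return detector

The sketch only captures the general recipe the abstract describes (augment scarce few-shot examples with an LLM, then train continually so old perturbation patterns are not forgotten); the paper's actual model additionally relies on contrastive learning over perturbation-pattern features, which is omitted here.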

Key words: Toxicity detection, Continual learning, Few-shot learning, Contrastive learning, Large language models

CLC Number: TP391

References
[1]SLONJE R,SMITH P K,FRISÉN A.The nature of cyberbullying,and strategies for prevention[J].Computers in Human Behavior,2013,29(1):26-32.
[2]KOWALSKI R.Cyberbullying[C]//The Routledge International Handbook of Human Aggression.Routledge,2018:131-142.
[3]CHEN H,ZHU Y Z,LIU M Y,et al.Detection of Toxic Speech in Chinese Based on Large Language Models and Data Augmentation[J].Journal of Intelligence,2025,44(4):99-107.
[4]DEL VIGNA F,CIMINO A,DELL’ORLETTA F,et al.Hate me,hate me not:Hate speech detection on Facebook[C]//Proceedings of the First Italian Conference on Cybersecurity(ITASEC17).2017:86-95.
[5]FORTUNA P,NUNES S.A survey on automatic detection of hate speech in text[J].ACM Computing Surveys,2018,51(4):1-30.
[6]LIANG P P,WU C,MORENCY L P,et al.Towards understanding and mitigating social biases in language models[C]//International Conference on Machine Learning.PMLR,2021:6565-6576.
[7]GONGANE V U,MUNOT M V,ANUSE A D.Detection and moderation of detrimental content on social media platforms:current status and future directions[J].Social Network Analysis and Mining,2022,12(1):129.
[8]FELDMAN M,FRIEDLER S A,MOELLER J,et al.Certifying and removing disparate impact[C]//Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2015:259-268.
[9]DIXON L,LI J,SORENSEN J,et al.Measuring and mitigating unintended bias in text classification[C]//Proceedings of the 2018 AAAI/ACM Conference on AI,Ethics,and Society.2018:67-73.
[10]KANG H,CHEN J,LI Y,et al.Toxicity Detection towards Adaptability to Changing Perturbations[J].arXiv:2412.15267,2024.
[11]QIN Z,WU D,LIU Y,et al.Few-shot hate speech detection based on the MindSpore framework[J].arXiv:2504.15987,2025.
[12]EMMERY C,KÁDÁR Á,CHRUPAŁA G,et al.Cyberbullying classifiers are sensitive to model-agnostic perturbations[J].arXiv:2201.06384,2022.
[13]MARKOV T,ZHANG C,AGARWAL S,et al.A holistic approach to undesired content detection in the real world[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2023:15009-15018.
[14]ZHANG Z,CHEN J,YANG D.Mitigating biases in hate speech detection from A causal perspective[C]//Findings of the Association for Computational Linguistics:EMNLP 2023.2023:6610-6625.
[15]LE T,LEE J,YEN K,et al.Perturbations in the wild:Leveraging human-written text perturbations for realistic adversarial attack and defense[J].arXiv:2203.10346,2022.
[16]BESPALOV D,BHABESH S,XIANG Y,et al.Towards building a robust toxicity predictor[J].arXiv:2404.08690,2024.
[17]YU S,CHOI J,KIM Y.Don’t be a Fool:Pooling Strategies in Offensive Language Detection from User-Intended Adversarial Attacks[J].arXiv:2403.15467,2024.
[18]WU M J,YANG X,PAN C F,et al.Autoencoders Combined with Continuous Learning:Current Status,Challenges,and Prospects[J].Journal of Computers,2025,48(2):317-357.
[19]ZHOU D W,WANG F Y,YE H J,et al.A Review of Incremental Learning Algorithms Based on Deep Learning[J].Journal of Computers,2023,46(8):1577-1605.
[20]WANG L,ZHANG X,SU H,et al.A comprehensive survey of continual learning:Theory,method and application[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2024,46(8):5362-5383.
[21]KIRKPATRICK J,PASCANU R,RABINOWITZ N,et al.Overcoming catastrophic forgetting in neural networks[J].Proceedings of the National Academy of Sciences,2017,114(13):3521-3526.
[22]JUNG H,JU J,JUNG M,et al.Less-forgetting learning in deep neural networks[J].arXiv:1607.00122,2016.
[23]LI Z,HOIEM D.Learning without forgetting[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,40(12):2935-2947.
[24]CHAUDHRY A,ROHRBACH M,ELHOSEINY M,et al.On tiny episodic memories in continual learning[J].arXiv:1902.10486,2019.
[25]RIEMER M,CASES I,AJEMIAN R,et al.Learning to learn without forgetting by maximizing transfer and minimizing interference[J].arXiv:1810.11910,2018.
[26]TIWARI R,KILLAMSETTY K,IYER R,et al.GCR:Gradient coreset based replay buffer selection for continual learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:99-108.
[27]REBUFFI S A,KOLESNIKOV A,SPERL G,et al.iCaRL:Incremental classifier and representation learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:2001-2010.
[28]PETIT G,POPESCU A,SCHINDLER H,et al.FeTrIL:Feature translation for exemplar-free class-incremental learning[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.2023:3911-3920.
[29]WANG Z,LIU Y,JI T,et al.Rehearsal-free continual language learning via efficient parameter isolation[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers).2023:10933-10946.
[30]DU F,YANG Y,ZHAO Z,et al.Efficient perturbation inference and expandable network for continual learning[J].Neural Networks,2023,159:97-106.
[31]WANG L,ZHANG X,LI Q,et al.Coscl:Cooperation of small continual learners is stronger than a big one[C]//European Conference on Computer Vision.Cham:Springer Nature Switzerland,2022:254-271.
[32]WANG L,ZHANG X,LI Q,et al.Incorporating neuro-inspired adaptability for continual learning in artificial intelligence[J].Nature Machine Intelligence,2023,5(12):1356-1368.
[33]SONG Y,WANG T,CAI P,et al.A comprehensive survey of few-shot learning:Evolution,applications,challenges,and opportunities[J].ACM Computing Surveys,2023,55(13s):1-40.
[34]MA Y,ZHONG G,WANG Y,et al.MetaCGAN:A novel GAN model for generating high quality and diversity images with few training data[C]//2020 International Joint Conference on Neural Networks(IJCNN).IEEE,2020:1-7.
[35]SHEN Z,LIU Z,QIN J,et al.Partial is better than all:Revisiting fine-tuning strategy for few-shot learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2021:9594-9602.
[36]ELSKEN T,STAFFLER B,METZEN J H,et al.Meta-learning of neural architectures for few-shot learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:12365-12375.
[37]MENSINK T,VERBEEK J,PERRONNIN F,et al.Distance-based image classification:Generalizing to new classes at near-zero cost[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(11):2624-2637.
[38]REN J,FORT S,LIU J,et al.A simple fix to Mahalanobis distance for improving near-OOD detection[J].arXiv:2106.09022,2021.