Multi-language Embedding Graph Convolutional Network for Hate Speech Detection

doi:10.11896/jsjkx.241200023

Computer Science, 2025, Vol. 52, Issue (11A): 241200023-8.


ZHAO Hongyi, LI Zhiyuan, BU Fanliang

School of Information Network Security, People's Public Security University of China, Beijing 100045, China

Online: 2025-11-15  Published: 2025-11-10
Corresponding author: BU Fanliang (bufanliang@sina.com)
About the author: 2022211430@ppsuc.edu.cn
Supported by: Double First-Class Innovation Research Project of People's Public Security University of China (2023SYL08).

Abstract

With the widespread use of social media, the spread of online hate speech has become an increasingly serious problem. Under the cover of online anonymity, hate speech spreads rapidly, which makes detecting it a serious challenge. To address this problem, this paper proposes a multilingual hate speech detection method based on a Multi-language Embedding Graph Convolutional Network (MEGCN). The method combines the strengths of sequence modeling and graph modeling. It uses a multilingual pre-trained model for feature extraction, which lets it handle complex relationships between different languages. The paper also proposes a joint training scheme based on interpolated predictions to improve the accuracy and robustness of the model. Experiments on four public datasets show that MEGCN outperforms all compared baseline models on the multilingual hate speech detection task. The method keeps the accuracy of sequence modeling while also capturing structural relationships between texts. This improves the model's performance in multilingual settings, especially in capturing semantic correspondences across languages.

Key words: Hate speech detection, Graph convolutional network, Multi-language pre-trained model, Natural language processing
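The abstract names two mechanisms without giving their formulas:

* graph propagation over multilingual text embeddings;
* a joint training scheme based on interpolated predictions.

The sketch below is a rough illustration only. It assumes a BertGCN-style design (Lin et al., 2021), with three parts:

* Document nodes carry features from a multilingual encoder, replaced here by fixed toy vectors.
* A two-layer GCN propagates those features over a text graph.
* The final class distribution is Z = λ·Z_GCN + (1 − λ)·Z_seq, scored with cross-entropy on the labeled nodes.

Everything else is hypothetical: the graph, all weights, λ = 0.7, and the helper names (`normalize_adjacency`, `interpolated_prediction`, `joint_loss`). MEGCN's actual graph construction and interpolation may differ. Only the forward pass and the loss are shown, not the optimizer step.

```python
# Minimal sketch under stated assumptions; not the MEGCN reference implementation.
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in m]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def interpolated_prediction(adj: Matrix, feats: Matrix, w1: Matrix,
                            w2: Matrix, w_seq: Matrix,
                            lam: float = 0.7) -> Matrix:
    """Hypothetical forward pass: Z = lam * Z_gcn + (1 - lam) * Z_seq."""
    a_norm = normalize_adjacency(adj)
    # Graph view: two GCN layers, softmax(Â ReLU(Â X W1) W2).
    hidden = relu(matmul(matmul(a_norm, feats), w1))
    z_gcn = softmax_rows(matmul(matmul(a_norm, hidden), w2))
    # Sequence view: a linear head on the encoder features alone.
    z_seq = softmax_rows(matmul(feats, w_seq))
    # Interpolate the two class distributions row by row.
    return [[lam * g + (1.0 - lam) * s for g, s in zip(row_g, row_s)]
            for row_g, row_s in zip(z_gcn, z_seq)]


def joint_loss(z: Matrix, labels: List[int], labeled_idx: List[int]) -> float:
    """Mean cross-entropy of the interpolated prediction on labeled nodes."""
    return -sum(math.log(z[i][labels[i]]) for i in labeled_idx) / len(labeled_idx)


# Toy graph of 3 document nodes; docs 0 and 1 are linked, doc 2 is isolated.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Stand-in for multilingual encoder outputs (e.g. sentence vectors), 2-d here.
feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[2.0, -2.0], [-2.0, 2.0]]
w_seq = [[1.0, -1.0], [-1.0, 1.0]]

z = interpolated_prediction(adj, feats, w1, w2, w_seq, lam=0.7)
print([row.index(max(row)) for row in z])  # → [0, 0, 1]
print(joint_loss(z, [0, 0, 1], [0, 2]))
```

Setting `lam=1.0` or `lam=0.0` reduces the output to the pure graph prediction or the pure sequence prediction, so λ controls how much each view contributes. The loss is computed on the interpolated output. In a real implementation, gradients would therefore reach both the GCN weights and the encoder, which is what makes the training joint.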

CLC number: TP391

ZHAO Hongyi, LI Zhiyuan, BU Fanliang. Multi-language Embedding Graph Convolutional Network for Hate Speech Detection[J]. Computer Science, 2025, 52(11A): 241200023-8. https://doi.org/10.11896/jsjkx.241200023

