Computer Science (计算机科学), 2024, Vol. 51, Issue (6): 338-345. doi: 10.11896/jsjkx.230800198
王小龙1,3, 王琰慧1,3, 张顺香1,2,3, 汪才钦1,3, 周渝皓1,3
WANG Xiaolong1,3, WANG Yanhui1,3, ZHANG Shunxiang1,2,3, WANG Caiqin1,3, ZHOU Yuhao1,3
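Abstract: Sexist speech detection uses natural language processing to identify whether a text shows a sexist tendency, supporting efforts to clean up the online environment. Existing studies focus only on the post itself and do not mine the relationships among post attributes (user, post, and topic). To address this, a sexist speech detection model that integrates post attributes is proposed, which mines the relationships among those attributes by constructing a heterogeneous graph. First, ERNIE produces word embeddings of the post content, and a BiGRU model extracts contextual dependencies to obtain a sentence representation. Next, a heterogeneous graph is built from the post attribute relationships, and a Heterogeneous Graph Attention Network yields a relational representation of the post content. Finally, the relational representation and the sentence representation are fused and classified with a Softmax function. Experimental results show that the proposed model improves the accuracy of sexist speech detection.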
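The pipeline in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the field names (`id`, `user`, `topic`), the two edge types, fusion by concatenation, and the toy weights are all assumptions made for illustration, and the ERNIE, BiGRU, and graph attention stages are replaced by hand-written vectors.

```python
import math
from collections import defaultdict


def build_hetero_graph(posts):
    """Build an undirected heterogeneous graph over user, post, and topic nodes.

    `posts` is a list of dicts with the hypothetical keys "id", "user", "topic".
    Nodes are (type, name) tuples; each post is linked to its author and topic.
    """
    adj = defaultdict(set)
    for p in posts:
        post = ("post", p["id"])
        for neighbor in (("user", p["user"]), ("topic", p["topic"])):
            adj[post].add(neighbor)
            adj[neighbor].add(post)
    return adj


def softmax(logits):
    """Numerically stable Softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence_vec, relation_vec, weights, bias):
    """Fuse the two representations and classify.

    Fusion by concatenation is an assumption; the abstract only says the
    representations are fused before the Softmax layer.
    """
    fused = sentence_vec + relation_vec  # list concatenation
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy data: three posts, two users, two topics (all hypothetical).
posts = [
    {"id": "p1", "user": "u1", "topic": "t1"},
    {"id": "p2", "user": "u1", "topic": "t2"},
    {"id": "p3", "user": "u2", "topic": "t1"},
]
graph = build_hetero_graph(posts)

# Stand-ins for the BiGRU sentence vector and the graph-attention relation vector.
sentence_vec = [0.2, -0.1]
relation_vec = [0.5, 0.3]
weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # 2 classes x 4 features
bias = [0.0, 0.0]
probs = classify(sentence_vec, relation_vec, weights, bias)
```

In this toy graph, posts `p1` and `p3` are connected through the shared topic `t1`, and `p1` and `p2` through the shared user `u1`; these are the kinds of attribute relations a graph attention layer could aggregate over.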