Computer Science ›› 2024, Vol. 51 ›› Issue (6): 338-345. doi: 10.11896/jsjkx.230800198

• Artificial Intelligence •

  • Corresponding author: ZHANG Shunxiang (sxzhang@aust.edu.cn)
  • Author contact: (1556282598@qq.com)

Gender Discrimination Speech Detection Model Fusing Post Attributes

WANG Xiaolong1,3, WANG Yanhui1,3, ZHANG Shunxiang1,2,3, WANG Caiqin1,3, ZHOU Yuhao1,3   

  1. School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001, China
  2. School of Computer, Huainan Normal University, Huainan, Anhui 232038, China
  3. Artificial Intelligence Research Institute of Hefei Comprehensive National Science Center, Hefei 230000, China
  • Received: 2023-08-31 Revised: 2023-12-04 Online: 2024-06-15 Published: 2024-06-05
  • About author: WANG Xiaolong, born in 1999, postgraduate, is a member of CCF (No. P8181G). His main research interests include sentiment analysis and data mining.
    ZHANG Shunxiang, born in 1970, Ph.D., professor, Ph.D. supervisor. His main research interests include web mining, semantic search, and complex networks.
  • Supported by:
    National Natural Science Foundation of China (62076006), Opening Foundation of State Key Laboratory of Cognitive Intelligence, iFLYTEK (COGOS-2023HE02), and University Synergy Innovation Program of Anhui Province (GXXT-2021-008).

Abstract: Gender discrimination speech detection uses natural language processing (NLP) techniques to identify whether a text tends toward gender discrimination, which supports efforts to purify the online environment. Existing studies focus only on the post itself and do not mine the relationships among post attributes (user, post, and theme). To address this, this paper proposes a gender discrimination speech detection model that fuses post attributes and mines the relationships among them by constructing a heterogeneous graph. First, ERNIE generates word embeddings of the post content, and a BiGRU extracts contextual dependencies to produce a sentence representation. Then, a heterogeneous graph is constructed from the relationships among post attributes, and a heterogeneous graph attention network is applied to obtain a relationship representation of the post content. Finally, the relationship representation and the sentence representation are fused and classified with the Softmax function. Experimental results show that the proposed model improves the accuracy of gender discrimination speech detection.

Key words: Gender discrimination speech, Post attributes, BiGRU, Heterogeneous graph, Heterogeneous graph attention network

CLC Number: TP391
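
The pipeline outlined in the abstract (a heterogeneous graph over users, posts, and themes, attention over graph neighbors, then fusion with a sentence representation and Softmax classification) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the dot-product attention, the hand-written feature vectors (standing in for ERNIE + BiGRU outputs), and the classifier weights are all assumptions made for illustration, and the attention step is a much-simplified stand-in for a heterogeneous graph attention network.

```python
import math
from collections import defaultdict

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_hetero_graph(posts):
    """Link each post node to its user node and its theme node (undirected)."""
    graph = defaultdict(set)
    for p in posts:
        post = ("post", p["id"])
        for other in (("user", p["user"]), ("theme", p["theme"])):
            graph[post].add(other)
            graph[other].add(post)
    return graph

def relation_representation(node, graph, features):
    """Attention-weighted sum of neighbor features (assumed dot-product scores)."""
    neighbors = sorted(graph[node])
    weights = softmax([dot(features[node], features[n]) for n in neighbors])
    dim = len(features[node])
    return [sum(w * features[n][i] for w, n in zip(weights, neighbors))
            for i in range(dim)]

def classify(sentence_vec, relation_vec, weight_rows, bias):
    """Concatenate both representations, apply a linear layer, then softmax."""
    fused = sentence_vec + relation_vec  # list concatenation
    logits = [dot(row, fused) + b for row, b in zip(weight_rows, bias)]
    return softmax(logits)

# Hypothetical toy data: three posts, two users, two themes.
posts = [
    {"id": "p1", "user": "u1", "theme": "t1"},
    {"id": "p2", "user": "u1", "theme": "t2"},
    {"id": "p3", "user": "u2", "theme": "t1"},
]
graph = build_hetero_graph(posts)

# Hand-written 2-d features; a real system would learn these.
features = {
    ("post", "p1"): [1.0, 0.0], ("post", "p2"): [0.0, 1.0],
    ("post", "p3"): [0.5, 0.5], ("user", "u1"): [1.0, 0.0],
    ("user", "u2"): [0.0, 1.0], ("theme", "t1"): [0.5, 0.5],
    ("theme", "t2"): [0.2, 0.8],
}

sentence_vec = features[("post", "p1")]  # stand-in for the sentence representation
relation_vec = relation_representation(("post", "p1"), graph, features)

# Assumed (untrained) weights for a two-class linear classifier.
weight_rows = [[0.3, -0.1, 0.2, 0.0], [-0.2, 0.4, 0.1, 0.1]]
bias = [0.0, 0.0]
probs = classify(sentence_vec, relation_vec, weight_rows, bias)
```

Here `probs` holds one probability per class for post `p1`. Concatenation is used as the fusion step only because it is the simplest choice; the abstract does not specify how the two representations are combined.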