Computer Science ›› 2024, Vol. 51 ›› Issue (11): 30-38. doi: 10.11896/jsjkx.240700004

• Social Media Disinformation Detection •

Multi-source Heterogeneous Data Progressive Fusion for Fake News Detection

YU Yongxin1,2, JI Ke1,2, GAO Yuan1,2, CHEN Zhenxiang1,2, MA Kun1,2, ZHAO Xiaofan3,4

  1 School of Information Science and Engineering, University of Jinan, Jinan 250022, China
  2 Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan 250022, China
  3 School of Information and Cyber Security, People’s Public Security University of China, Beijing 102623, China
  4 Key Laboratory of Security Prevention Technology and Risk Assessment of the Ministry of Public Security, Beijing 102623, China
  • Received: 2024-07-01 Revised: 2024-09-06 Online: 2024-11-15 Published: 2024-11-06
  • Corresponding author: JI Ke (ise_jik@ujn.edu.cn)
  • About author: (3106615421@qq.com)
    YU Yongxin, born in 2000, postgraduate, is a member of CCF (No.N4386G). Her main research interests include machine learning and natural language processing.
    JI Ke, born in 1989, Ph.D., associate professor, is a member of CCF (No.78936M). His main research interests include machine learning and recommendation systems.
  • Supported by:
    Shandong Provincial Key R&D Program of China (2021CXGC010103, 2018CXGC0706) and Natural Science Foundation of Shandong Province, China (ZR2022LZH016).

Abstract: Social media platforms are inundated with unverified information, much of it heterogeneous data from multiple sources. It spreads so widely and so quickly that it poses a significant threat to individuals and society, so detecting and preventing fake news effectively is crucial. Current fake news detection models typically obtain textual and visual news information from a single data source, which makes the news reports highly subjective and the data coverage incomplete. To address these limitations, a fake news detection model based on the progressive fusion of multi-source heterogeneous data is proposed. First, multi-source heterogeneous data are collected, screened, and cleaned to build a multi-source multimodal dataset that contains several reports on each event from different perspectives. Next, the features obtained from the textual feature extractor and the visual feature extractor are fed into a multi-source fusion module, which progressively fuses the features from different sources. In addition, sentiment features extracted from the text and frequency-domain features extracted from the images are incorporated to enable multi-level feature extraction. Finally, a soft attention mechanism is adopted for feature integration. Experimental results and analysis show that the proposed model achieves better detection performance than existing popular methods, providing an effective solution for fake news detection in the era of big data.

Key words: Fake news detection, Data augmentation, Multi-source heterogeneous data, Feature fusion, Sentiment feature, Frequency domain feature
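The abstract names a soft attention mechanism as the final feature-integration step but does not give its formulation. The following is only a minimal sketch of one common form of soft attention (softmax-weighted averaging of feature vectors), not the authors' implementation. The function name `soft_attention`, the `query` vector (which would normally be learned), and the toy feature values are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def soft_attention(
    features: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse feature vectors with softmax weights (hypothetical helper).

    Each feature vector is scored by its dot product with `query`;
    the scores are normalized with a softmax, and the fused vector is
    the weighted sum of the inputs.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(query)
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must match the query length")

    # Dot-product relevance score for each feature vector.
    scores = [sum(x * q for x, q in zip(f, query)) for f in features]

    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the feature vectors, one dimension at a time.
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


# Toy stand-ins for the four feature types mentioned in the abstract
# (values are made up; real features would come from the extractors).
text_feat = [0.9, 0.1, 0.0]
image_feat = [0.2, 0.8, 0.1]
sentiment_feat = [0.1, 0.1, 0.7]
frequency_feat = [0.4, 0.5, 0.2]

# Assumed query vector; in a trained model this would be a learned parameter.
query = [1.0, 0.5, 0.2]

fused, weights = soft_attention(
    [text_feat, image_feat, sentiment_feat, frequency_feat], query
)
print("attention weights:", [round(w, 3) for w in weights])
print("fused feature:", [round(v, 3) for v in fused])
```

The weights always sum to 1, so the fused vector stays on the same scale as its inputs; feature types whose vectors align more closely with the query contribute more to the result.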

CLC Number: 

  • TP391