Computer Science, 2024, Vol. 51, Issue 11: 15-22. doi: 10.11896/jsjkx.240700099
武成龙1, 胡明昊2, 廖劲智3, 杨慧4, 赵翔1
WU Chenglong1, HU Minghao2, LIAO Jinzhi3, YANG Hui4, ZHAO Xiang1
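Abstract: In recent years, the openness and convenience of social media have made it a breeding ground for the spread and proliferation of misinformation. Compared with unimodal misinformation, multimodal misinformation fuses multiple forms of information, such as text and images, to create more deceptive false content with more far-reaching impact. Most existing multimodal misinformation detection methods are built on small models, while the rapid development of large multimodal models offers a new approach to the task. However, these models typically have many parameters and consume substantial computing resources, so they cannot be deployed directly in scenarios with limited computing and energy resources. To address these problems, this paper proposes a multimodal misinformation detection model based on the large multimodal model Long-CLIP. The model can process long text and attend to more coarse-grained and fine-grained details. In addition, efficient multi-granularity layer-wise pruning is used to compress the model, yielding a more lightweight multimodal misinformation detection model suited to resource-constrained scenarios. Finally, on the Weibo dataset, the model is compared with currently popular large multimodal models, both before and after fine-tuning, and with other pruning methods, which verifies its effectiveness. The results show that the Long-CLIP-based model has far fewer parameters and a much shorter inference time than currently popular large multimodal models while achieving better detection performance. After compression, the number of parameters is reduced by 50% and inference time by 1.92 s, with detection performance dropping by only 0.01.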