Computer Science, 2026, Vol. 53, Issue (1): 187-194. doi: 10.11896/jsjkx.241100029

• Computer Graphics & Multimedia •

Multimodal Sentiment Analysis for Interactive Fusion of Dual Perspectives Under Cross-modal Inconsistent Perception

BU Yunyang, QI Binting, BU Fanliang   

  1. College of Information Network Security, People’s Public Security University of China, Beijing 100038, China
  • Received: 2024-11-05  Revised: 2025-02-12  Online: 2026-01-08
  • Corresponding author: BU Fanliang (bufanliang@sina.com)
  • About author: BU Yunyang, born in 2000, postgraduate (1252300321@qq.com). His main research interest is multimodal sentiment analysis.
    BU Fanliang, born in 1965, Ph.D, professor, Ph.D supervisor. His main research interests include computer control and information processing.
  • Supported by:
    Double First-Class Innovation Research Project for People’s Public Security University of China (2023SYL08).


Abstract: On social media, a user's comment usually describes a particular sentiment-bearing region of the accompanying image, so there is corresponding information between image and text. Most previous multimodal sentiment analysis methods explore the mutual influence of image and text from only a single perspective, capturing the correspondence between image regions and text words, which leads to suboptimal results. In addition, social media data is strongly personal and subjective, and the sentiment it carries is multidimensional and complex, which gives rise to samples with weak image-text sentiment consistency. To address these two problems, a multimodal sentiment analysis model with dual-perspective interactive fusion under cross-modal inconsistency perception is proposed. On the one hand, the model performs cross-modal interaction on image and text features from both a global and a local perspective, providing more comprehensive and accurate sentiment analysis and thereby improving performance. On the other hand, it computes an inconsistency score for the image-text features, representing the degree of image-text inconsistency, and uses it to dynamically regulate the weights of the unimodal and multimodal representations in the final sentiment features, thus improving robustness. Extensive experiments on two public datasets, MVSA-Single and MVSA-Multiple, show that the proposed model improves the F1 score by 0.59 and 0.39 percentage points, respectively, over existing baseline models, demonstrating its effectiveness and superiority.
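To make the two mechanisms in the abstract concrete, the following PyTorch sketch illustrates one plausible reading of them. It is not the authors' implementation: the module names (DualViewInteraction, InconsistencyGate), the choice of mean pooling for the global view, and the blend of a cosine-based and a learned inconsistency score are all assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualViewInteraction(nn.Module):
    # Sketch of dual-perspective cross-modal interaction: a local view
    # (region/token-level cross-attention in both directions) and a
    # global view (pooled features concatenated). Mean pooling is an
    # assumption; the abstract does not specify the pooling operator.
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.txt2img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, img_seq: torch.Tensor, txt_seq: torch.Tensor) -> torch.Tensor:
        # img_seq: (batch, regions, d_model); txt_seq: (batch, tokens, d_model)
        t_att, _ = self.txt2img(txt_seq, img_seq, img_seq)  # words attend to regions
        i_att, _ = self.img2txt(img_seq, txt_seq, txt_seq)  # regions attend to words
        # Global view: pool each attended sequence and concatenate.
        return torch.cat([t_att.mean(dim=1), i_att.mean(dim=1)], dim=-1)

class InconsistencyGate(nn.Module):
    # Sketch of cross-modal inconsistency perception: a score in [0, 1]
    # re-weights the unimodal and multimodal sentiment features. The
    # cosine/learned blend below is an illustrative assumption.
    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Linear(2 * d_model, 1)

    def forward(self, img, txt, multimodal, unimodal):
        # img, txt, multimodal, unimodal: (batch, d_model) pooled features
        cos = F.cosine_similarity(img, txt, dim=-1)               # (batch,)
        learned = torch.sigmoid(
            self.score(torch.cat([img, txt], dim=-1))).squeeze(-1)
        s = 0.5 * ((1.0 - cos) / 2.0 + learned)                   # in [0, 1]
        s = s.unsqueeze(-1)                                       # (batch, 1)
        # High inconsistency -> rely more on unimodal features;
        # low inconsistency -> trust the fused multimodal features.
        return s * unimodal + (1.0 - s) * multimodal

Under these assumptions, fused = DualViewInteraction(768)(img_seq, txt_seq) yields a (batch, 1536) multimodal vector which, after projection back to d_model, can be gated against a pooled unimodal vector by InconsistencyGate(768). The design point is that the gate degrades gracefully on weakly consistent image-text pairs instead of forcing a fusion.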

Key words: Multimodal sentiment analysis, Cross-modal inconsistent perception, Dual-view interactive fusion, Dynamic regulation, Cross-modal interaction

CLC Number: TP391.41