计算机科学 (Computer Science), 2024, Vol. 51, Issue 11A: 231200163-8. doi: 10.11896/jsjkx.231200163

• Intelligent Computing •

Sentiment Analysis of Image-Text Based on Multiple Perspectives

GAO Weijun, SUN Zibo, LIU Shujun

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730000, China
  • Online: 2024-11-16  Published: 2024-11-13
  • Corresponding author: SUN Zibo (941909610@qq.com)
  • About author: GAO Weijun, born in 1973, associate professor (gaoweijun@lut.edu.cn). His main research interests include software engineering, natural language processing, and multimodal sentiment analysis.
    SUN Zibo, born in 1998, graduate student. His main research interest is multimodal sentiment analysis.
  • Supported by:
    National Natural Science Foundation of China (51668043).

Abstract: In social media, the facial expressions of people in images often capture attention first and directly evoke emotional responses. For a complete emotional expression, however, the scene plays an equally indispensable role: it supplies the context, tone, and atmosphere against which emotions are read. Many studies have overlooked the importance of scenes and focused solely on facial expressions, which leads to suboptimal sentiment analysis results; existing image-text bimodal models further suffer from neglected cross-modal alignment, insufficient image feature extraction, and limited generalization. To address these problems, this paper proposes the multi-view image-text sentiment analysis network (MITN), which accounts for both facial expressions and scenes. For image feature extraction, an attention mechanism is incorporated to capture facial expressions more precisely, and dilated convolution is introduced to broaden the receptive field over scene details. Scene-VGG is trained by transfer learning on the Places dataset so that scene information is fully exploited, and text features are extracted with BERT+BiGRU. Experiments on the multimodal sentiment dataset MVSA verify the effectiveness of the proposed model in capturing the emotional cues present in both facial expressions and scenes.
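
To make the described architecture concrete, the following is a minimal sketch of an MITN-style model in PyTorch. The abstract does not specify layer configurations, so all class names (SceneVGG, FaceBranch, MITN), dimensions, dilation rates, and pretrained weights below are illustrative assumptions rather than the authors' implementation: a face branch whose spatial features are re-weighted by attention, a scene branch whose last stage uses a dilated convolution (with the backbone assumed to be fine-tuned on Places), and a BERT+BiGRU text encoder, all fused by concatenation for classification.

```python
# Illustrative sketch only; names, dimensions and weights are assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg16
from transformers import BertModel


class SceneVGG(nn.Module):
    """Scene branch: VGG backbone plus a dilated convolution that enlarges
    the receptive field. The backbone is assumed to be fine-tuned on the
    Places dataset (the transfer-learning step described in the abstract)."""
    def __init__(self):
        super().__init__()
        self.backbone = vgg16(weights="IMAGENET1K_V1").features
        # Dilation rate 2 is illustrative; the paper's settings may differ.
        self.dilated = nn.Conv2d(512, 512, kernel_size=3, padding=2, dilation=2)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, img):
        h = self.backbone(img)           # (B, 512, H', W')
        h = torch.relu(self.dilated(h))  # dilated conv widens receptive field
        return self.pool(h).flatten(1)   # (B, 512)


class FaceBranch(nn.Module):
    """Face branch: CNN features re-weighted by self-attention so that
    expressive facial regions dominate the pooled descriptor."""
    def __init__(self, dim=512):
        super().__init__()
        self.backbone = vgg16(weights="IMAGENET1K_V1").features
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8,
                                          batch_first=True)

    def forward(self, face_img):
        h = self.backbone(face_img)          # (B, 512, H', W')
        seq = h.flatten(2).transpose(1, 2)   # (B, H'*W', 512) spatial tokens
        out, _ = self.attn(seq, seq, seq)    # attention over spatial positions
        return out.mean(dim=1)               # (B, 512)


class MITN(nn.Module):
    """Fuses face, scene and text features for sentiment classification.
    num_classes=3 assumes MVSA's positive/neutral/negative labels."""
    def __init__(self, hidden=256, num_classes=3):
        super().__init__()
        self.face = FaceBranch()
        self.scene = SceneVGG()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.bigru = nn.GRU(768, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(512 + 512 + 2 * hidden, num_classes)

    def forward(self, face_img, scene_img, input_ids, attention_mask):
        f = self.face(face_img)
        s = self.scene(scene_img)
        tok = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        _, h_n = self.bigru(tok)                 # h_n: (2, B, hidden)
        t = torch.cat([h_n[0], h_n[1]], dim=-1)  # concat both GRU directions
        return self.classifier(torch.cat([f, s, t], dim=-1))
```

Dilated convolution is used here because it enlarges the receptive field without additional pooling, so scene context is aggregated without losing spatial resolution; fine-tuning the scene backbone on Places, rather than relying on ImageNet features alone, is what the abstract refers to as transfer-learning training of Scene-VGG.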

Key words: Multi-modal, Sentiment analysis, Multi-view, Transfer learning, Attention mechanism

CLC Number: TP391