Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 231200163-8.doi: 10.11896/jsjkx.231200163

• Intelligent Computing •

Sentiment Analysis of Image-Text Based on Multiple Perspectives

GAO Weijun, SUN Zibo, LIU Shujun

  1. School of Computer and Communication,Lanzhou University of Technology,Lanzhou 730000,China
  • Online:2024-11-16 Published:2024-11-13
  • About author:GAO Weijun,born in 1973,associate professor.His main research interests include software engineering,natural language processing and multimodal sentiment analysis.
    SUN Zibo,born in 1998,graduate student.His main research interest is multimodal sentiment analysis.
  • Supported by:
    National Natural Science Foundation of China(51668043).

Abstract: In social media images, the facial expressions of depicted people usually attract attention first and directly evoke strong emotional responses. A comprehensive reading of emotion, however, also depends on the scene, which supplies context and sets the tone and atmosphere for the emotion being expressed. Many studies have overlooked the role of the scene and focused solely on facial expressions, which leads to suboptimal sentiment analysis that misses the emotional nuances scenes provide. To address this, we propose the multi-view image-text sentiment analysis network (MITN), which accounts for both facial expressions and scenes. MITN enhances image feature extraction with an attention mechanism that captures the facial expressions of people in the image, and introduces dilated convolution to broaden the receptive field and attend to scene details. In addition, Scene-VGG is trained by transfer learning on the Places dataset, so that the abundant scene information it encodes improves the accuracy and depth of the analysis. Text features are extracted with BERT+BiGRU. Experiments on the multimodal sentiment dataset MVSA show that MITN achieves superior sentiment analysis performance, accurately capturing the emotional cues present in both facial expressions and scenes, and offering a new multi-view perspective for understanding emotional expression in social media.
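A minimal PyTorch sketch of the multi-view pipeline described in the abstract is given below. The module names, feature dimensions, the attention-pooling form, and the late concatenation fusion are illustrative assumptions; the paper's exact MITN architecture may differ.

```python
import torch
import torch.nn as nn
from torchvision import models
from transformers import BertModel

class SceneBranch(nn.Module):
    """Scene view: VGG16 feature maps (Places-pretrained weights would be
    loaded here to obtain Scene-VGG) widened by a dilated convolution."""
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = models.vgg16(weights=None).features  # 512-ch maps
        # dilation=2 enlarges the receptive field at no extra parameter cost
        self.dilated = nn.Sequential(
            nn.Conv2d(512, dim, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, img):                        # img: (B, 3, 224, 224)
        return self.dilated(self.backbone(img))    # (B, dim)

class FaceBranch(nn.Module):
    """Facial-expression view: spatial attention pooling over VGG maps."""
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = models.vgg16(weights="IMAGENET1K_V1").features
        self.score = nn.Conv2d(512, 1, kernel_size=1)   # per-location score
        self.proj = nn.Linear(512, dim)

    def forward(self, img):
        f = self.backbone(img)                               # (B, 512, H, W)
        w = torch.softmax(self.score(f).flatten(2), dim=-1)  # (B, 1, HW)
        pooled = (f.flatten(2) * w).sum(dim=-1)              # attention pool
        return self.proj(pooled)                             # (B, dim)

class TextBranch(nn.Module):
    """Text view: BERT token states refined by a BiGRU, then mean-pooled."""
    def __init__(self, dim=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.bigru = nn.GRU(768, dim // 2, batch_first=True,
                            bidirectional=True)

    def forward(self, ids, mask):
        h = self.bert(input_ids=ids, attention_mask=mask).last_hidden_state
        out, _ = self.bigru(h)                     # (B, T, dim)
        return out.mean(dim=1)                     # (B, dim)

class MITN(nn.Module):
    """Concatenate the face, scene, and text views; classify polarity."""
    def __init__(self, dim=256, n_classes=3):
        super().__init__()
        self.face = FaceBranch(dim)
        self.scene = SceneBranch(dim)
        self.text = TextBranch(dim)
        self.cls = nn.Linear(3 * dim, n_classes)

    def forward(self, img, ids, mask):
        z = torch.cat([self.face(img), self.scene(img),
                       self.text(ids, mask)], dim=-1)
        return self.cls(z)                         # (B, n_classes) logits
```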

Key words: Multi-modal, Sentiment analysis, Multi-view, Transfer learning, Attention mechanism

CLC Number: TP391
[1]GIATSOGLOU M,VOZALIS M G,DIAMANTARAS K,et al.Sentiment analysis leveraging emotions and word embeddings[J].Expert Systems with Applications,2017,69:214-224.
[2]SINGH V,RAM M,PANT B.Identification of zonal-wise passenger's issues in Indian railways using latent Dirichlet allocation(LDA):A sentiment analysis approach on tweets[M]//Mathematics Applied in Information Systems.2018.
[3]CHATURVEDI I,RAGUSA E,GASTALDO P,et al.Bayesian network based extreme learning machine for subjectivity detection[J].Journal of The Franklin Institute,2018,355(4):1780-1797.
[4]BANDHAKAVI A,WIRATUNGA N,MASSIE S,et al.Lexicon generation for emotion detection from text[J].IEEE Intelligent Systems,2017,32(1):102-108.
[5]PORIA S,CAMBRIA E,BAJPAI R,et al.A review of affective computing:From unimodal analysis to multimodal fusion[J].Information Fusion,2017,37:98-125.
[6]HUANG Y,DU C,XUE Z,et al.What Makes Multimodal Learning Better than Single(Provably)[J].Advances in Neural Information Processing Systems,2021,34:10944-10956.
[7]DENG D,ZHOU Y,PI J,et al.Multimodal utterance-level affect analysis using visual,audio and text features[J].arXiv:1805.00625,2018.
[8]SHUTOVA E,KIELA D,MAILLARD J.Black holes and white rabbits:Metaphor identification with visual features[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2016:160-170.
[9]YU Y,LIN H,MENG J,et al.Visual and textual sentiment analysis of a microblog using deep convolutional neural networks[J].Algorithms,2016,9(2):41.
[10]LIU H Y,HU Z G,PENG D L.The interaction of emotion and language processing[J].Advances in Psychological Science,2009,17(4):714.
[11]ORTIS A,FARINELLA G M,BATTIATO S.An Overview on Image Sentiment Analysis:Methods,Datasets and Current Challenges[C]//ICETE(1).2019:296-306.
[12]COLOMBO C,DEL BIMBO A,PALA P.Semantics in visual information retrieval[J].IEEE Multimedia,1999,6(3):38-53.
[13]SCHMIDT S,STOCK W G.Collective indexing of emotions in images.A study in emotional information retrieval[J].Journal of the American Society for Information Science and Technology,2009,60(5):863-876.
[14]BORTH D,JI R,CHEN T,et al.Large-scale visual sentiment ontology and detectors using adjective noun pairs[C]//Proceedings of the 21st ACM International Conference on Multimedia.2013:223-232.
[15]YOU Q,JIN H,LUO J.Visual sentiment analysis by attending on local image regions[C]//Thirty-First AAAI Conference on Artificial Intelligence.2017.
[16]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[17]ZHOU B,LAPEDRIZA A,KHOSLA A,et al.Places:A 10 million image database for scene recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,40(6):1452-1464.
[18]ASGHAR M Z,KUNDI F M,AHMAD S,et al.T-SAF:Twitter sentiment analysis framework using a hybrid classification scheme[J].Expert Systems,2018,35(1):e12233.
[19]HAMOUDA A,ROHAIM M.Reviews classification using SentiWordNet lexicon[C]//World Congress on Computer Science and Information Technology.2011:104-105.
[20]TANG D,WEI F,QIN B,et al.Coooolll:A deep learning system for twitter sentiment classification[C]//Proceedings of the 8th International Workshop on Semantic Evaluation(SemEval 2014).2014:208-212.
[21]YANG Z,YANG D,DYER C,et al.Hierarchical attention networks for document classification[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2016:1480-1489.
[22]CHIONG R,FAN Z,HU Z,et al.A sentiment analysis-based machine learning approach for financial market prediction via news disclosures[C]//Proceedings of the Genetic and Evolutionary Computation Conference Companion.2018:278-279.
[23]XU J,HUANG F,ZHANG X,et al.Visual-textual sentiment classification with bi-directional multi-level attention networks[J].Knowledge-Based Systems,2019,178:61-73.
[24]HUANG F,ZHANG X,ZHAO Z,et al.Image-text sentiment analysis via deep multimodal attentive fusion[J].Knowledge-Based Systems,2019,167:26-37.
[25]XU J,LI Z,HUANG F,et al.Social image sentiment analysis by exploiting multimodal content and heterogeneous relations[J].IEEE Transactions on Industrial Informatics,2020,17(4):2974-2982.
[26]YANG J,YU Y,NIU D,et al.ConFEDE:Contrastive Feature Decomposition for Multimodal Sentiment Analysis[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers).2023:7617-7630.
[27]FAN F,FENG Y,ZHAO D.Multi-grained attention network for aspect-level sentiment classification[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.2018:3433-3442.
[28]ZHANG L,ZHANG X,PAN J.Hierarchical cross-modality semantic correlation learning model for multimodal summarization[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2022,36(10):11676-11684.
[29]PORIA S,CAMBRIA E,HAZARIKA D,et al.Multi-level multiple attentions for contextual multimodal sentiment analysis[C]//2017 IEEE International Conference on Data Mining(ICDM).IEEE,2017:1033-1038.
[30]ZADEH A,CHEN M,PORIA S,et al.Tensor fusion network for multimodal sentiment analysis[J].arXiv:1707.07250,2017.
[31]AREVALO J,SOLORIO T,MONTES-Y-GÓMEZ M,et al.Gated multimodal units for information fusion[J].arXiv:1702.01992,2017.
[32]LIU Z,SHEN Y,LAKSHMINARASIMHAN V B,et al.Efficient low-rank multimodal fusion with modality-specific factors[J].arXiv:1806.00064,2018.
[33]YOU Q,LUO J,JIN H,et al.Cross-modality consistent regression for joint visual-textual sentiment analysis of social multimedia[C]//Proceedings of the Ninth ACM International Conference on Web Search and Data Mining.2016:13-22.
[34]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[35]ZHOU B,LAPEDRIZA A,KHOSLA A,et al.Places:A 10 Million Image Database for Scene Recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,40(6):1452-1464.
[36]DEVLIN J,CHANG M,LEE K,et al.BERT:pre-training of deep bidirectional transformers for language understanding[C]//NAACL.2019:4171-4186.
[37]BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[C]//Proceedings of the International Conference on Learning Representations.2015.
[38]SCHUSTER M,PALIWAL K K.Bidirectional recurrent neural networks[J].IEEE Transactions on Signal Processing,1997,45(11):2673-2681.
[39]NIU T,ZHU S,PANG L,et al.Sentiment analysis on multi-view social data[C]//Proceedings of the International Conference on Multimedia Modeling.2016:15-27.
[40]HUANG J,WANG Y.Emotional Analysis Method for Image-Text Fusion Based on Image Semantic Translation[J].Computer Engineering and Applications,2023,59(11):180-187.
[41]HUANG H Z,MENG Z Q.Multimodal sentiment classification method based on bidirectional attention mechanism [J].Computer Engineering and Applications,2021,57(11):9.
[42]YANG X,FENG S,WANG D,et al.Image-text multimodal emotion classification via multi-view attentional network[J].IEEE Transactions on Multimedia,2020,23:4014-4026.
[43]ZHU T,LI L,YANG J,et al.Multimodal sentiment analysis with image-text interaction network[J].IEEE Transactions on Multimedia,2022,25:3375-3385.
[44]LI Z,XU B,ZHU C,et al.CLMLF:A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection[J].arXiv:2204.05515,2022.