Computer Science ›› 2024, Vol. 51 ›› Issue (9): 242-249. doi: 10.11896/jsjkx.230600117

• Artificial Intelligence •


Text-Image Gated Fusion Mechanism for Multimodal Aspect-based Sentiment Analysis

ZHANG Tianzhi1, ZHOU Gang1,2, LIU Hongbo1, LIU Shuo1, CHEN Jing1   

  1 Information Engineering University, Zhengzhou 450001, China
    2 State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
  • Received: 2023-06-14 Revised: 2024-02-23 Online: 2024-09-15 Published: 2024-09-10
  • Corresponding author: ZHOU Gang (gzhougzhou@126.com)
  • About author: ZHANG Tianzhi, born in 1995, postgraduate (timothyz2023@163.com). His main research interests include sentiment analysis and data mining.
    ZHOU Gang, born in 1974, Ph.D, professor. His main research interests include mass data processing and knowledge graphs.
  • Supported by:
    Science and Technology Research Program of Henan Province (222102210081).



Abstract: Multimodal aspect-based sentiment analysis is an emerging task in the field of multimodal sentiment analysis, which aims to identify the sentiment expressed toward each given aspect in a text-image pair. Although recent research on multimodal aspect-based sentiment analysis has made breakthrough progress, most existing models use only simple concatenation in the multimodal feature fusion stage, without considering whether the image contains information that is semantically irrelevant to the text, which may introduce additional noise into the model. To address this problem, this paper proposes a text-image gated fusion mechanism (TIGFM) model for multimodal aspect-based sentiment analysis. The model introduces adjective-noun pairs (ANPs) extracted from the dataset images while the text interacts with the image, and treats the weighted adjectives as image auxiliary information. In addition, multimodal feature fusion is achieved by constructing a gating mechanism that dynamically controls the input of the image and the image auxiliary information during fusion. Experimental results demonstrate that the TIGFM model achieves competitive results on two Twitter-based datasets, validating the effectiveness of the proposed method.
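To make the fusion step concrete, below is a minimal PyTorch-style sketch of the two ideas in the abstract: attention-weighting the adjectives of the extracted ANPs against the text, and a gate that dynamically balances the image feature against that auxiliary feature. All module names, dimensions, and the exact gating formulation are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch, assuming pooled (batch, dim) representations for the
# text and image, and (batch, num_anps, dim) adjective embeddings.
import torch
import torch.nn as nn


def weight_adjectives(text: torch.Tensor, adj_embs: torch.Tensor) -> torch.Tensor:
    """Attend over ANP adjective embeddings with the text as the query.

    text: (batch, dim); adj_embs: (batch, num_anps, dim).
    Returns a (batch, dim) weighted adjective (auxiliary) representation.
    """
    scores = torch.softmax(adj_embs @ text.unsqueeze(-1), dim=1)  # (batch, num_anps, 1)
    return (scores * adj_embs).sum(dim=1)


class GatedTextImageFusion(nn.Module):
    """Gate, conditioned on text, image, and auxiliary features, that decides
    how much of the raw image versus the ANP-based auxiliary information
    enters the fused representation."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.gate = nn.Linear(3 * dim, dim)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, text, image, aux):
        g = torch.sigmoid(self.gate(torch.cat([text, image, aux], dim=-1)))
        # When the image is semantically irrelevant to the text, the gate can
        # shift weight toward the adjective-based auxiliary information.
        visual = g * image + (1.0 - g) * aux
        return self.out(torch.cat([text, visual], dim=-1))


if __name__ == "__main__":
    t, v = torch.randn(2, 768), torch.randn(2, 768)
    a = weight_adjectives(t, torch.randn(2, 5, 768))
    print(GatedTextImageFusion(768)(t, v, a).shape)  # torch.Size([2, 768])
```

A sigmoid gate of this kind lets the contribution of the raw image feature vary per dimension and per example, which is one plausible way to realize the "dynamic control" the abstract describes.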

Key words: Multimodal aspect-based sentiment analysis, Gated fusion mechanism, Adjective-noun pairs, Image auxiliary information, Semantic relevance

CLC number: TP391