Computer Science ›› 2024, Vol. 51 ›› Issue (9): 258-264. doi: 10.11896/jsjkx.230700163
LIU Qian1, BAI Zhihao1, CHENG Chunling1, GUI Yaocheng2
Abstract: Image-text sentiment classification commonly fuses image and text features with a cross-modal fusion strategy that combines early fusion with a Transformer model. However, this strategy tends to focus on the information unique to each modality while neglecting the interconnections and shared information between modalities, which leads to unsatisfactory cross-modal feature fusion. To address this problem, an image-text sentiment classification method based on multi-scale cross-modal feature fusion is proposed. At the local scale, local features are fused via a cross-modal attention mechanism, so that the model not only attends to the information unique to the image and the text but also discovers the connections and shared information between them. At the global scale, global features are fused via an MLM loss, so that the model performs global modeling of the image and text data and further mines the connections between them, thereby promoting deep fusion of image and text features. Comparative experiments against 10 baseline models on two public datasets, MVSA-Single and MVSA-Multiple, show that the proposed method has clear advantages in accuracy, F1 score, and number of model parameters, verifying its effectiveness.
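The abstract describes the local-scale fusion step only at a high level (a cross-modal attention mechanism over image and text features); the paper's actual implementation is not given here. The following PyTorch sketch illustrates one common way such a module can be realized. All names and dimensions (CrossModalAttention, d_model=768, 8 heads, 49 image regions) are illustrative assumptions, not the authors' code, and the MLM-based global fusion is not shown.

```python
# Minimal sketch of local-scale fusion via cross-modal attention (assumed design).
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Text tokens attend to image-region features; a residual keeps unimodal info."""
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (B, L_t, d_model), e.g. BERT token embeddings
        # image_feats: (B, L_v, d_model), e.g. projected CNN region features
        fused, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        return self.norm(text_feats + fused)

# Usage example with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    text = torch.randn(2, 32, 768)   # batch of 2, 32 text tokens
    image = torch.randn(2, 49, 768)  # 7x7 = 49 image regions
    out = CrossModalAttention()(text, image)
    print(out.shape)  # torch.Size([2, 32, 768])
```

In this sketch the text queries attend over image keys/values, so the fused representation carries cross-modal shared information, while the residual connection preserves the modality-specific signal; a symmetric image-to-text branch could be added in the same way.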