Computer Science ›› 2023, Vol. 50 ›› Issue (3): 298-306. doi: 10.11896/jsjkx.220100156

• Artificial Intelligence •

Multimodal Sentiment Analysis Based on Adaptive Gated Information Fusion

CHEN Zhen1, PU Yuanyuan1,2, ZHAO Zhengpeng1, XU Dan1, QIAN Wenhua1   

  1 College of Information Science and Engineering,Yunnan University,Kunming 650504,China
    2 University Key Laboratory of Internet of Things Technology and Application,Yunnan Province,Kunming 650504,China
  • Received:2022-01-16 Revised:2022-09-20 Online:2023-03-15 Published:2023-03-15
  • Corresponding author:ZHAO Zhengpeng(zhpzhao@ynu.edu.cn)
  • About author:CHEN Zhen,born in 1994,postgraduate(15837332933@163.com).His main research interests include multimodal sentiment analysis and image processing.
    ZHAO Zhengpeng,born in 1973,master,associate professor.His main research interests include digital image processing and speech signal processing.
  • Supported by:
    National Natural Science Foundation of China(62162068,61271361,61761046,62061049),Yunnan Applied Basic Research Project(2018FB100) and Key Program of Applied Basic Research of Yunnan Science and Technology Department(202001BB050043,2019FA044).

Abstract: The goal of multimodal sentiment analysis is to achieve reliable and robust sentiment analysis by exploiting the complementary information provided by multiple modalities. In recent years, extracting deep semantic features with neural networks has achieved remarkable results on multimodal sentiment analysis tasks, but fusing multimodal information at different feature levels is equally decisive for the quality of the analysis. This paper therefore proposes a multimodal sentiment analysis model based on adaptive gated information fusion (AGIF). First, the visual and color features extracted at different levels by Swin Transformer and ResNet are organically fused by a gated information fusion network according to their contribution to sentiment analysis. Second, since sentiment is abstract and complex, the sentiment of an image is often conveyed by several subtle local regions, and iterative attention can accurately locate these sentiment-discriminative regions using information from previous iterations. For text, the recent ERNIE pre-training model is adopted to overcome the inability of Word2Vec and GloVe to represent polysemous words. Finally, an auto-fusion network fuses the features of each modality dynamically, which avoids the information redundancy introduced when the multimodal joint representation is built by deterministic operations such as concatenation or TFN. Extensive experiments on three publicly available real-world datasets demonstrate the effectiveness of the proposed model.
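
To make the two fusion steps concrete, below is a minimal PyTorch-style sketch: a gated information fusion module that weights two visual feature streams by a learned contribution gate, and an auto-fusion module that compresses the concatenated modalities into a compact joint code instead of keeping a full concatenation or TFN outer product. All module names, dimensions, the exact gating formulation, and the reconstruction term are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class GatedInformationFusion(nn.Module):
    """Fuses two same-dimensional feature streams (e.g., Swin Transformer
    visual features and ResNet color features) with a learned gate that
    weights each stream by its estimated contribution (an assumption)."""
    def __init__(self, dim: int):
        super().__init__()
        # The gate is computed from both streams jointly.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([feat_a, feat_b], dim=-1))  # per-channel weight in (0,1)
        return g * feat_a + (1.0 - g) * feat_b              # convex combination of streams

class AutoFusion(nn.Module):
    """'Dynamic' multimodal fusion: encode the concatenated modalities into a
    compact joint representation, so the model is not stuck with the redundant
    full concatenation. The reconstruction branch is one common auto-fusion
    design choice, assumed here for illustration."""
    def __init__(self, text_dim: int, image_dim: int, joint_dim: int):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(text_dim + image_dim, joint_dim), nn.Tanh())
        self.decode = nn.Linear(joint_dim, text_dim + image_dim)

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor):
        concat = torch.cat([text_feat, image_feat], dim=-1)
        joint = self.encode(concat)
        # Penalize joint codes that cannot reconstruct the inputs, so the
        # compression keeps sentiment-relevant information from both modalities.
        recon_loss = nn.functional.mse_loss(self.decode(joint), concat)
        return joint, recon_loss

# Example usage with toy shapes (batch of 4; all dimensions hypothetical):
visual = torch.randn(4, 512)   # e.g., Swin Transformer features
color = torch.randn(4, 512)    # e.g., ResNet color features
text = torch.randn(4, 768)     # e.g., ERNIE [CLS] embedding
image = GatedInformationFusion(512)(visual, color)
joint, aux_loss = AutoFusion(768, 512, 256)(text, image)
print(joint.shape)             # torch.Size([4, 256])

In this sketch the gate outputs a per-channel weight in (0,1), so the fused visual feature is a convex combination of the two streams, which matches the idea of weighting each stream by its contribution; the reconstruction loss is offered only as one plausible reading of "dynamic" fusion that reduces redundancy relative to deterministic concatenation.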

Key words: Multimodal sentiment analysis, Gated information fusion networks, Iterative attention, ERNIE, Auto-fusion network

CLC Number: TP391

References:
[1]KAGAN V,STEVENS A,SUBRAHMANIAN V S.Using Twitter Sentiment to Forecast the 2013 Pakistani Election and the 2014 Indian Election [J].IEEE Intelligent Systems,2015,30(1):2-5.
[2]BOLLEN J,MAO H N,ZENG S J.Twitter mood predicts the stock market [J].Journal of Computational Science,2011,2(1):1-8.
[3]LI X D,XIE H R,CHEN L,et al.News impact on stock price return via sentiment analysis [J].Knowledge-Based Systems,2014,69(15):14-23.
[4]HUR M,KANG P,CHO S.Box-office forecasting based on sentiments of movie reviews and independent subspace method [J].Information Sciences,2016:608-624.
[5]XU N,MAO W J.MultiSentiNet:A Deep Semantic Network for Multimodal Sentiment Analysis[C]//Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.2017:2399-2402.
[6]HUANG F R,ZHANG X M,ZHAO Z H,et al.Image-text sentiment analysis via deep multimodal attentive fusion [J].Knowledge-Based Systems,2019,167:26-37.
[7]LIN M H,MENG Z Q.Multimodal Sentiment Analysis Based on Attention Neural Network [J].Computer Science,2020,47(S2):508-514,548.
[8]XU N,MAO W J,CHEN G D.A co-memory network for multimodal sentiment analysis[C]//The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.2018:929-932.
[9]XU J,HUANG F R,ZHANG X M,et al.Visual-textual sentiment classification with bi-directional multi-level attention networks [J].Knowledge-Based Systems,2019,178:61-73.
[10]YANG X C,FENG S,WANG D L,et al.Image-Text Multimodal Emotion Classification via Multi-View Attentional Network [J].IEEE Transactions on Multimedia,2021,23(1):4014-4026.
[11]ANDERSON P,HE X D,BUEHLER C,et al.Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:6077-6086.
[12]JIANG H,MISRA I,ROHRBACH M,et al.In defense of grid features for visual question answering[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:10264-10273.
[13]WEN Z,PENG Y.Multi-level knowledge injecting for visual commonsense reasoning [J].IEEE Transactions on Circuits and Systems for Video Technology,2020,31(3):1042-1054.
[14]ENGIN D,SCHNITZLER F,DUONG N Q K,et al.On the hidden treasure of dialog in video question answering[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:2064-2073.
[15]MIKOLOV T,CHEN K,CORRADO G,et al.Efficient Estimation of Word Representations in Vector Space [J].arXiv:1301.3781,2013.
[16]PENNINGTON J,SOCHER R,MANNING C D.GloVe:Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP).2014:1532-1543.
[17]LIU Z,LIN Y T,CAO Y,et al.Swin transformer:Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.IEEE,2021:10012-10022.
[18]HE K M,ZHANG X Y,REN S Q,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2016:770-778.
[19]ZHANG X D,GAO X B,LU W,et al.A Gated Peripheral-Foveal Convolutional Neural Network for Unified Image Aesthetic Prediction [J].IEEE Transactions on Multimedia,2019,21(11):2815-2826.
[20]MNIH V,HEESS N,GRAVES A,et al.Recurrent models of visual attention[C]//Proceedings of the Neural Information Processing Systems.2014:2204-2212.
[21]SUN Y,WANG S,LI Y,et al.ERNIE 2.0:A Continual Pre-Training Framework for Language Understanding[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2020:8968-8975.
[22]HU J,SHEN L,ALBANIE S,et al.Squeeze-and-Excitation Networks [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2020,42(8):2011-2023.
[23]CHUNG J,GULCEHRE C,CHO K H,et al.Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling [J].arXiv:1412.3555,2014.
[24]ZHAO L,SHANG M,GAO F,et al.Representation learning of image composition for aesthetic prediction [J].Computer Vision and Image Understanding,2020,199(9):103024.
[25]ZADEH A,CHEN M,PORIA S,et al.Tensor Fusion Network for Multimodal Sentiment Analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.Copenhagen,Denmark,2017:1103-1114.
[26]NIU T,ZHU S,PANG L,et al.Sentiment analysis on multi-view social data[C]//International Conference on Multimedia Modeling.2016:15-27.
[27]MACHAJDIK J,HANBURY A.Affective image classification using features inspired by psychology and art theory[C]//Proceedings of the 18th ACM International Conference on Multimedia.New York,NY,USA,2010:83-92.
[28]SIMONYAN K,ZISSERMAN A.Very Deep Convolutional Networks for Large-Scale Image Recognition [J].arXiv:1409.1556,2014.
[29]SONG K K,YAO T,LING Q,et al.Boosting Image Sentiment Analysis with Visual Attention [J].Neurocomputing,2018,312(27):218-228.
[30]CAI G Y,CHU Y Y.Visual Sentiment Analysis Based on Multi-level Features Fusion of Dual Attention [J].Computer Engineering,2021,47(9):227-234.
[31]HU A,FLAXMAN S.Multimodal Sentiment Analysis To Explore the Structure of Emotions[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.2018:350-358.
[32]GUO K X,ZHANG Y X.Visual-textual sentiment analysis method with multi-level spatial attention [J].Journal of Computer Applications,2021,41(10):2835-2841.