Computer Science, 2025, Vol. 52, Issue 7: 226-232. doi: 10.11896/jsjkx.240600066

• Artificial Intelligence •

Multimodal Sentiment Analysis Model Based on Cross-modal Unidirectional Weighting

WANG Youkang, CHENG Chunling   

  1. School of Computer Science, School of Software and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2024-06-07  Revised: 2024-09-09  Published: 2025-07-17
  • Corresponding author: CHENG Chunling (chengcl@njupt.edu.cn)
  • About author: WANG Youkang (1222045635@njupt.edu.cn), born in 2000, postgraduate. His main research interests include deep learning and multimodal sentiment analysis.
    CHENG Chunling, born in 1972, professor. Her main research interests include data mining and data management.
  • Supported by: National Natural Science Foundation of China (61972201).

Abstract: Most multimodal sentiment analysis models rely on cross-modal attention mechanisms to process multimodal features. Such approaches tend to overlook the effective information that is unique to each modality, and the redundant information shared across modalities interferes with the extraction of effective information, lowering classification accuracy. To address this, a multimodal sentiment analysis model based on cross-modal unidirectional weighting is proposed. The model uses a unidirectional weighting module to extract both the shared and the modality-specific information within each modality, and a similar structure to realize interaction across multimodal data. To avoid extracting large amounts of repeated effective information during this interaction, a KL-divergence loss is used for contrastive learning over information from the same modality. In addition, a temporal convolutional network with a filtering gate is introduced for unimodal feature extraction, enhancing the expressiveness of unimodal features. Compared with 13 baseline models on the two public datasets CMU-MOSI and CMU-MOSEI, the proposed method shows clear advantages in classification accuracy, F1 score, and other metrics, validating its effectiveness.
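The abstract describes two components concretely enough to sketch: a temporal convolutional network with a filtering gate for unimodal feature extraction, and a KL-divergence loss applied contrastively to representations of the same modality. The PyTorch sketch below illustrates one plausible form of these pieces; it is not the authors' implementation, and the module names, dimensions, pooling, and the tanh-sigmoid gating choice are assumptions made for illustration (the paper's unidirectional weighting module is not reproduced here).

# Minimal sketch (not the authors' code): a gated temporal convolution for unimodal
# features and a symmetric KL-divergence term between two views of the same modality.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedTemporalConv(nn.Module):
    """1D temporal convolution with a filtering gate (tanh * sigmoid gating)."""
    def __init__(self, in_dim: int, out_dim: int, kernel_size: int = 3):
        super().__init__()
        padding = (kernel_size - 1) // 2          # keep the sequence length unchanged
        self.feat = nn.Conv1d(in_dim, out_dim, kernel_size, padding=padding)
        self.gate = nn.Conv1d(in_dim, out_dim, kernel_size, padding=padding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_dim) -> (batch, seq_len, out_dim)
        x = x.transpose(1, 2)                     # Conv1d expects (batch, dim, seq_len)
        h = torch.tanh(self.feat(x)) * torch.sigmoid(self.gate(x))
        return h.transpose(1, 2)

def symmetric_kl_loss(p_feat: torch.Tensor, q_feat: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between two feature views of the same modality,
    treating the softmax over the feature dimension as a distribution."""
    p = F.log_softmax(p_feat, dim=-1)
    q = F.log_softmax(q_feat, dim=-1)
    return 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean")
                  + F.kl_div(q, p, log_target=True, reduction="batchmean"))

if __name__ == "__main__":
    text = torch.randn(8, 50, 768)                # e.g. BERT token features (batch, seq, dim)
    shared_branch = GatedTemporalConv(768, 128)   # hypothetical "shared information" branch
    specific_branch = GatedTemporalConv(768, 128) # hypothetical "specific information" branch
    h_shared = shared_branch(text).mean(dim=1)    # pooled to (batch, 128)
    h_specific = specific_branch(text).mean(dim=1)
    loss_kl = symmetric_kl_loss(h_shared, h_specific)
    print(h_shared.shape, float(loss_kl))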

Key words: Multimodal sentiment analysis, Transformer model, Unidirectional weighting, Attention mechanism, Kullback-Leibler divergence

CLC Number: TP391.1