Computer Science ›› 2025, Vol. 52 ›› Issue (7): 226-232. DOI: 10.11896/jsjkx.240600066
王有康, 程春玲
WANG Youkang, CHENG Chunling
Abstract: Most multimodal sentiment analysis models rely on cross-modal attention mechanisms to process multimodal feature information. Such methods tend to overlook the effective information unique to each modality, and the redundant information shared across modalities interferes with the extraction of effective information, degrading classification accuracy. To address this, a multimodal sentiment analysis model based on cross-modal unidirectional weighting is proposed. The model uses a unidirectional weighting module to extract both the shared and the modality-specific information within each modality, while also enabling interaction among the multimodal data. To avoid extracting large amounts of duplicated effective information during this interaction, a KL-divergence loss is applied for contrastive learning over same-modality information. In addition, a temporal convolutional network with a filtering gate is proposed for unimodal feature extraction, strengthening the expressive power of unimodal features. Compared with 13 baseline models on the two public datasets CMU-MOSI and CMU-MOSEI, the proposed method shows clear advantages in classification accuracy, F1 score, and other metrics, verifying its effectiveness.
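The abstract describes two mechanisms in prose: a temporal convolutional network equipped with a filtering gate for unimodal feature extraction, and a KL-divergence loss used for contrastive learning over same-modality information. A minimal PyTorch sketch of how such components could be realized is given below. It is illustrative only: the layer sizes, the WaveNet-style sigmoid/tanh gate, and the softmax-based form of the KL loss are assumptions, since the abstract does not specify the paper's exact formulations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedTemporalConvBlock(nn.Module):
    """Causal temporal convolution with a filtering gate (hypothetical sketch).

    The "filtering gate" is modeled here as a WaveNet-style sigmoid gate that
    scales a tanh-activated convolution; the paper's actual gate formulation
    is not given in the abstract.
    """

    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = (kernel_size - 1) * dilation  # padding for a causal convolution
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); trim the right side so no future
        # time step leaks into the output (causal convolution).
        t = x.size(-1)
        h = torch.tanh(self.conv(x)[..., :t])
        g = torch.sigmoid(self.gate(x)[..., :t])  # filtering gate in [0, 1]
        return x + g * h  # gated residual output, same shape as x


def kl_redundancy_loss(p_feat: torch.Tensor, q_feat: torch.Tensor) -> torch.Tensor:
    """KL divergence between two representations of the same modality.

    Hypothetical form: a softmax over the feature dimension turns each
    representation into a distribution, and the KL divergence between the
    two can then serve as a contrastive signal that penalizes duplicated
    information across branches.
    """
    log_p = F.log_softmax(p_feat, dim=-1)  # input must be log-probabilities
    q = F.softmax(q_feat, dim=-1)          # target must be probabilities
    return F.kl_div(log_p, q, reduction="batchmean")


if __name__ == "__main__":
    x = torch.randn(8, 64, 50)            # (batch, feature dim, time steps)
    block = GatedTemporalConvBlock(64)
    y = block(x)                          # (8, 64, 50)
    loss = kl_redundancy_loss(y.mean(-1), x.mean(-1))
    print(y.shape, loss.item())
```

In a full training objective, one such KL term per modality would presumably be summed with the task loss, consistent with the abstract's description of discouraging duplicated effective information during cross-modal interaction.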