Multimodal Sentiment Analysis Model Based on Cross-modal Unidirectional Weighting

doi:10.11896/jsjkx.240600066

Abstract

Most multimodal sentiment analysis models use cross-modal attention mechanisms to process multimodal features. These approaches tend to overlook the unique, useful information within each modality, and they also suffer from interference caused by redundant information shared across modalities, which lowers classification accuracy. To address this issue, this paper proposes a multimodal sentiment analysis model based on cross-modal unidirectional weighting. The model uses a unidirectional weighting module to extract both shared and unique information from the different modalities, and uses a similar structure for interaction between multimodal data. To prevent excessive extraction of repetitive information, it employs a KL divergence loss function for contrastive learning on information from the same modality. In addition, it introduces a gated temporal convolutional network with a filtering function to extract features from unimodal data, thereby enhancing the expressive power of the unimodal features. Evaluation on two public datasets, CMU-MOSI and CMU-MOSEI, against 13 baseline models shows significant advantages in classification accuracy, F1 score, and other metrics, validating the effectiveness of the proposed method.
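
The abstract names a KL divergence loss but does not give its formulation. As a rough illustration only, the sketch below computes the standard Kullback-Leibler divergence between two discrete distributions obtained by softmax-normalizing two score vectors. The function names, the score values, and the idea of comparing a "shared" and a "unique" view are assumptions made for this example, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """Standard definition: KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # eps guards against log(0); terms with p_i == 0 contribute nothing.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical score vectors standing in for two views of one modality.
shared_view = softmax([2.0, 0.5, -1.0])
unique_view = softmax([0.1, 1.5, 0.3])

# Zero for identical distributions, positive when they differ.
same = kl_divergence(shared_view, shared_view)
different = kl_divergence(shared_view, unique_view)
```

Whether the divergence is minimized or maximized, and over which representations, depends on the paper's loss design, which the abstract does not specify.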

Key words: Multimodal sentiment analysis, Transformer model, Unidirectional weighting, Attention mechanism, Kullback-Leibler divergence

CLC Number:

TP391.1

WANG Youkang, CHENG Chunling. Multimodal Sentiment Analysis Model Based on Cross-modal Unidirectional Weighting[J]. Computer Science, 2025, 52(7): 226-232.
