Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250200022-9. DOI: 10.11896/jsjkx.250200022

• Image Processing & Multimedia Technology •

Multimodal Sentiment Analysis Based on Dominant Attention and Multi-space Domain Information Collaboration

FENG Guang1, LIN Yibao2, ZHONG Ting1, ZHENG Runting2, HUANG Junhui2, LIU Tianxiang2, YANG Yanru2   

  1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  2. School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2025-11-15  Published: 2025-11-10
  • Supported by:
    Key Program of the National Natural Science Foundation of China (62237001) and the Youth Project of Guangdong Provincial Philosophy and Social Sciences (GD23YJY08).

Abstract: Multimodal sentiment analysis has significant applications in smart education, such as assessing students’ engagement and emotional states through speech, facial expressions, and tone to help teachers adjust teaching strategies in real time. However, existing cross-modal attention mechanisms struggle to capture associations between heterogeneous modalities effectively, and the collaboration between shared and private spaces remains underexplored, limiting multimodal fusion learning. To address these issues, this paper proposes a multimodal sentiment analysis model that uses dominant attention to integrate heterogeneous modalities across multiple space domains. The dominant attention mechanism enables effective fusion of heterogeneous modalities in both the shared and private space domains, enhancing cross-modal learning. Additionally, a gating mechanism preserves the modality independence of the shared-space fusion vectors, ensuring complementary interaction between the private and shared spaces. Experimental results on the MOSI and MOSEI datasets demonstrate that the proposed model achieves overall performance improvements, validating its ability to capture and integrate heterogeneous multimodal information effectively.
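The abstract does not give the paper's exact formulation, but the private/shared collaboration it describes can be illustrated with a minimal PyTorch sketch: a private-space representation of one modality attends to shared-space fusion vectors via cross-modal attention, and a learned sigmoid gate controls how much of the attended shared information is admitted, so the private stream's modality independence is preserved. All names here (GatedCrossModalFusion, d_model, private_x, shared_x) are illustrative assumptions, not identifiers from the paper.

# Minimal sketch (not the authors' code) of gated cross-modal fusion
# between a private-space stream and shared-space fusion vectors.
import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        # Cross-modal attention: private features query the shared space.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate conditioned on both the private and the attended shared features.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, private_x, shared_x):
        # private_x, shared_x: (batch, seq_len, d_model)
        attended, _ = self.cross_attn(query=private_x, key=shared_x, value=shared_x)
        # Gate in [0, 1] decides how much shared information is admitted.
        g = self.gate(torch.cat([private_x, attended], dim=-1))
        # Gated residual: the private stream passes through unchanged when g is near 0.
        return self.norm(private_x + g * attended)

if __name__ == "__main__":
    fusion = GatedCrossModalFusion(d_model=64)
    text_private = torch.randn(2, 20, 64)  # e.g., text modality, private space
    shared = torch.randn(2, 20, 64)        # shared-space fusion vectors
    print(fusion(text_private, shared).shape)  # torch.Size([2, 20, 64])

The gated residual form is one simple way to realize the complementary interaction the abstract describes: when the gate saturates near zero the private representation is kept intact, and shared-space information is only mixed in where the gate opens.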

Key words: Multimodal sentiment analysis, Dominant attention, Multi-space domain, Gating mechanism, Smart education

CLC Number: 

  • TP391.1