Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250200022-9. DOI: 10.11896/jsjkx.250200022

• Image Processing & Multimedia Technology •

Multimodal Sentiment Analysis Based on Dominant Attention and Multi-space Domain Information Collaboration

FENG Guang1, LIN Yibao2, ZHONG Ting1, ZHENG Runting2, HUANG Junhui2, LIU Tianxiang2, YANG Yanru2   

  1. School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  2. School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2025-11-15  Published: 2025-11-10
  • Supported by:
    Key Program of the National Natural Science Foundation of China (62237001) and the Youth Project of Guangdong Provincial Philosophy and Social Sciences (GD23YJY08).

Abstract: Multimodal sentiment analysis has significant applications in smart education, such as assessing students’ engagement and emotional states through speech, facial expressions, and tone to help teachers adjust teaching strategies in real time. However, existing cross-modal attention mechanisms struggle to capture associations between heterogeneous modalities effectively, and the collaboration between shared and private spaces remains underexplored, limiting multimodal fusion learning. To address these issues, this paper proposes a multimodal sentiment analysis model that uses dominant attention to integrate heterogeneous modalities across multiple space domains. The dominant attention mechanism enables effective fusion of heterogeneous modalities in both the shared and private space domains, enhancing cross-modal learning. Additionally, a gating mechanism preserves the modality independence of the shared-space fusion vectors, ensuring complementary interaction between the private and shared spaces. Experimental results on the MOSI and MOSEI datasets demonstrate that the proposed model achieves overall performance improvements, validating its ability to capture and integrate heterogeneous multimodal information effectively.
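The abstract does not give the paper's exact formulation, but the private/shared collaboration it describes can be illustrated with a minimal PyTorch sketch: a private-space representation of one modality attends to shared-space fusion vectors via cross-modal attention, and a learned sigmoid gate controls how much of the attended shared information is admitted, so the private stream's modality independence is preserved. All names here (GatedCrossModalFusion, d_model, private_x, shared_x) are illustrative assumptions, not identifiers from the paper.

# Minimal sketch (not the authors' code) of gated cross-modal fusion
# between a private-space stream and shared-space fusion vectors.
import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        # Cross-modal attention: private features query the shared space.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate conditioned on both the private and the attended shared features.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, private_x, shared_x):
        # private_x, shared_x: (batch, seq_len, d_model)
        attended, _ = self.cross_attn(query=private_x, key=shared_x, value=shared_x)
        # Gate in [0, 1] decides how much shared information is admitted.
        g = self.gate(torch.cat([private_x, attended], dim=-1))
        # Gated residual: the private stream passes through unchanged when g is near 0.
        return self.norm(private_x + g * attended)

if __name__ == "__main__":
    fusion = GatedCrossModalFusion(d_model=64)
    text_private = torch.randn(2, 20, 64)  # e.g., text modality, private space
    shared = torch.randn(2, 20, 64)        # shared-space fusion vectors
    print(fusion(text_private, shared).shape)  # torch.Size([2, 20, 64])

The gated residual form is one simple way to realize the complementary interaction the abstract describes: when the gate saturates near zero the private representation is kept intact, and shared-space information is only mixed in where the gate opens.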

Key words: Multimodal sentiment analysis, Dominant attention, Multi-space domain, Gating mechanism, Smart education

CLC Number: 

  • TP391.1