Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250200022-9. doi: 10.11896/jsjkx.250200022

• Computer Graphics & Multimedia •


Multimodal Sentiment Analysis Based on Dominant Attention and Multi-space Domain Information Collaboration

FENG Guang1, LIN Yibao2, ZHONG Ting1, ZHENG Runting2, HUANG Junhui2, LIU Tianxiang2, YANG Yanru2   

  1 School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  2 School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2025-11-15  Published: 2025-11-10
  • Corresponding author: FENG Guang (von@gdut.edu.cn)
  • Supported by:
    Key Program of the National Natural Science Foundation of China (62237001) and Youth Project of Guangdong Provincial Philosophy and Social Sciences (GD23YJY08).


Abstract: Multimodal sentiment analysis has significant applications in smart education, such as assessing students’ engagement and emotional states through speech, facial expressions, and tone to help teachers adjust teaching strategies in real time. However, existing cross-modal attention mechanisms struggle to capture associations between heterogeneous modalities effectively, and the collaboration between shared and private spaces remains underexplored, limiting multimodal fusion learning. To address these issues, this paper proposes a multimodal sentiment analysis model that integrates heterogeneous modalities across multiple space domains using dominant attention. This mechanism enables effective fusion of heterogeneous modalities in both domains, enhancing cross-modal learning. Additionally, a gating mechanism preserves the modality independence of shared-space fusion vectors, ensuring complementary interactions between private and shared spaces. Experimental results on the MOSI and MOSEI datasets demonstrate that the proposed model achieves overall performance improvements, validating its ability to capture and integrate heterogeneous multimodal information effectively.

Key words: Multimodal sentiment analysis, Dominant attention, Multi-space domain, Gating mechanism, Smart education
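
To make the architecture described in the abstract concrete, the sketch below shows one plausible reading of the gated shared/private-space collaboration in PyTorch. It is a minimal illustration only: the class name GatedSpaceFusion, all dimensions, and the use of standard multi-head cross-attention in place of the paper’s dominant attention are assumptions for exposition, not the authors’ published implementation.

```python
# Hypothetical sketch of shared/private-space fusion with a gate.
# Module names, dimensions, and the substitution of standard
# multi-head cross-attention for the paper's dominant attention
# are assumptions; the paper does not publish this code.
import torch
import torch.nn as nn

class GatedSpaceFusion(nn.Module):
    """Gated collaboration of shared- and private-space features
    for one (dominant, auxiliary) modality pair."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        # Projections into the shared and private space domains.
        self.to_shared = nn.Linear(dim, dim)
        self.to_private = nn.Linear(dim, dim)
        # Stand-in for dominant attention: the dominant modality
        # supplies the queries over the other modality's features.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Gate controlling how much private (modality-specific)
        # information supplements the shared-space fusion vector.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, dominant: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # dominant, other: (batch, seq_len, dim) unimodal features.
        q = self.to_shared(dominant)
        kv = self.to_shared(other)
        fused, _ = self.cross_attn(q, kv, kv)   # shared-space fusion
        private = self.to_private(dominant)     # private-space features
        g = self.gate(torch.cat([fused, private], dim=-1))
        return g * fused + (1.0 - g) * private  # gated collaboration

# Dummy usage on text (dominant) and audio features.
text = torch.randn(2, 20, 128)
audio = torch.randn(2, 20, 128)
print(GatedSpaceFusion()(text, audio).shape)  # torch.Size([2, 20, 128])
```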

CLC Number: TP391.1