Computer Science, 2025, Vol. 52, Issue (11A): 250200022-9. DOI: 10.11896/jsjkx.250200022
FENG Guang1, LIN Yibao2, ZHONG Ting1, ZHENG Runting2, HUANG Junhui2, LIU Tianxiang2, YANG Yanru2
Abstract: Multimodal sentiment analysis has significant application value in smart education; for example, by analyzing students' multimodal cues such as speech, facial expressions, and tone of voice, it can assess classroom engagement and emotional state, helping teachers adjust their teaching strategies in real time. However, in existing multimodal sentiment analysis, cross-modal attention mechanisms do not adequately capture the associations between heterogeneous modalities, and the synergy between shared-space and private-space information has not been explored in depth, resulting in limited cross-modal fusion learning and insufficient coordination of information across multiple space domains. To address these problems, this paper proposes a multimodal sentiment analysis model that fuses heterogeneous modalities across multiple space domains based on agent attention. The model applies an agent attention mechanism to fully fuse the heterogeneous modalities within each of the two space domains, resolving the limitation on cross-modal fusion learning. A gating mechanism then supplements the shared-space heterogeneous-modality fusion vector with modality-independent information, achieving synergy between private-space and shared-space information and effectively addressing the insufficient coordination across space domains. Experimental results show that the model achieves overall score improvements on the public MOSI and MOSEI datasets, indicating that the method can fully capture the latent relationships among heterogeneous multimodal information and effectively coordinate the heterogeneous fused information from different space domains.
CLC Number:
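The abstract describes two mechanisms: agent attention that fuses heterogeneous modalities within each space domain, and a gate that supplements the shared-space fusion vector with modality-specific information from the private space. Below is a minimal PyTorch sketch of these two ideas; all class names, dimensions, and the exact gating form are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch, assuming an agent-attention style cross-modal fusion
# and a gated residual injection of private-space features. Names and
# shapes are hypothetical, chosen only to illustrate the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AgentCrossAttention(nn.Module):
    """Cross-modal attention routed through a small set of agent tokens."""
    def __init__(self, dim: int, n_agents: int = 8):
        super().__init__()
        self.agents = nn.Parameter(torch.randn(n_agents, dim) / dim ** 0.5)
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x_q: torch.Tensor, x_kv: torch.Tensor) -> torch.Tensor:
        # x_q: (B, Lq, D) queries from one modality (e.g. text);
        # x_kv: (B, Lk, D) keys/values from another (e.g. audio or vision).
        q, k, v = self.q(x_q), self.k(x_kv), self.v(x_kv)
        a = self.agents.unsqueeze(0).expand(x_q.size(0), -1, -1)  # (B, A, D)
        # Step 1: agent tokens aggregate the key/value modality (softmax over Lk).
        agent_v = F.softmax(a @ k.transpose(1, 2) * self.scale, dim=-1) @ v
        # Step 2: queries read the aggregated content back from the agents.
        out = F.softmax(q @ a.transpose(1, 2) * self.scale, dim=-1) @ agent_v
        return out  # (B, Lq, D)

class GatedSpaceFusion(nn.Module):
    """Gate that supplements the shared-space fusion vector with
    modality-specific information from the private space."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, shared_fused: torch.Tensor, private: torch.Tensor) -> torch.Tensor:
        # Sigmoid gate decides, per dimension, how much private-space
        # (modality-independent) signal to inject into the shared fusion.
        g = torch.sigmoid(self.gate(torch.cat([shared_fused, private], dim=-1)))
        return shared_fused + g * private
```

As a usage sketch, one would run AgentCrossAttention between each pair of modalities inside both the shared and private space domains, pool the results into per-domain fusion vectors, and pass them through GatedSpaceFusion before the final sentiment regression head.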