Computer Science ›› 2019, Vol. 46 ›› Issue (11): 181-185. doi: 10.11896/jsjkx.181001941
LIU Qi-yuan, ZHANG Dong, WU Liang-qing, LI Shou-shan
Abstract: In recent years, multimodal sentiment analysis has become an increasingly popular research area, extending traditional text-based sentiment analysis to a multimodal setting that combines text, images, and audio. Multimodal sentiment analysis typically needs to capture both the information inside each single modality and the interactions across modalities. To exploit the context of the utterances in each modality when acquiring both kinds of information, this paper proposes a multimodal sentiment analysis method based on a context-enhanced LSTM. Specifically, for each modality, every utterance is first encoded by an LSTM together with its contextual features, capturing the intra-modality information; the independent unimodal representations are then fused, and a second LSTM extracts the inter-modality interactions to form a multimodal feature representation; finally, a max-pooling strategy reduces the dimensionality of the multimodal representation, on top of which the sentiment classifier is built. On the MOSI dataset, the method reaches an accuracy (ACC) of 75.3% and an F1 score of 74.9. Compared with traditional machine learning methods such as SVM, the proposed method improves ACC by 8.1% and F1 by 7.3. Compared with current state-of-the-art deep learning methods, it improves ACC by 0.9% and F1 by 1.3, while using only 1/20 of the trainable parameters and training about 10 times faster. Extensive comparative experiments show that the proposed method significantly outperforms traditional multimodal sentiment classification methods.
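The three-stage pipeline described above (per-modality LSTM encoding, concatenation followed by a second LSTM for inter-modality interaction, then max pooling over time) can be sketched in pure Python. This is a minimal illustration, not the authors' implementation: the feature dimensions, hidden sizes, and toy inputs are all placeholders, and a real system would use GloVe/COVAREP-style features and a trained network rather than random weights.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Minimal LSTM cell (Hochreiter & Schmidhuber, 1997) in pure Python."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        # one weight matrix and bias per gate: input, forget, cell, output
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifco"}
        self.b = {g: [0.0] * hidden_size for g in "ifco"}
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = x + h  # concatenated [x_t; h_{t-1}]
        def lin(g):
            return [sum(w * v for w, v in zip(row, z)) + bg
                    for row, bg in zip(self.W[g], self.b[g])]
        i = [sigmoid(v) for v in lin("i")]
        f = [sigmoid(v) for v in lin("f")]
        o = [sigmoid(v) for v in lin("o")]
        g = [math.tanh(v) for v in lin("c")]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

def encode(cell, seq):
    """Run the LSTM over an utterance sequence; return all hidden states."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in seq:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs

def max_pool(states):
    """Element-wise max over time: variable-length sequence -> fixed vector."""
    return [max(vals) for vals in zip(*states)]

# toy video of 4 utterances with per-modality features (dims are illustrative)
text  = [[0.1 * t] * 3 for t in range(4)]   # 3-dim textual features
audio = [[0.2 * t] * 2 for t in range(4)]   # 2-dim acoustic features
video = [[0.3 * t] * 2 for t in range(4)]   # 2-dim visual features

# stage 1: intra-modality encoding, one LSTM per modality, so each
# utterance state carries the context of the preceding utterances
enc = {name: encode(LSTMCell(len(seq[0]), 4, seed=k), seq)
       for k, (name, seq) in enumerate([("t", text), ("a", audio), ("v", video)])}

# stage 2: fuse the unimodal states per utterance, then a second LSTM
# captures the inter-modality interaction information
fused_in = [enc["t"][t] + enc["a"][t] + enc["v"][t] for t in range(4)]
inter = encode(LSTMCell(12, 6, seed=7), fused_in)

# stage 3: max pooling reduces the sequence to one multimodal vector,
# which would be fed to the sentiment classifier
rep = max_pool(inter)
print(len(rep))
```

Because pooling is over time, the final representation has a fixed size regardless of how many utterances the video contains, which is what lets a single classifier sit on top.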