Computer Science ›› 2019, Vol. 46 ›› Issue (11): 181-185. doi: 10.11896/jsjkx.181001941

• Artificial Intelligence •

Multi-modal Sentiment Analysis with Context-augmented LSTM

LIU Qi-yuan, ZHANG Dong, WU Liang-qing, LI Shou-shan

  1. (School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215006, China)
  • Received: 2018-10-18  Online: 2019-11-15  Published: 2019-11-14
  • Corresponding author: LI Shou-shan (born in 1980), male, professor; his main research interests include natural language processing and sentiment analysis. E-mail: lishoushan@suda.edu.cn
  • About the authors: LIU Qi-yuan (born in 1994), male, master's student, CCF member; his main research interests include natural language processing and sentiment analysis, E-mail: qyliu@stu.suda.edu.cn. ZHANG Dong (born in 1991), male, Ph.D. student; his main research interests include natural language processing and sentiment analysis. WU Liang-qing (born in 1995), male, master's student; his main research interests include natural language processing and sentiment analysis.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61331011, 61375073).

Abstract: In recent years, multi-modal sentiment analysis has become an increasingly popular research area, extending traditional text-based sentiment analysis to a multi-modal level that combines text, images and sound. Multi-modal sentiment analysis usually requires acquiring both the independent information within a single modality and the interactive information between different modalities. To exploit the context of the language expressions in each modality when acquiring these two kinds of information, a multi-modal sentiment analysis approach based on a context-augmented LSTM was proposed. Specifically, the expressions of each modality are first encoded with an LSTM in combination with their context features, so as to capture the independent information within that modality. Subsequently, this independent information from the modalities is fused, and another LSTM layer is used to obtain the interactive information between the different modalities, forming a multi-modal feature representation. Finally, a max-pooling strategy reduces the dimension of the multi-modal representation, which is then fed to the sentiment classifier. The proposed method achieves an ACC of 75.3% and an F1 score of 74.9 on the MOSI dataset. Compared with traditional machine learning methods such as SVM, its ACC is 8.1% higher and its F1 is 7.3 higher. Compared with the current state-of-the-art deep learning method, its ACC is 0.9% higher and its F1 is 1.3 higher, while it uses only about 1/20 of the trainable parameters and trains roughly 10 times faster. Extensive comparative experiments demonstrate that the proposed approach significantly outperforms competitive multi-modal sentiment classification baselines.
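To make the pipeline described above concrete, the following is a minimal sketch (in PyTorch-style Python) of one plausible realization: one LSTM per modality encodes the utterance sequence so that each utterance is represented together with its context, a second LSTM runs over the concatenated unimodal encodings to capture cross-modal interaction, and max-pooling over time feeds a linear sentiment classifier. All names, layer sizes, feature dimensions (300-d text, 74-d audio, 35-d visual) and the binary class count are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class ContextAugmentedLSTM(nn.Module):
    """Sketch of the described architecture; all hyper-parameters are assumptions."""
    def __init__(self, text_dim=300, audio_dim=74, visual_dim=35,
                 uni_hidden=32, fusion_hidden=64, num_classes=2):
        super().__init__()
        # One LSTM per modality runs over the utterance sequence of a video,
        # so each utterance is encoded together with its surrounding context.
        self.text_lstm = nn.LSTM(text_dim, uni_hidden, batch_first=True)
        self.audio_lstm = nn.LSTM(audio_dim, uni_hidden, batch_first=True)
        self.visual_lstm = nn.LSTM(visual_dim, uni_hidden, batch_first=True)
        # A second LSTM fuses the concatenated unimodal encodings and models
        # the interaction between the modalities.
        self.fusion_lstm = nn.LSTM(3 * uni_hidden, fusion_hidden, batch_first=True)
        self.classifier = nn.Linear(fusion_hidden, num_classes)

    def forward(self, text, audio, visual):
        # Each input: (batch, num_utterances, feature_dim).
        t, _ = self.text_lstm(text)
        a, _ = self.audio_lstm(audio)
        v, _ = self.visual_lstm(visual)
        fused, _ = self.fusion_lstm(torch.cat([t, a, v], dim=-1))
        # Max-pool over the utterance (time) dimension, then classify.
        pooled, _ = fused.max(dim=1)
        return self.classifier(pooled)

# Example: a batch of 4 videos, each with 20 utterances.
model = ContextAugmentedLSTM()
logits = model(torch.randn(4, 20, 300), torch.randn(4, 20, 74), torch.randn(4, 20, 35))
print(logits.shape)  # torch.Size([4, 2])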

Key words: Context enhancement, Multi-modal, Sentiment analysis

CLC Number:

  • TP391