Multimodal Sentiment Analysis Based on Attention Neural Network

doi:10.11896/jsjkx.191100041

Abstract

Abstract: In recent years, a growing number of people have taken to expressing their feelings and opinions on social media through images and text together, so multimodal data consisting mainly of images and text keeps growing. Compared with unimodal data, multimodal data carries richer information and better reveals users' true sentiment. Analyzing the sentiment of this large volume of multimodal data helps in understanding people's attitudes and opinions and has a wide range of applications. To address information redundancy in multimodal sentiment classification, this paper builds on the tensor fusion scheme and proposes a multimodal sentiment analysis method based on attention neural networks. The method constructs a text feature extraction model and an image feature extraction model, both based on attention neural networks, which highlight the image regions and the words that carry sentiment information so that each unimodal feature representation is more concise and precise. The tensor product of the modality features serves as the joint feature representation of the multimodal data, principal component analysis removes redundant information from the joint features, and a support vector machine then predicts the sentiment category. The model is evaluated on two real-world Twitter image-text datasets, and the experimental results show that, compared with other sentiment classification models, the method achieves considerable improvements in precision, recall, F1 score, and accuracy.

Key words: Attention mechanism, Multimodal data, Sentiment analysis, Social media, Tensor fusion
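
The fusion stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random arrays stand in for learned word and image-region features, the scoring vectors `q_text` and `q_img` and all dimensions are hypothetical, the attention is a simple softmax-weighted sum rather than the paper's attention networks, and PCA is computed directly with an SVD in NumPy.

```python
import numpy as np

def attention_pool(features, query):
    """Softmax-weighted sum of feature rows; `query` is a hypothetical scoring vector."""
    scores = features @ query                  # one relevance score per word / region
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ features, weights

def tensor_fusion(text_vec, image_vec):
    """Flattened outer (tensor) product of two unimodal feature vectors."""
    return np.outer(text_vec, image_vec).ravel()

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T

# Assumed toy sizes; real features would come from trained text and image models.
rng = np.random.default_rng(0)
n_samples, n_words, n_regions, d_text, d_img = 20, 6, 9, 8, 5
q_text, q_img = rng.normal(size=d_text), rng.normal(size=d_img)

joint = []
for _ in range(n_samples):
    words = rng.normal(size=(n_words, d_text))      # stand-in word features
    regions = rng.normal(size=(n_regions, d_img))   # stand-in image-region features
    t, _ = attention_pool(words, q_text)            # attended text vector
    v, _ = attention_pool(regions, q_img)           # attended image vector
    joint.append(tensor_fusion(t, v))               # joint feature of size d_text * d_img

joint = np.stack(joint)             # shape (20, 40)
reduced = pca_reduce(joint, 4)      # shape (20, 4)
print(joint.shape, reduced.shape)
```

The joint feature grows as the product of the unimodal dimensions (8 x 5 = 40 here), which is why a dimensionality-reduction step is useful before classification. In a full pipeline the rows of `reduced`, paired with sentiment labels, would then be passed to a support vector machine classifier, for example scikit-learn's `sklearn.svm.SVC`.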

CLC number:

TP391

LIN Min-hong, MENG Zu-qiang. Multimodal Sentiment Analysis Based on Attention Neural Network[J]. Computer Science, 2020, 47(11A): 508-514. https://doi.org/10.11896/jsjkx.191100041

