Computer Science ›› 2022, Vol. 49 ›› Issue (1): 219-224. doi: 10.11896/jsjkx.201000074

• Computer Graphics & Multimedia •

Image-Text Sentiment Analysis Model Based on Visual Aspect Attention

YUAN Jing-ling, DING Yuan-yuan, SHENG De-ming, LI Lin

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
  • Received: 2020-10-14  Revised: 2021-04-19  Online: 2022-01-15  Published: 2022-01-18
  • Corresponding author: YUAN Jing-ling (yuanjingling@126.com)
  • About author: YUAN Jing-ling, born in 1975, Ph.D, is a member of China Computer Federation. Her main research interests include machine learning, intelligent analysis and green computing.
  • Supported by: National Social Science Foundation of China (15BGL048), National Key Research and Development Program of China (2017YFB0802303) and National Natural Science Foundation of China (62076127, 61571226).

Abstract: Social networks have become an integral part of daily life, and sentiment analysis of social media information helps to understand people's views, attitudes and emotions on social networking sites. Traditional sentiment analysis relies mainly on text content. With the rise of smartphones, information on the network has become increasingly diverse, including not only text but also images. It is found that, in many cases, images support and reinforce the text rather than express sentiment independently of it. This paper proposes a novel image-text sentiment analysis model, LSTM-VistaNet. Specifically, the model does not take the image as a direct input; instead, it uses the VGG16 network to extract image features and then generates visual aspect attention, which assigns higher weights to the core sentences of a document and yields a document representation based on visual aspect attention. In addition, an LSTM model is used to extract text sentiment, giving a document representation based on text alone. Finally, the two groups of classification results are fused to obtain the final classification label. On the Yelp restaurant review dataset, the proposed model achieves an accuracy of 62.08%, which is 18.92% higher than BiGRU-mVGG, a comparatively accurate baseline, verifying the effectiveness of using visual information as aspect attention to assist text in sentiment classification; it is also 0.32% higher than VistaNet, confirming that the LSTM model effectively compensates for the fact that the images in VistaNet cannot fully cover the text.

Key words: LSTM, Multimodal, Sentiment analysis, Social images, Visual aspect attention

CLC number: TP391.1
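
The abstract above outlines a two-branch architecture: VGG16 image features drive a visual aspect attention over sentence vectors to produce one document representation, an LSTM over the text alone produces another, and the two branches' classification results are fused. The following is a minimal PyTorch sketch of that kind of design, included only to make the data flow concrete; the layer sizes, the additive attention form, the frozen VGG16 backbone, the single image per document, and the probability-averaging fusion are illustrative assumptions, not the authors' published implementation.

# A minimal, hypothetical PyTorch sketch of the two-branch design described in the
# abstract: VGG16 image features act as an "aspect" query that attends over sentence
# vectors (branch 1), an LSTM over sentence vectors encodes the text alone (branch 2),
# and the two branches' class predictions are fused. One image per document, five
# output classes (Yelp stars), layer sizes and the averaging fusion are all assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class VisualAspectAttention(nn.Module):
    """Weight sentence vectors with an additive attention whose query is an image feature."""

    def __init__(self, sent_dim, img_dim, attn_dim=128):
        super().__init__()
        self.proj_sent = nn.Linear(sent_dim, attn_dim)
        self.proj_img = nn.Linear(img_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, sent_vecs, img_vec):
        # sent_vecs: (batch, n_sent, sent_dim); img_vec: (batch, img_dim)
        q = self.proj_img(img_vec).unsqueeze(1)                 # (batch, 1, attn_dim)
        k = self.proj_sent(sent_vecs)                           # (batch, n_sent, attn_dim)
        alpha = F.softmax(self.score(torch.tanh(k + q)).squeeze(-1), dim=-1)
        doc_vec = torch.bmm(alpha.unsqueeze(1), sent_vecs).squeeze(1)
        return doc_vec, alpha                                   # visually weighted document vector


class TwoBranchSentimentModel(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hid_dim=128, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.sent_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        # Frozen VGG16 backbone; take the 4096-d activation of the penultimate fc layer.
        vgg = models.vgg16(weights=None)                        # use pretrained weights in practice
        self.vgg_features = nn.Sequential(vgg.features, vgg.avgpool, nn.Flatten(),
                                          *list(vgg.classifier[:5]))
        for p in self.vgg_features.parameters():
            p.requires_grad = False
        self.visual_attn = VisualAspectAttention(sent_dim=2 * hid_dim, img_dim=4096)
        self.doc_lstm = nn.LSTM(2 * hid_dim, hid_dim, batch_first=True)  # text-only branch
        self.cls_visual = nn.Linear(2 * hid_dim, n_classes)
        self.cls_text = nn.Linear(hid_dim, n_classes)

    def encode_sentences(self, docs):
        # docs: (batch, n_sent, n_words) word ids -> (batch, n_sent, 2*hid_dim)
        b, s, w = docs.shape
        _, (h, _) = self.sent_lstm(self.embed(docs.view(b * s, w)))
        return torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)

    def forward(self, docs, images):
        sent_vecs = self.encode_sentences(docs)
        img_vec = self.vgg_features(images)                     # (batch, 4096)
        vis_doc, _ = self.visual_attn(sent_vecs, img_vec)       # image-guided representation
        txt_doc = self.doc_lstm(sent_vecs)[1][0][-1]            # last hidden state, text only
        # Late fusion of the two branches' predictions (simple average here).
        return (F.softmax(self.cls_visual(vis_doc), dim=-1) +
                F.softmax(self.cls_text(txt_doc), dim=-1)) / 2


if __name__ == "__main__":
    model = TwoBranchSentimentModel()
    docs = torch.randint(1, 10000, (2, 6, 20))                  # 2 docs, 6 sentences, 20 words each
    images = torch.randn(2, 3, 224, 224)                        # one image per document
    print(model(docs, images).shape)                            # torch.Size([2, 5])

In this sketch the branch outputs are fused by averaging softmax probabilities and the whole model can be trained end to end with cross-entropy on the fused output; the abstract only states that the two groups of classification results are fused, so the exact fusion rule and training setup are assumptions.
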
References
[1]LI X,XIE H,CHEN L,et al.News impact on stock price return via sentiment analysis[J].Knowledge-Based Systems,2014,69:14-23.
[2]KAGAN V,STEVENS A,SUBRAHMANIAN V S.Using twitter sentiment to forecast the 2013 Pakistani election and the 2014 Indian election[J].IEEE Intelligent Systems,2015,30(1):2-5.
[3]YADAV S,EKBAL A,SAHA S,et al.Medical sentiment analysis using social media:towards building a patient assisted system[C]//Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018).2018:2790-2797.
[4]TRUONG T Q,LAUW H W.VistaNet:Visual Aspect Attention Network for Multimodal Sentiment Analysis[C]//Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-2019).2019:305-312.
[5]YU D,FU J,MEI T,et al.Multi-level attention networks for visual question answering[C]//Proceedings of the Computer Vision and Pattern Recognition (CVPR).2017:4187-4195.
[6]PORIA S,CHATURVEDI I,CAMBRIA E,et al.Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis[C]//Proceedings of 2016 IEEE 16th International Conference on Data Mining (ICDM).2017:439-448.
[7]XU N,MAO W,CHEN G.Multi-Interactive Memory Network for Aspect Based Multimodal Sentiment Analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019,33(1):371-378.
[8]CHAN S W K,CHONG M W C.Sentiment analysis in financial texts[J].Decision Support Systems,2017,94:53-64.
[9]SEVERYN A,MOSCHITTI A.Twitter sentiment analysis with deep convolutional neural networks[C]//Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval.2015:959-962.
[10]LAI S,XU L,LIU K,et al.Recurrent convolutional neural networks for text classification[C]//Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence.2015:2267-2273.
[11]NGUYEN T,KAVURI S,LEE M.A multimodal convolutional neuro-fuzzy network for emotion understanding of movie clips[J].Neural Networks,2019,118:208-219.
[12]BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[J].arXiv:1409.0473,2014.
[13]YANG Z,YANG D,DYER C,et al.Hierarchical attention networks for document classification[C]//Proceedings of North American Chapter of the Association for Computational Linguistics (HLT-NAACL).2016:1480-1489.
[14]TANG D,QIN B,LIU T.Aspect level sentiment classification with deep memory network[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).2016:214-224.
[15]TRUONG Q,LAUW H W.Visual sentiment analysis for review images with item-oriented and user-oriented CNN[C]//Proceedings of the 25th ACM International Conference on Multimedia.2017:1274-1282.
[16]LI M,GAN T,LIU M,et al.Long-tail Hashtag Recommendation for Micro-videos with Graph Convolutional Network[C]//Proceedings of the 28th ACM International Conference on Information and Knowledge Management.2019:509-518.
[17]SIERSDORFER S,MINACK E,DENG F,et al.Analyzing and predicting sentiment of images on the social web[C]//Proceedings of the 18th ACM International Conference on Multimedia.2010:715-718.
[18]YOU Q,LUO J,JIN H,et al.Robust image sentiment analysis using progressively trained and domain transferred deep networks[C]//Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence.2015:381-388.
[19]BORTH D,JI R,CHEN T,et al.Large-scale visual sentiment ontology and detectors using adjective noun pairs[C]//Proceedings of the 21st ACM International Conference on Multimedia.2013:223-232.
[20]YOU Q,JIN H,LUO J.Visual sentiment analysis by attending on local image regions[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence.2017:231-237.
[21]XU N,MAO W.MultiSentiNet:A Deep Semantic Network for Multimodal Sentiment Analysis[C]//Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.2017:2399-2402.
[22]PENNINGTON J,SOCHER R,MANNING C D.Glove:Global vectors for word representation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing.2014:1532-1543.
[23]CHO K,VAN MERRIENBOER B,GULCEHRE C,et al.Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).2014:1724-1734.
[24]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[C]//Proceedings of the International Conference on Learning Representations.2015.
[25]YUE W,WAEL A,PREMKUMAR N.Multi-Modality Image Manipulation Detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).2019:9543-9552.
[26]TANG D,QIN B,LIU T.Document modeling with gated recurrent neural network for sentiment classification[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).2015:1422-1432.
Related articles:
[1] 聂秀山, 潘嘉男, 谭智方, 刘新放, 郭杰, 尹义龙. Overview of Natural Language Video Localization. Computer Science, 2022, 49(9): 111-122. https://doi.org/10.11896/jsjkx.220500130
[2] 周旭, 钱胜胜, 李章明, 方全, 徐常胜. Dual Variational Multi-modal Attention Network for Incomplete Social Event Classification. Computer Science, 2022, 49(9): 132-138. https://doi.org/10.11896/jsjkx.220600022
[3] 张源, 康乐, 宫朝辉, 张志鸿. Related Transaction Behavior Detection in Futures Market Based on Bi-LSTM. Computer Science, 2022, 49(7): 31-39. https://doi.org/10.11896/jsjkx.210400304
[4] 常炳国, 石华龙, 常雨馨. Multi Model Algorithm for Intelligent Diagnosis of Melanoma Based on Deep Learning. Computer Science, 2022, 49(6A): 22-26. https://doi.org/10.11896/jsjkx.210500197
[5] 于家畦, 康晓东, 白程程, 刘汉卿. New Text Retrieval Model of Chinese Electronic Medical Records. Computer Science, 2022, 49(6A): 32-38. https://doi.org/10.11896/jsjkx.210400198
[6] 林夕, 陈孜卓, 王中卿. Aspect-level Sentiment Classification Based on Imbalanced Data and Ensemble Learning. Computer Science, 2022, 49(6A): 144-149. https://doi.org/10.11896/jsjkx.210500205
[7] 王杉, 徐楚怡, 师春香, 张瑛. Study on Cloud Classification Method of Satellite Cloud Images Based on CNN-LSTM. Computer Science, 2022, 49(6A): 675-679. https://doi.org/10.11896/jsjkx.210300177
[8] 李浩东, 胡洁, 范勤勤. Multimodal Multi-objective Optimization Based on Parallel Zoning Search and Its Application. Computer Science, 2022, 49(5): 212-220. https://doi.org/10.11896/jsjkx.210300019
[9] 赵亮, 张洁, 陈志奎. Adaptive Multimodal Robust Feature Learning Based on Dual Graph-regularization. Computer Science, 2022, 49(4): 124-133. https://doi.org/10.11896/jsjkx.210300078
[10] 丁锋, 孙晓. Negative-emotion Opinion Target Extraction Based on Attention and BiLSTM-CRF. Computer Science, 2022, 49(2): 223-230. https://doi.org/10.11896/jsjkx.210100046
[11] 刘创, 熊德意. Survey of Multilingual Question Answering. Computer Science, 2022, 49(1): 65-72. https://doi.org/10.11896/jsjkx.210900003
[12] 陈志毅, 隋杰. DeepFM and Convolutional Neural Networks Ensembles for Multimodal Rumor Detection. Computer Science, 2022, 49(1): 101-107. https://doi.org/10.11896/jsjkx.201200007
[13] 胡艳丽, 童谭骞, 张啸宇, 彭娟. Self-attention-based BGRU and CNN for Sentiment Analysis. Computer Science, 2022, 49(1): 252-258. https://doi.org/10.11896/jsjkx.210600063
[14] 周新民, 胡宜桂, 刘文洁, 孙荣俊. Research on Urban Function Recognition Based on Multi-modal and Multi-level Data Fusion Method. Computer Science, 2021, 48(9): 50-58. https://doi.org/10.11896/jsjkx.210500220
[15] 戴宏亮, 钟国金, 游志铭, 戴宏明. Public Opinion Sentiment Big Data Analysis Ensemble Method Based on Spark. Computer Science, 2021, 48(9): 118-124. https://doi.org/10.11896/jsjkx.210400280