Computer Science, 2022, Vol. 49, Issue (1): 219-224. doi: 10.11896/jsjkx.201000074

• Computer Graphics & Multimedia

Image-Text Sentiment Analysis Model Based on Visual Aspect Attention

YUAN Jing-ling, DING Yuan-yuan, SHENG De-ming, LI Lin

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
  • Received: 2020-10-14  Revised: 2021-04-19  Online: 2022-01-15  Published: 2022-01-18
  • Corresponding author: YUAN Jing-ling (yuanjingling@126.com)
  • About author: YUAN Jing-ling, born in 1975, Ph.D., is a member of the China Computer Federation. Her main research interests include machine learning, intelligent analysis and green computing.
  • Supported by:
    National Social Science Foundation of China (15BGL048); National Key Research and Development Program of China (2017YFB0802303); National Natural Science Foundation of China (62076127, 61571226).

Abstract: Social networks have become an integral part of daily life, and sentiment analysis of social media content helps reveal people's views, attitudes and emotions on social networking sites. Traditional sentiment analysis relies mainly on text. With the rise of smartphones, online content has become more diverse and now includes images as well as text. Our study finds that in most cases images support and enhance the text rather than expressing sentiment independently of it. We propose a novel image-text sentiment analysis model, LSTM-VistaNet. Rather than taking image information as direct input, LSTM-VistaNet uses a VGG16 network to extract image features and from them generates visual aspect attention, which assigns higher weights to the core sentences of a document and yields a document representation based on visual aspect attention. In addition, an LSTM model extracts sentiment from the text, yielding a text-based document representation. Finally, the two sets of classification results are fused to obtain the final classification label. On the Yelp restaurant review dataset, the proposed model achieves an accuracy of 62.08%. This is 18.92% higher than BiGRU-mVGG, a comparatively accurate baseline, which verifies the effectiveness of using visual information as aspect attention to assist text-based sentiment classification; it is also 0.32% higher than VistaNet, which shows that the LSTM branch effectively compensates for VistaNet's shortcoming that images cannot fully cover the text.
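The abstract names the two mechanisms (visual aspect attention over sentences, then fusion of two branches' classification results) but not their equations, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes toy sentence and image vectors given as lists of floats, image vectors already projected to the sentence dimension (raw VGG16 features would need a learned projection), sentence scores computed as plain dot products, per-image document vectors averaged, and class probabilities fused by a weighted average with a hypothetical weight `lam`. The names `visual_aspect_document` and `late_fusion` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def visual_aspect_document(sentences: List[Vector], images: List[Vector]) -> Vector:
    """Weight sentences by how well they match each image, then pool.

    Assumption: image vectors are already projected to the sentence dimension,
    and a plain dot product stands in for a learned attention score.
    """
    dim = len(sentences[0])
    doc = [0.0] * dim
    for img in images:
        # One attention distribution over sentences per image ("visual aspect").
        weights = softmax([dot(img, s) for s in sentences])
        for w, s in zip(weights, sentences):
            for k in range(dim):
                # Average the per-image document vectors (assumed aggregation).
                doc[k] += w * s[k] / len(images)
    return doc


def late_fusion(p_visual: Sequence[float], p_text: Sequence[float],
                lam: float = 0.5) -> Tuple[List[float], int]:
    """Fuse two class-probability vectors by a weighted average (hypothetical lam)."""
    fused = [lam * a + (1.0 - lam) * b for a, b in zip(p_visual, p_text)]
    label = max(range(len(fused)), key=lambda i: fused[i])
    return fused, label


if __name__ == "__main__":
    sentences = [[1.0, 0.0], [0.0, 1.0]]  # two toy sentence vectors
    images = [[2.0, 0.0]]                 # one toy image vector aligned with sentence 0
    print(visual_aspect_document(sentences, images))  # sentence 0 gets about 0.88 of the weight
    print(late_fusion([0.1, 0.2, 0.7], [0.3, 0.4, 0.3]))
```

With a single image aligned to the first sentence, that sentence receives roughly 0.88 of the attention weight, which illustrates how an image can emphasize the "core" sentences of a review. A weighted average is only one possible fusion rule; the abstract does not state which rule LSTM-VistaNet uses.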

Key words: Visual aspect attention, LSTM, Multimodal, Sentiment analysis, Social images

CLC number: TP391.1