Computer Science ›› 2020, Vol. 47 ›› Issue (11): 250-254. doi: 10.11896/jsjkx.190800154

• Artificial Intelligence •

Visual Sentiment Prediction with Visual Semantic Embedding and Attention Mechanism

LAN Yi-lun, MENG Min, WU Ji-gang

  1. Department of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2019-08-29  Revised: 2019-11-22  Online: 2020-11-15  Published: 2020-11-05
  • Corresponding author: MENG Min (minmeng@gdut.edu.cn)
  • About author: LAN Yi-lun, born in 1995, postgraduate (allentoretto@163.com). His main research interests include visual sentiment prediction and image classification.
    MENG Min, born in 1985, Ph.D, associate professor, postgraduate supervisor, is a member of China Computer Federation. Her main research interests include image processing and machine learning.
  • Supported by: This work was supported by the National Natural Science Foundation of China (61702114) and the Guangdong Key R&D Project of China (2019B010121001).


Abstract: In order to bridge the semantic gap between visual features and sentiments and to reduce the impact of sentiment-irrelevant regions in an image, this paper presents a novel visual sentiment prediction method that integrates visual semantic embedding with an attention mechanism. First, the method employs an auto-encoder to learn a joint embedding of image features and semantic features, so as to alleviate the difference between low-level visual features and high-level semantic features. Second, a set of salient-region features is extracted as input to the attention model, in which correlations between salient regions and the joint embedding features are established to discover sentiment-relevant regions. Finally, a sentiment classifier is built on top of these regions for visual sentiment prediction. Experimental results show that the proposed method significantly improves classification performance on testing samples and outperforms state-of-the-art algorithms for visual sentiment analysis.

Key words: Attention mechanism, Salient region detection, Visual semantic embedding, Visual sentiment prediction
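The attention step summarized in the abstract, weighting each salient region by its relevance to the joint visual-semantic embedding before classification, can be illustrated with a minimal NumPy sketch of additive (soft) attention. All dimensions, weight matrices (`Wr`, `We`, `v`), and the function name are illustrative assumptions for exposition, not the authors' actual implementation.

```python
import numpy as np

def region_attention(regions, embedding, Wr, We, v):
    """Additive (soft) attention: score each salient region against the
    joint visual-semantic embedding, then pool regions by their weights."""
    h = np.tanh(regions @ Wr + embedding @ We)   # (K, hidden) joint projection
    scores = h @ v                               # (K,) one relevance score per region
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                         # softmax: attention weights sum to 1
    context = alpha @ regions                    # weighted pool over region features
    return context, alpha

# Toy example: 5 candidate salient regions with 8-d features, 16-d hidden layer
rng = np.random.default_rng(0)
regions = rng.standard_normal((5, 8))    # salient-region features
embedding = rng.standard_normal(8)       # joint embedding of the whole image
Wr = rng.standard_normal((8, 16))
We = rng.standard_normal((8, 16))
v = rng.standard_normal(16)
context, alpha = region_attention(regions, embedding, Wr, We, v)
```

In this reading, regions that align poorly with the joint embedding receive near-zero weights, so the pooled `context` vector fed to the sentiment classifier is dominated by sentiment-relevant regions.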

CLC number: TP391.41