Computer Science, 2024, Vol. 51, Issue 11A: 231200163-8. doi: 10.11896/jsjkx.231200163
高玮军, 孙子博, 刘书君
GAO Weijun, SUN Zibo, LIU Shujun
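Abstract: On social media, people are often drawn first to the facial expressions of the people in an image, which convey emotion directly. A complete expression of emotion, however, also depends on the scene, which supplies the background and support that sentiment analysis needs. Many studies overlook the role of the scene in emotional expression, which leads to suboptimal results. To address three problems of existing image-text bimodal sentiment analysis models (neglect of alignment between modalities, insufficient image feature extraction, and limited generalization ability), this paper proposes the Multi-view Image-Text Emotion Analysis Network Model (MITN). For image feature extraction, an attention mechanism is added on the facial-expression side to better capture facial expressions; on the scene side, dilated convolution with a dilation rate is introduced to enlarge the receptive field, and Scene-VGG is trained by transfer learning on the Places dataset to make full use of scene information. BERT+BiGRU is used to extract textual features. Experiments on the multimodal sentiment dataset MVSA verify the effectiveness of the proposed model.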
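The abstract states that a dilation rate enlarges the receptive field. The sketch below illustrates that general property with the standard formula for a dilated kernel, k_eff = k + (k - 1)(d - 1). The layer stacks, kernel sizes, and dilation rates are illustrative assumptions and are not taken from the paper; the helper names `effective_kernel_size` and `receptive_field` are hypothetical.

```python
from typing import List, Tuple


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers: List[Tuple[int, int, int]]) -> int:
    """Receptive field of a stack of conv layers.

    Each layer is (kernel_size, stride, dilation). `jump` tracks the
    distance, in input pixels, between neighbouring outputs of the
    layers processed so far.
    """
    rf, jump = 1, 1
    for kernel_size, stride, dilation in layers:
        k_eff = effective_kernel_size(kernel_size, dilation)
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf


# Assumed example stacks (not the paper's configuration):
# three 3x3 convolutions, stride 1, without and with dilation.
plain_stack = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated_stack = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

if __name__ == "__main__":
    print("plain receptive field:  ", receptive_field(plain_stack))
    print("dilated receptive field:", receptive_field(dilated_stack))
```

With the same number of 3x3 layers and the same parameter count, the dilated stack in this example covers a wider input region than the plain one, which is the motivation the abstract gives for using dilated convolution on scene features.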