Computer Science ›› 2023, Vol. 50 ›› Issue (12): 246-254. doi: 10.11896/jsjkx.221100038

• Artificial Intelligence •


Aspect-based Multimodal Sentiment Analysis Based on Trusted Fine-grained Alignment

FAN Dongxu1, GUO Yi1,2,3   

    1 School of Computer Science and Engineering,East China University of Science and Technology,Shanghai 200237,China
    2 Business Intelligence and Visualization Research Center,National Engineering Laboratory for Big Data Distribution and Exchange Technologies,Shanghai 200436,China
    3 Shanghai Engineering Research Center of Big Data & Internet Audience,Shanghai 200072,China
  • Received:2022-11-06 Revised:2023-03-09 Online:2023-12-15 Published:2023-12-07
  • Corresponding author: GUO Yi(guoyi@ecust.edu.cn)
  • About author:FAN Dongxu,born in 2000,postgraduate(956701698@qq.com).Her main research interests include sentiment analysis and data mining.
    GUO Yi,born in 1975,Ph.D,professor.His main research interests include text mining,knowledge discovery and business intelligence.
  • Supported by:
    Science and Technology Plan Project of Shanghai Municipal Commission of Science and Technology(22DZ204903,22511104800).


Abstract: Aspect-based multimodal sentiment analysis(MABSA) aims to identify the sentiment polarity of a specific aspect word in a text based on both textual and image information.However,current mainstream models do not fully exploit the fine-grained semantic alignment between modalities:they fuse the visual features of the entire image with every word in the text,ignoring the strong correspondence between local image regions and aspect words,so noise from the image is also integrated into the final multimodal representation.This paper therefore proposes a trusted fine-grained alignment model,TFGA(MABSA based on trusted fine-grained alignment).Specifically,Faster R-CNN is used to capture the visual objects contained in the image,and the correlation between each object and the aspect words is computed.To avoid cases where the local semantic similarity between a visual region and an aspect word is inconsistent with the global image-text view,a confidence score is used to weight and constrain the local similarities and to filter out unreliable matching pairs,so that the model focuses on the local visual information that is most relevant to the aspect words and most trustworthy,reducing the impact of redundant noise in the image.A fine-grained feature fusion mechanism is then proposed to fully fuse the selected visual information with the textual information and obtain the final sentiment classification result.Experiments on Twitter datasets show that fine-grained alignment of text and vision benefits aspect-based sentiment analysis.
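To make the alignment and fusion steps concrete, the following minimal PyTorch sketch illustrates one plausible reading of the abstract: weight each region-aspect similarity by a confidence score, filter unreliable pairs, then gate the surviving visual evidence into the text representation. Everything here is an illustrative assumption rather than the authors' exact formulation; in particular, the definition of confidence as agreement between local and global image-text similarity, the names trusted_fine_grained_alignment and FineGrainedFusion, the threshold conf_threshold, the temperature tau, and the three-class classifier head are all hypothetical.

import torch
import torch.nn.functional as F

def trusted_fine_grained_alignment(regions, aspect, image_global, text_global,
                                   conf_threshold=0.3, tau=0.1):
    # regions: (R, d) Faster R-CNN region features; aspect: (d,) aspect-word
    # representation; image_global / text_global: (d,) global image / text vectors.
    # 1. Local semantic similarity between every visual region and the aspect word.
    local_sim = F.cosine_similarity(regions, aspect.unsqueeze(0), dim=-1)   # (R,)
    # 2. Hypothetical confidence: how well each local match agrees with the
    #    global image-text similarity (an assumption, not the paper's definition).
    global_sim = F.cosine_similarity(image_global, text_global, dim=0)      # scalar
    confidence = 1.0 - (local_sim - global_sim).abs().clamp(max=1.0)        # (R,)
    # 3. Weight local similarities by confidence and drop unreliable pairs.
    weighted = confidence * local_sim
    mask = confidence < conf_threshold
    if mask.all():
        mask = torch.zeros_like(mask)  # keep all regions if none pass the threshold
    trusted = weighted.masked_fill(mask, float('-inf'))
    # 4. Attend over the surviving regions to get an aspect-aware visual summary.
    attn = F.softmax(trusted / tau, dim=0)                                  # (R,)
    return attn @ regions, attn                                             # (d,), (R,)

class FineGrainedFusion(torch.nn.Module):
    # Illustrative gated fusion of the aspect-aware text representation with the
    # trusted visual summary, followed by an assumed three-way sentiment head.
    def __init__(self, d):
        super().__init__()
        self.gate = torch.nn.Linear(2 * d, d)
        self.classifier = torch.nn.Linear(d, 3)  # negative / neutral / positive

    def forward(self, text_repr, visual_summary):
        g = torch.sigmoid(self.gate(torch.cat([text_repr, visual_summary], dim=-1)))
        fused = g * text_repr + (1.0 - g) * visual_summary
        return self.classifier(fused)

if __name__ == "__main__":
    R, d = 36, 768  # e.g., 36 detected regions, 768-dim BERT-style features
    vis, attn = trusted_fine_grained_alignment(
        torch.randn(R, d), torch.randn(d), torch.randn(d), torch.randn(d))
    logits = FineGrainedFusion(d)(torch.randn(d), vis)  # (3,) sentiment logits

In the real model the text side would come from a pretrained text encoder and the regions from Faster R-CNN; the masked attention keeps the fusion focused on the regions most relevant to the aspect word rather than the whole image.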

Key words: Aspect-based sentiment analysis, Multimodal, Fine-grained alignment, Sentiment analysis, Natural language processing

CLC Number: 

  • TP391.1