Computer Science ›› 2024, Vol. 51 ›› Issue (9): 250-257. doi: 10.11896/jsjkx.230600047

• Artificial Intelligence •

Multimodal Sentiment Analysis Model Based on Visual Semantics and Prompt Learning

MO Shuyuan, MENG Zuqiang   

  1. School of Computer and Electronic Information, Guangxi University, Nanning 530004, China
  • Received: 2023-06-06 Revised: 2023-12-25 Online: 2024-09-15 Published: 2024-09-10
  • About author: MO Shuyuan, born in 1997, postgraduate. His main research interest is multimodal deep learning.
    MENG Zuqiang, born in 1974, Ph.D., professor, is a senior member of CCF (No.06312S). His main research interests include multimodal deep learning and granular computing.
  • Supported by:
    National Natural Science Foundation of China (62266004).

Abstract: With the development of deep learning technology, multimodal sentiment analysis has become a research highlight. However, most existing multimodal sentiment analysis models either extract feature vectors from the different modalities and simply combine them with a weighted sum, so that the data cannot be accurately mapped into a unified multimodal vector space, or rely on image captioning models to translate images into text, which extracts many visual semantics that carry no sentiment information and introduces redundancy, ultimately degrading model performance. To address these issues, a multimodal sentiment analysis model based on visual semantics and prompt learning, VSPL, is proposed. The model translates images into precise, concise, and sentiment-bearing visual semantic words, alleviating the information-redundancy problem. Following the prompt learning paradigm, the obtained visual semantic words are combined with prompt templates pre-designed for the sentiment classification task to form new text, thereby achieving modal fusion. This not only avoids the inaccurate feature-space mapping caused by the weighted-sum method, but also stimulates the latent capability of the pre-trained language model through prompt learning. Comparative experiments on multimodal sentiment analysis tasks show that the proposed VSPL outperforms advanced baseline models on three public datasets. In addition, ablation experiments, feature visualization, and sample analysis are conducted to verify the effectiveness of VSPL.

Key words: Multimodal, Visual semantics, Prompt learning, Sentiment analysis, Pre-trained language model

CLC Number: TP391