Computer Science ›› 2023, Vol. 50 ›› Issue (3): 298-306. doi: 10.11896/jsjkx.220100156

• Artificial Intelligence •

Multimodal Sentiment Analysis Based on Adaptive Gated Information Fusion

CHEN Zhen1, PU Yuanyuan1,2, ZHAO Zhengpeng1, XU Dan1, QIAN Wenhua1   

  1 College of Information Science and Engineering, Yunnan University, Kunming 650504, China
  2 University Key Laboratory of Internet of Things Technology and Application, Yunnan Province, Kunming 650504, China
  • Received:2022-01-16 Revised:2022-09-20 Online:2023-03-15 Published:2023-03-15
  • About author: CHEN Zhen, born in 1994, postgraduate. His main research interests include multimodal sentiment analysis and image processing.
    ZHAO Zhengpeng, born in 1973, master, associate professor. His main research interests include digital image processing and speech signal processing.
  • Supported by:
    National Natural Science Foundation of China (62162068, 61271361, 61761046, 62061049), Yunnan Science and Technology Department Project (2018FB100) and Key Program of the Applied Basic Research Programs of Yunnan (202001BB050043, 2019FA044).

Abstract: The goal of multimodal sentiment analysis is to achieve reliable and robust sentiment analysis by exploiting the complementary information provided by multiple modalities. Recently, extracting deep semantic features with neural networks has achieved remarkable results in multimodal sentiment analysis. However, how features at different levels of multimodal information are fused is also an important factor in the effectiveness of sentiment analysis. Therefore, a multimodal sentiment analysis model based on adaptive gated information fusion (AGIF) is proposed. First, visual and color features at different levels, extracted by Swin Transformer and ResNet, are fused through a gated information fusion network according to their contribution to sentiment analysis. Second, because sentiment is abstract and complex, the sentiment of an image is often expressed by multiple subtle local regions, and these sentiment-discriminating regions can be located accurately by iterative attention based on past information. In addition, the ERNIE pre-trained model is used to address the inability of Word2Vec and GloVe to handle polysemy. Finally, an auto-fusion network is used to "dynamically" fuse the features of each modality, which addresses the information redundancy caused by constructing the multimodal joint representation with a deterministic operation (concatenation or TFN). Extensive experiments on three publicly available real-world datasets demonstrate the effectiveness of the proposed model.
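
As a rough illustration of the gating idea mentioned in the abstract, the following minimal sketch blends two feature vectors with an element-wise sigmoid gate computed from their concatenation. It is a generic gated fusion under stated assumptions, not the AGIF network itself: the function name `gated_fusion`, the linear gate, and the toy vectors standing in for Swin Transformer and ResNet features are all hypothetical, since the abstract does not give the model's equations.

```python
import math


def sigmoid(z):
    """Logistic function; adequate for the small toy inputs used here."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(a, b, weight, bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    Assumed (hypothetical) formulation, not taken from the paper:
        gate  = sigmoid(weight @ [a; b] + bias)
        fused = gate * a + (1 - gate) * b
    `weight` has len(a) rows and 2 * len(a) columns; `bias` has len(a) entries.
    """
    joint = list(a) + list(b)  # concatenate the two feature vectors
    fused, gates = [], []
    for i in range(len(a)):
        z = sum(w * x for w, x in zip(weight[i], joint)) + bias[i]
        g = sigmoid(z)
        gates.append(g)
        # Each output dimension is a convex combination of the two inputs.
        fused.append(g * a[i] + (1.0 - g) * b[i])
    return fused, gates


# Toy stand-ins for two feature vectors (illustrative values only).
feat_a = [0.2, -1.0, 0.5]
feat_b = [1.0, 0.3, -0.4]

# With all-zero gate parameters every gate is 0.5, so the result is the mean.
zero_w = [[0.0] * 6 for _ in range(3)]
zero_b = [0.0] * 3
mean_fused, mean_gates = gated_fusion(feat_a, feat_b, zero_w, zero_b)

# A large positive bias pushes the gate toward 1, favoring feat_a.
biased_fused, biased_gates = gated_fusion(feat_a, feat_b, zero_w, [10.0] * 3)
```

In a trained model the gate parameters would be learned, so the weighting of each feature source would adapt to the input rather than being fixed as in this toy example.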

Key words: Multimodal sentiment analysis, Gated information fusion networks, Iterative attention, ERNIE, Auto-fusion network

CLC Number: 

  • TP391