Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 210900107-6. doi: 10.11896/jsjkx.210900107

• Image Processing & Multimedia Technology •

Sentiment Analysis Framework Based on Multimodal Representation Learning

HU Xin-rong, CHEN Zhi-heng, LIU Jun-ping, PENG Tao, YE Peng, ZHU Qiang   

  1. Hubei Provincial Engineering Research Center for Intelligent Textile and Fashion,Wuhan Textile University,Wuhan 430200,China
    2. Engineering Research Center of Hubei Province for Clothing Information,Wuhan Textile University,Wuhan 430200,China
    3. School of Computer Science and Artificial Intelligence,Wuhan Textile University,Wuhan 430200,China
  • Online:2022-11-10 Published:2022-11-21
  • About author:HU Xin-rong,born in 1973,Ph.D,professor,postgraduate supervisor,is a member of China Computer Federation.Her main research interests include image processing and pattern recognition,virtual reality and natural language processing.
    LIU Jun-ping,born in 1979,Ph.D,professor,postgraduate supervisor,is a member of China Computer Federation.His main research interests include industrial big data and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China(61103085),Hubei Provincial Outstanding Young and Middle-aged Scientific and Technological Innovation Team Program(T201807),Hubei Province University Intellectual Property Promotion Project(GXYS2018009) and Major Project of Hubei Province Education Department Scientific Research Fund(D20191708).

Abstract: When the overall loss of a multimodal representation is learned, the model depends comparatively little on the reconstruction loss, so the hidden representations cannot effectively capture the details of their respective modalities. This paper proposes a multi-subspace sentiment analysis framework. First, the framework projects each modality into two distinct utterance representations: modality-invariant and modality-specific. For the modality-invariant representation, we construct a main shared subspace together with an auxiliary shared subspace that helps the main subspace reduce the modality gap; for the modality-specific representation, we construct private subspaces that capture the characteristic features of each modality. The hidden vectors of all subspaces are fed to a decoder function that reconstructs each modality vector, thereby optimizing the reconstruction loss. Second, in the fusion procedure, we apply Transformer-based multi-head self-attention to these representations, so that each cross-modal representation can induce latent information from its fellow representations that acts synergistically on the overall emotional orientation. Finally, we build a joint vector by concatenation and use fully connected layers to generate the task predictions. Experimental results on the MOSI and MOSEI datasets show that the proposed framework outperforms the baselines on most evaluation criteria.
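To make the described pipeline concrete, the following is a minimal PyTorch sketch of the framework as we read the abstract; it is an illustration under stated assumptions, not the authors' implementation. The class name MultiSubspaceSA, the input feature sizes (768/74/47, typical BERT text, COVAREP audio, and visual feature dimensions on MOSI), the single linear encoder per subspace, and the one-layer fusion encoder are all our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSubspaceSA(nn.Module):
    """Hypothetical sketch: multi-subspace representations, reconstruction, fusion."""

    def __init__(self, dims=(768, 74, 47), d=128, heads=4):
        super().__init__()
        # Project each modality (e.g. text/audio/vision) to a common size d.
        self.proj = nn.ModuleList(nn.Linear(di, d) for di in dims)
        self.shared_main = nn.Linear(d, d)                           # main shared subspace
        self.shared_aux = nn.Linear(d, d)                            # auxiliary shared subspace
        self.private = nn.ModuleList(nn.Linear(d, d) for _ in dims)  # per-modality private subspaces
        self.decoder = nn.Linear(3 * d, d)                           # reconstructs a modality vector
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=1)     # multi-head self-attention fusion
        self.head = nn.Sequential(nn.Linear(3 * len(dims) * d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, feats):
        # feats: one (batch, dim_m) utterance-level vector per modality.
        reps, recon_loss = [], 0.0
        for m, x in enumerate(feats):
            u = self.proj[m](x)
            h = [self.shared_main(u), self.shared_aux(u), self.private[m](u)]
            # Decode the three hidden vectors back to the modality vector (reconstruction loss).
            recon_loss = recon_loss + F.mse_loss(self.decoder(torch.cat(h, dim=-1)), u)
            reps += h
        fused = self.fusion(torch.stack(reps, dim=1))  # (batch, 3 * num_modalities, d)
        joint = fused.flatten(1)                       # concatenation -> joint vector
        return self.head(joint).squeeze(-1), recon_loss
```

A call such as model([torch.randn(8, 768), torch.randn(8, 74), torch.randn(8, 47)]) returns a (batch,)-shaped sentiment prediction and a scalar reconstruction loss, which would be weighted against the task loss during training.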

Key words: Multimodal representation, Sentiment analysis, Transformer, Self-attention, Cross-modality

CLC Number: TP391.41

References:
[1]ABDU S A,YOUSEF A H,SALEM A.Multimodal video sentiment analysis using deep learning approaches,a survey[J].Information Fusion,2021,76(2021):204-226.
[2]ZADEH A,ZELLERS R,PINCUS E,et al.Multimodal sentiment intensity analysis in videos:facial gestures and verbal messages[J].IEEE Intelligent Systems,2016,31(6):82-88.
[3]ZADEH A,LIANG P P,VANBRIESEN J,et al.Multimodal language analysis in the wild:CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.2018:2236-2246.
[4]RAJAGOPALAN S S,MORENCY L P,BALTRUSAITIS T,et al.Extending long short-term memory for multi-view structured learning [C] //European Conference on Computer Vision.2016:338-353.
[5]HAZARIKA D,ZIMMERMANN R,PORIA S.MISA:modality-invariant and -specific representations for multimodal sentiment analysis [C] //ACM International Conference on Multimedia.2020:1122-1131.
[6]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C] //Proceedings of the 31st International Conference on Neural Information Processing Systems.2017:6000-6010.
[7]ZADEH A,CHEN M,PORIA S,et al.Tensor Fusion Network for Multimodal Sentiment Analysis[C] //Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing.2017:1103-1114.
[8]LIU Z,SHEN Y,LAKSHMINARASIMHAN V B,et al.Efficient Low-rank Multimodal Fusion with Modality-Specific Factors[C] //Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.2018:2247-2256.
[9]MAI S,HU H,XING S,et al.Divide,Conquer and Combine:Hierarchical Feature Fusion Network with Local and Global Perspectives for Multimodal Affective Computing [C] //Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.2019:481-492.
[10]PORIA S,CAMBRIA E,HAZARIKA D,et al.Context-Dependent Sentiment Analysis in User-Generated Videos [C] //Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.2017:873-883.
[11]GHOSAL D,AKHTAR M S,CHAUHAN D,et al.Contextual Inter-modal Attention for Multi-modal Sentiment Analysis [C] //Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.2018:3454-3466.
[12]CHAUHAN D S,AKHTAR M S,EKBAL A,et al.Context-aware Interactive Attention for Multi-modal Sentiment and Emotion Analysis [C] //Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing.2019:5647-5657.
[13]PORIA S,CAMBRIA E,HAZARIKA D,et al.Multi-level Multiple Attentions for Contextual Multimodal Sentiment Analysis [C] //IEEE International Conference on Data Mining.2017:1033-1038.
[14]ZHANG Y Z,SONG D W,ZHANG P.A Quantum-Inspired Multimodal Sentiment Analysis Framework [J].Theoretical Computer Science,2018,752(2018):21-40.
[15]LI Q C,GKOUMAS D,LIOMA C,et al.Quantum-inspired Multimodal Fusion for Video Sentiment Analysis [J].Information Fusion,2021,65(2021):58-71.
[16]ZHANG Y,SONG D,LI X,et al.A quantum-like multimodal network framework for modeling interaction dynamics in multiparty conversational sentiment analysis [J].Information Fusion,2020,62(2020):14-31.
[17]OLSON D.From utterance to text:The bias of language in speech and writing [J].Harvard Educational Review,1977,47(3):257-281.
[18]GUO W Z,WANG J W,WANG S P.Deep Multimodal Representation Learning:A Survey [J].IEEE Access,2019,7(2019):63373-63394.
[19]ZELLINGER W,LUGHOFER E,SAMINGER-PLATZ S,et al.Central moment discrepancy (CMD) for domain-invariant representation learning[J].arXiv:1702.08811,2017.
[20]ZADEH A,LIANG P P,PORIA S,et al.Multi-attention Recurrent Network for Human Communication Comprehension [C] //AAAI Conference on Artificial Intelligence.2018:5642-5649.
[21]TSAI Y H H,BAI S,LIANG P P,et al.Multimodal transformer for unaligned multimodal language sequences [C] //Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.2019:6558-6569.
[22]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding [J].arXiv:1810.04805,2018.
[23]EKMAN P,ROSENBERG E L.What the face reveals:Basic and applied studies of spontaneous expression using the Facial Action Coding System(FACS) [M].USA:Oxford University Press,1997.
[24]DEGOTTEX G,KANE J,DRUGMAN T,et al.COVAREP-a collaborative voice analysis repository for speech technologies [C] //2014 IEEE International Conference on Acoustics,Speech and Signal Processing.2014:960-964.
[25]DRUGMAN T,ALWAN A.Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics [J].arXiv:2001.00459,2019.
[26]DRUGMAN T,THOMAS M,GUDNASON J,et al.Detection of Glottal Closure Instants from Speech Signals:a Quantitative Review [J].IEEE Transactions on Audio,Speech,and Language Processing,2011,20(3):994-1006.
[27]LIANG P P,LIU Z Y,ZADEH A,et al.Multimodal Language Analysis with Recurrent Multistage Fusion [J].arXiv:1808.03920,2018.
[28]WANG Y S,SHEN Y,LIU Z,et al.Words Can Shift:Dynamically Adjusting Word Representations Using Nonverbal Behaviors [C] //Proceedings of the AAAI Conference on Artificial Intelligence.2019:7216-7223.
[29]PHAM H,LIANG P P,MANZINI T,et al.Found in Translation:Learning Robust Joint Representations by Cyclic Translations between Modalities [C] //Proceedings of the AAAI Conference on Artificial Intelligence.2019:6892-6899.
[30]TSAI Y H H,LIANG P P,ZADEH A.Learning factorized multimodal representations[J].arXiv:1806.06176,2018.
[31]SUN Z K,SARMA P K,SETHARES W A,et al.Learning Relationships between Text,Audio,and Video via Deep Canonical Correlation for Multimodal Language Analysis [C] //Proceedings of the AAAI Conference on Artificial Intelligence.2020:8992-8999.