Computer Science ›› 2022, Vol. 49 ›› Issue (11A): 210900107-6. DOI: 10.11896/jsjkx.210900107

• Image Processing & Multimedia Technology •

Sentiment Analysis Framework Based on Multimodal Representation Learning

HU Xin-rong, CHEN Zhi-heng, LIU Jun-ping, PENG Tao, YE Peng, ZHU Qiang   

  1. Hubei Provincial Engineering Research Center for Intelligent Textile and Fashion, Wuhan Textile University, Wuhan 430200, China
    Engineering Research Center of Hubei Province for Clothing Information, Wuhan Textile University, Wuhan 430200, China
    School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China
  • Online: 2022-11-10 Published: 2022-11-21
  • Corresponding author: LIU Jun-ping (jpliu@wtu.edu.cn)
  • About author: HU Xin-rong (hxr@wtu.edu.cn), born in 1973, Ph.D, professor, postgraduate supervisor, is a member of China Computer Federation. Her main research interests include image processing and pattern recognition, virtual reality and natural language processing.
    LIU Jun-ping, born in 1979, Ph.D, professor, postgraduate supervisor, is a member of China Computer Federation. His main research interests include industrial big data and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China (61103085), Hubei Provincial Outstanding Young and Middle-aged Scientific and Technological Innovation Team Program (T201807), Hubei Province University Intellectual Property Promotion Project (GXYS2018009) and Major Project of Hubei Province Education Department Scientific Research Fund (D20191708).

Abstract: When the overall loss of a multimodal representation model is learned, the reconstruction loss has relatively little hold on the model, so the hidden representations fail to capture the details of their respective modalities. This paper proposes a multi-subspace sentiment analysis framework based on multimodal representation learning. First, the framework projects each modality into two distinct utterance representations: modality-invariant and modality-specific. Within the modality-invariant representation, it constructs a main shared subspace and an auxiliary shared subspace that helps the main subspace reduce the modality gap; within the modality-specific representation, it constructs private subspaces that capture the features unique to each modality. The hidden vectors from all subspaces are fed to a decoder function to reconstruct the modality vectors, so that the reconstruction loss is optimized. Second, in the fusion stage, Transformer-based self-attention is performed on these representations, so that each representation can draw latent information from the other cross-modal representations that act synergistically on the overall sentiment orientation. Finally, a joint vector is generated by concatenation, and fully connected layers produce the task predictions. Experimental results on the two public datasets MOSI and MOSEI show that the proposed framework outperforms the baseline models on most evaluation metrics.
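To make the three stages of the framework concrete, the following is a minimal PyTorch-style sketch reconstructed from the abstract alone. It is not the authors' implementation: the class names, the Tanh projections, the single-layer Transformer encoder, and the feature dimensions (a BERT-like 768-d text vector plus hypothetical 74-d audio and 47-d visual vectors) are all illustrative assumptions.

import torch
import torch.nn as nn

class MultiSubspaceEncoder(nn.Module):
    # Projects one modality's utterance vector into three hidden subspaces:
    # the main shared (modality-invariant) subspace, an auxiliary shared
    # subspace that helps the main one reduce the modality gap, and a
    # private (modality-specific) subspace.
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.main_shared = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Tanh())
        self.aux_shared = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Tanh())
        self.private = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Tanh())

    def forward(self, x):
        return self.main_shared(x), self.aux_shared(x), self.private(x)

class MultiSubspaceSentimentModel(nn.Module):
    def __init__(self, dims, hid_dim=128, n_heads=4):
        super().__init__()
        self.encoders = nn.ModuleList(MultiSubspaceEncoder(d, hid_dim) for d in dims)
        # One decoder per modality: it reconstructs the original modality
        # vector from that modality's three hidden vectors, giving the
        # reconstruction loss a direct handle on every subspace.
        self.decoders = nn.ModuleList(nn.Linear(3 * hid_dim, d) for d in dims)
        # Transformer-based self-attention over the stacked subspace vectors,
        # letting each representation attend to all the others.
        layer = nn.TransformerEncoderLayer(d_model=hid_dim, nhead=n_heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=1)
        # Joint vector (concatenation of the fused vectors) -> prediction;
        # MOSI/MOSEI sentiment intensity is a single regression target.
        self.head = nn.Sequential(nn.Linear(len(dims) * 3 * hid_dim, hid_dim),
                                  nn.ReLU(), nn.Linear(hid_dim, 1))
        self.mse = nn.MSELoss()

    def forward(self, modalities):
        hidden, recon_loss = [], 0.0
        for x, enc, dec in zip(modalities, self.encoders, self.decoders):
            main, aux, priv = enc(x)
            hidden += [main, aux, priv]
            # Reconstruct the modality vector from its three hidden vectors.
            recon = dec(torch.cat([main, aux, priv], dim=-1))
            recon_loss = recon_loss + self.mse(recon, x)
        seq = torch.stack(hidden, dim=1)     # (batch, 3 * num_modalities, hid_dim)
        fused = self.fusion(seq)             # self-attention across representations
        joint = fused.flatten(start_dim=1)   # concatenation into the joint vector
        return self.head(joint), recon_loss

# Hypothetical usage with a batch of 8 text/audio/vision utterance vectors:
model = MultiSubspaceSentimentModel(dims=[768, 74, 47])
t, a, v = torch.randn(8, 768), torch.randn(8, 74), torch.randn(8, 47)
prediction, recon_loss = model([t, a, v])

In a full training loop, recon_loss would presumably be weighted and added to the task loss, which is how the framework strengthens the otherwise weak dependence of the reconstruction loss on the model.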

Key words: Multimodal representation, Sentiment analysis, Transformer, Self-attention, Cross-modality

CLC Number: TP391.41