Computer Science ›› 2025, Vol. 52 ›› Issue (7): 210-217.doi: 10.11896/jsjkx.240600127

• Artificial Intelligence •

Cross-modal Hypergraph Optimisation Learning for Multimodal Sentiment Analysis

JIANG Kun1, ZHAO Zhengpeng1, PU Yuanyuan1,2, HUANG Jian1, GU Jinjing1, XU Dan1   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
    2. Internet of Things Technology and Application Key Laboratory of Universities in Yunnan, Kunming 650500, China
  • Received:2024-06-21 Revised:2024-09-18 Published:2025-07-17
  • About author:JIANG Kun,born in 1998,master.His main research interests include multimodal sentiment analysis.
    ZHAO Zhengpeng,born in 1973,associate professor,master's supervisor.His main research interests include signal and information processing,and computer systems and applications.
  • Supported by:
    National Natural Science Foundation of China(61271361,61761046,62162068,52102382,62362070),Key Project of Applied Basic Research Program of Yunnan Provincial Department of Science and Technology(202001BB050043,202401AS070149),Yunnan Provincial Science and Technology Major Project(202302AF080006) and Graduate Student Innovation Project(KC-23236053).

Abstract: Sentiment is expressed multimodally, and more accurate emotion judgements can be derived from multiple modalities such as language, audio, and vision, so modelling the interactions among modalities can effectively improve the accuracy of multimodal sentiment analysis. Previous studies have used graph models to capture rich interactions across modalities and time and thereby obtain expressive, fine-grained sequence representations. However, the higher-order information in multimodal data remains underexploited: an edge in a graph neural network connects exactly two nodes, so interactions can only be modelled one-to-one, which restricts the use of higher-order information. This paper explores the application of hypergraph neural networks to multimodal sentiment analysis. Because a hyperedge can connect two or more nodes, the hypergraph structure makes full use of intra- and inter-modal higher-order information and realises higher-order interactions among the data. Furthermore, this paper proposes a hypergraph adaptive module to optimise the structure of the initial hypergraph. The hypergraph adaptive network detects potential hidden information through node-hyperedge cross-attention, hyperedge sampling, and event-node sampling; it discovers implicit connections and prunes redundant hyperedges as well as irrelevant event nodes to update the hypergraph structure. The updated structure represents the higher-order correlations of the data more accurately and completely than the initial one. Extensive experiments on two publicly available datasets, CMU-MOSI and CMU-MOSEI, show that the proposed framework improves by 1% to 6% on several performance metrics over other state-of-the-art algorithms.
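To make the higher-order interaction concrete, below is a minimal PyTorch sketch of one layer of spectral hypergraph convolution in the style of HGNN [19], i.e. X' = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Θ over a node-by-hyperedge incidence matrix H. The class name HypergraphConv, the dense incidence representation, and all tensor shapes are illustrative assumptions, not the authors' released code; the adaptive module itself is only paraphrased afterwards.

    # Minimal sketch, assuming dense 0/1 incidence matrices; not the paper's implementation.
    import torch
    import torch.nn as nn

    class HypergraphConv(nn.Module):
        # One spectral-style hypergraph convolution (cf. Feng et al. [19]):
        #   X' = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Theta
        # H is the |V| x |E| incidence matrix. A hyperedge may join two or more
        # nodes, so one layer already mixes higher-order (many-to-many)
        # information, unlike a pairwise GNN edge.
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.theta = nn.Linear(in_dim, out_dim, bias=False)

        def forward(self, x, incidence, edge_weight=None):
            # x: (num_nodes, in_dim); incidence: (num_nodes, num_edges), 0/1 entries
            if edge_weight is None:
                edge_weight = torch.ones(incidence.size(1), device=x.device)
            d_v = (incidence * edge_weight).sum(dim=1).clamp(min=1.0)  # weighted node degrees
            d_e = incidence.sum(dim=0).clamp(min=1.0)                  # hyperedge degrees
            x = x / d_v.sqrt().unsqueeze(1)                            # Dv^(-1/2) X
            edge_feat = (incidence.t() @ x) / d_e.unsqueeze(1)         # De^(-1) H^T (.): node -> hyperedge
            node_feat = (incidence * edge_weight) @ edge_feat          # H W (.): hyperedge -> node
            node_feat = node_feat / d_v.sqrt().unsqueeze(1)            # Dv^(-1/2) (.)
            return torch.relu(self.theta(node_feat))                   # (.) Theta

    # Toy usage: 5 event nodes, 3 hyperedges, each hyperedge joining >= 1 node.
    H = torch.tensor([[1., 0., 1.],
                      [1., 1., 0.],
                      [0., 1., 1.],
                      [1., 0., 0.],
                      [0., 1., 1.]])
    layer = HypergraphConv(in_dim=8, out_dim=4)
    out = layer(torch.randn(5, 8), H)  # out has shape (5, 4)

A hypergraph adaptive step in the spirit of the abstract could then, for example, score each column of H with node-hyperedge cross-attention and keep or drop hyperedges (and, symmetrically, event-node rows) through a concrete/Gumbel-softmax relaxation [33], so that pruning redundant hyperedges and irrelevant nodes remains differentiable; that step is deliberately not implemented in this sketch.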

Key words: Multimodal sentiment analysis, Hypergraph neural networks, Hypergraph optimisation, Adaptive networks, Node-edge information fusion

CLC Number: TP391
[1]BUSSO C,BULUT M,LEE C C,et al.IEMOCAP:Interactive emotional dyadic motion capture database[J].Language Resources and Evaluation,2008,42:335-359.
[2]LIU J M,ZHANG P X,LIU Y,et al.Summary of multi-modal sentiment analysis technology[J].Journal of Frontiers of Computer Science & Technology,2021,15(7):1165.
[3]SNOEK C G M,WORRING M,SMEULDERS A W M.Early versus late fusion in semantic video analysis[C]//Proceedings of the 13th Annual ACM International Conference on Multimedia.2005:399-402.
[4]TSAI Y H H,LIANG P P,ZADEH A,et al.Learning factorized multimodal representations[J].arXiv:1806.06176,2018.
[5]HAZARIKA D,ZIMMERMANN R,PORIA S.MISA:Modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia.2020:1122-1131.
[6]WU J,MAI S,HU H.Graph capsule aggregation for unaligned multimodal sequences[C]//Proceedings of the 2021 International Conference on Multimodal Interaction.2021:521-529.
[7]HUANG J,PU Y,ZHOU D,et al.Dynamic hypergraph convolutional network for multimodal sentiment analysis[J].Neurocomputing,2024,565:126992.
[8]SOLEYMANI M,GARCIA D,JOU B,et al.A survey of multimodal sentiment analysis[J].Image and Vision Computing,2017,65:3-14.
[9]D'MELLO S K,KORY J.A review and meta-analysis of multimodal affect detection systems[J].ACM Computing Surveys(CSUR),2015,47(3):1-36.
[10]GKOUMAS D,LI Q,LIOMA C,et al.What makes the difference? An empirical comparison of fusion strategies for multimodal language analysis[J].Information Fusion,2021,66:184-197.
[11]RAHMAN W,HASAN M K,LEE S,et al.Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020.
[12]HAN W,CHEN H,GELBUKH A,et al.Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]//Proceedings of the 2021 International Conference on Multimodal Interaction.2021:6-15.
[13]YU W,XU H,YUAN Z,et al.Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2021:10790-10797.
[14]YANG J,WANG Y,YI R,et al.MTAG:Modal-temporal attention graph for unaligned human multimodal language sequences[J].arXiv:2010.11985,2020.
[15]GAO Y,ZHANG Z,LIN H,et al.Hypergraph learning:Methods and practices[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2022,44(5):2548-2566.
[16]ZHOU D,HUANG J,SCHÖLKOPF B.Learning with hypergraphs:Clustering,classification,and embedding[C]//Proceedings of the 20th Annual Conference on Neural Information Processing Systems.2006.
[17]BAI S,ZHANG F,TORR P H S.Hypergraph convolution and hypergraph attention[J].Pattern Recognition,2021,110:107637.
[18]ZHANG R,ZOU Y,MA J.Hyper-SAGNN:a self-attention based graph neural network for hypergraphs[J].arXiv:1911.02613,2019.
[19]FENG Y,YOU H,ZHANG Z,et al.Hypergraph neural networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019:3558-3565.
[20]YADATI N,NIMISHAKAVI M,YADAV P,et al.HyperGCN:A new method of training graph convolutional networks on hypergraphs[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.2019.
[21]HUANG J,YANG J.UniGNN:a unified framework for graph and hypergraph neural networks[J].arXiv:2105.00956,2021.
[22]CHIEN E,PAN C,PENG J,et al.You are AllSet:A multiset function framework for hypergraph neural networks[J].arXiv:2106.13264,2021.
[23]ZHANG Z,LIN H,ZHAO X,et al.Inductive multi-hypergraph learning and its application on view-based 3D object classification[J].IEEE Transactions on Image Processing,2018,27(12):5957-5968.
[24]WANG M,LIU X,WU X.Visual classification by l1-hypergraph modeling[J].IEEE Transactions on Knowledge and Data Engineering,2015,27(9):2564-2574.
[25]GAO Y,WANG M,ZHA Z J,et al.Visual-textual joint rele-vance learning for tag-based social image search[J].IEEE Transactions on Image Processing,2012,22(1):363-376.
[26]HE J,HU H.MF-BERT:Multimodal fusion in pre-trained BERT for sentiment analysis[J].IEEE Signal Processing Letters,2021,29:454-458.
[27]SHI H,PU Y,ZHAO Z,et al.Co-space Representation Interaction Network for multimodal sentiment analysis[J].Knowledge-Based Systems,2024,283:111149.
[28]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of deep bidirectional transformers for language understanding[J].arXiv:1810.04805,2018.
[29]DEGOTTEX G,KANE J,DRUGMAN T,et al.COVAREP-A collaborative voice analysis repository for speech technologies[C]//2014 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP).IEEE,2014:960-964.
[30]BALTRUŠAITIS T,ROBINSON P,MORENCY L P.OpenFace:an open source facial behavior analysis toolkit[C]//2016 IEEE Winter Conference on Applications of Computer Vision(WACV).IEEE,2016:1-10.
[31]CHUNG J,GULCEHRE C,CHO K H,et al.Empirical evaluation of gated recurrent neural networks on sequence modeling[J].arXiv:1412.3555,2014.
[32]KRISHNA K,MURTY M N.Genetic K-means algorithm[J].IEEE Transactions on Systems,Man,and Cybernetics,Part B(Cybernetics),1999,29(3):433-439.
[33]MADDISON C J,MNIH A,TEH Y W.The concrete distribution:A continuous relaxation of discrete random variables[J].arXiv:1611.00712,2016.
[34]ZADEH A,ZELLERS R,PINCUS E,et al.MOSI:Multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J].arXiv:1606.06259,2016.
[35]ZADEH A A B,LIANG P P,PORIA S,et al.Multimodal language analysis in the wild:CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers).2018:2236-2246.
[36]ZADEH A,LIANG P P,PORIA S,et al.Multi-attention recurrent network for human communication comprehension[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018.
[37]TSAI Y H H,BAI S,LIANG P P,et al.Multimodal transformer for unaligned multimodal language sequences[J].arXiv:1906.00295,2019.
[38]CHEN M,LI X.SWAFN:Sentimental words aware fusion network for multimodal sentiment analysis[C]//Proceedings of the 28th International Conference on Computational Linguistics.2020:1067-1077.
[39]WU J,MAI S,HU H.Graph capsule aggregation for unaligned multimodal sequences[C]//Proceedings of the 2021 International Conference on Multimodal Interaction.2021:521-529.
[40]MAI S,XING S,HE J,et al.Multimodal graph for unaligned multimodal sequence analysis via graph convolution and graph pooling[J].ACM Transactions on Multimedia Computing,Communications and Applications,2023,19(2):1-24.
[41]LI Y,WANG Y,CUI Z.Decoupled multimodal distilling for emotion recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:6631-6640.
[42]PHAM H,LIANG P P,MANZINI T,et al.Found in translation:Learning robust joint representations by cyclic translations between modalities[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019:6892-6899.