Computer Science ›› 2025, Vol. 52 ›› Issue (7): 210-217.doi: 10.11896/jsjkx.240600127

• Artificial Intelligence •

Cross-modal Hypergraph Optimisation Learning for Multimodal Sentiment Analysis

JIANG Kun1, ZHAO Zhengpeng1, PU Yuanyuan1,2, HUANG Jian1, GU Jinjing1, XU Dan1   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
    2. Internet of Things Technology and Application Key Laboratory of Universities in Yunnan, Kunming 650500, China
  • Received:2024-06-21 Revised:2024-09-18 Published:2025-07-17
  • About author:JIANG Kun,born in 1998,master.His main research interests include multimodal sentiment analysis.
    ZHAO Zhengpeng,born in 1973,associate professor,master's supervisor.His main research interests include signal and information processing,and computer systems and applications.
  • Supported by:
    National Natural Science Foundation of China(61271361,61761046,62162068,52102382,62362070),Key Project of Applied Basic Research Program of Yunnan Provincial Department of Science and Technology(202001BB050043,202401AS070149),Yunnan Provincial Science and Technology Major Project(202302AF080006) and Graduate Student Innovation Project(KC-23236053).

Abstract: Sentiment is expressed multimodally, and more accurate emotion judgements can be derived from multiple modalities such as language, audio, and vision, so modelling the interactions among modalities can effectively improve the accuracy of multimodal sentiment analysis. Previous studies have used graph models to capture rich interactions across modalities and time and thereby obtain expressive, fine-grained sequence representations. However, the higher-order information in multimodal data remains underexploited: an edge in a graph neural network connects exactly two nodes, so interactions can only be modelled one-to-one, which restricts the use of higher-order information. This paper explores the application of hypergraph neural networks to multimodal sentiment analysis. Because a hyperedge can connect two or more nodes, the hypergraph structure makes full use of intra- and inter-modal higher-order information and realises higher-order interactions among the data. Furthermore, this paper proposes a hypergraph adaptive module to optimise the structure of the initial hypergraph. The hypergraph adaptive network detects potential hidden information through node-hyperedge cross-attention, hyperedge sampling, and event-node sampling; it discovers implicit connections and prunes redundant hyperedges as well as irrelevant event nodes to update the hypergraph structure. The updated structure represents the higher-order correlations of the data more accurately and completely than the initial one. Extensive experiments on two publicly available datasets, CMU-MOSI and CMU-MOSEI, show that the proposed framework improves by 1% to 6% on several performance metrics over other state-of-the-art algorithms.
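To make the higher-order interaction concrete, below is a minimal PyTorch sketch of one layer of spectral hypergraph convolution in the style of HGNN [19], i.e. X' = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Θ over a node-by-hyperedge incidence matrix H. The class name HypergraphConv, the dense incidence representation, and all tensor shapes are illustrative assumptions, not the authors' released code; the adaptive module itself is only paraphrased afterwards.

    # Minimal sketch, assuming dense 0/1 incidence matrices; not the paper's implementation.
    import torch
    import torch.nn as nn

    class HypergraphConv(nn.Module):
        # One spectral-style hypergraph convolution (cf. Feng et al. [19]):
        #   X' = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Theta
        # H is the |V| x |E| incidence matrix. A hyperedge may join two or more
        # nodes, so one layer already mixes higher-order (many-to-many)
        # information, unlike a pairwise GNN edge.
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.theta = nn.Linear(in_dim, out_dim, bias=False)

        def forward(self, x, incidence, edge_weight=None):
            # x: (num_nodes, in_dim); incidence: (num_nodes, num_edges), 0/1 entries
            if edge_weight is None:
                edge_weight = torch.ones(incidence.size(1), device=x.device)
            d_v = (incidence * edge_weight).sum(dim=1).clamp(min=1.0)  # weighted node degrees
            d_e = incidence.sum(dim=0).clamp(min=1.0)                  # hyperedge degrees
            x = x / d_v.sqrt().unsqueeze(1)                            # Dv^(-1/2) X
            edge_feat = (incidence.t() @ x) / d_e.unsqueeze(1)         # De^(-1) H^T (.): node -> hyperedge
            node_feat = (incidence * edge_weight) @ edge_feat          # H W (.): hyperedge -> node
            node_feat = node_feat / d_v.sqrt().unsqueeze(1)            # Dv^(-1/2) (.)
            return torch.relu(self.theta(node_feat))                   # (.) Theta

    # Toy usage: 5 event nodes, 3 hyperedges, each hyperedge joining >= 1 node.
    H = torch.tensor([[1., 0., 1.],
                      [1., 1., 0.],
                      [0., 1., 1.],
                      [1., 0., 0.],
                      [0., 1., 1.]])
    layer = HypergraphConv(in_dim=8, out_dim=4)
    out = layer(torch.randn(5, 8), H)  # out has shape (5, 4)

A hypergraph adaptive step in the spirit of the abstract could then, for example, score each column of H with node-hyperedge cross-attention and keep or drop hyperedges (and, symmetrically, event-node rows) through a concrete/Gumbel-softmax relaxation [33], so that pruning redundant hyperedges and irrelevant nodes remains differentiable; that step is deliberately not implemented in this sketch.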

Key words: Multimodal sentiment analysis, Hypergraph neural networks, Hypergraph optimisation, Adaptive networks, Node-edge information fusion

CLC Number: TP391
[1]BUSSO C,BULUT M,LEE C C,et al.IEMOCAP:Interactive emotional dyadic motion capture database[J].Language Resources and Evaluation,2008,42:335-359.
[2]LIU J M,ZHANG P X,LIU Y,et al.Summary of multi-modal sentiment analysis technology[J].Journal of Frontiers of Computer Science & Technology,2021,15(7):1165.
[3]SNOEK C G M,WORRING M,SMEULDERS A W M.Early versus late fusion in semantic video analysis[C]//Proceedings of the 13th Annual ACM International Conference on Multimedia.2005:399-402.
[4]TSAI Y H H,LIANG P P,ZADEH A,et al.Learning factorized multimodal representations[J].arXiv:1806.06176,2018.
[5]HAZARIKA D,ZIMMERMANN R,PORIA S.MISA:Modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia.2020:1122-1131.
[6]WU J,MAI S,HU H.Graph capsule aggregation for unaligned multimodal sequences[C]//Proceedings of the 2021 International Conference on Multimodal Interaction.2021:521-529.
[7]HUANG J,PU Y,ZHOU D,et al.Dynamic hypergraph convolutional network for multimodal sentiment analysis[J].Neurocomputing,2024,565:126992.
[8]SOLEYMANI M,GARCIA D,JOU B,et al.A survey of multimodal sentiment analysis[J].Image and Vision Computing,2017,65:3-14.
[9]D'MELLO S K,KORY J.A review and meta-analysis of multimodal affect detection systems[J].ACM Computing Surveys(CSUR),2015,47(3):1-36.
[10]GKOUMAS D,LI Q,LIOMA C,et al.What makes the difference? An empirical comparison of fusion strategies for multimodal language analysis[J].Information Fusion,2021,66:184-197.
[11]RAHMAN W,HASAN M K,LEE S,et al.Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.2020.
[12]HAN W,CHEN H,GELBUKH A,et al.Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]//Proceedings of the 2021 International Conference on Multimodal Interaction.2021:6-15.
[13]YU W,XU H,YUAN Z,et al.Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2021:10790-10797.
[14]YANG J,WANG Y,YI R,et al.MTAG:Modal-temporal attention graph for unaligned human multimodal language sequences[J].arXiv:2010.11985,2020.
[15]GAO Y,ZHANG Z,LIN H,et al.Hypergraph learning:Methods and practices[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2022,44(5):2548-2566.
[16]ZHOU D,HUANG J,SCHÖLKOPF B.Learning with hypergraphs:Clustering,classification,and embedding[C]//Proceedings of the 20th Annual Conference on Neural Information Processing Systems.2006.
[17]BAI S,ZHANG F,TORR P H S.Hypergraph convolution and hypergraph attention[J].Pattern Recognition,2021,110:107637.
[18]ZHANG R,ZOU Y,MA J.Hyper-SAGNN:a self-attention based graph neural network for hypergraphs[J].arXiv:1911.02613,2019.
[19]FENG Y,YOU H,ZHANG Z,et al.Hypergraph neural networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019:3558-3565.
[20]YADATI N,NIMISHAKAVI M,YADAV P,et al.HyperGCN:A new method of training graph convolutional networks on hypergraphs[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.2019.
[21]HUANG J,YANG J.UniGNN:a unified framework for graph and hypergraph neural networks[J].arXiv:2105.00956,2021.
[22]CHIEN E,PAN C,PENG J,et al.You are AllSet:A multiset function framework for hypergraph neural networks[J].arXiv:2106.13264,2021.
[23]ZHANG Z,LIN H,ZHAO X,et al.Inductive multi-hypergraph learning and its application on view-based 3D object classification[J].IEEE Transactions on Image Processing,2018,27(12):5957-5968.
[24]WANG M,LIU X,WU X.Visual classification by l1-hypergraph modeling[J].IEEE Transactions on Knowledge and Data Engineering,2015,27(9):2564-2574.
[25]GAO Y,WANG M,ZHA Z J,et al.Visual-textual joint rele-vance learning for tag-based social image search[J].IEEE Transactions on Image Processing,2012,22(1):363-376.
[26]HE J,HU H.MF-BERT:Multimodal fusion in pre-trained BERT for sentiment analysis[J].IEEE Signal Processing Letters,2021,29:454-458.
[27]SHI H,PU Y,ZHAO Z,et al.Co-space Representation Interaction Network for multimodal sentiment analysis[J].Knowledge-Based Systems,2024,283:111149.
[28]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of deep bidirectional transformers for language understanding[J].arXiv:1810.04805,2018.
[29]DEGOTTEX G,KANE J,DRUGMAN T,et al.COVAREP-A collaborative voice analysis repository for speech technologies[C]//2014 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP).IEEE,2014:960-964.
[30]BALTRUŠAITIS T,ROBINSON P,MORENCY L P.OpenFace:an open source facial behavior analysis toolkit[C]//2016 IEEE Winter Conference on Applications of Computer Vision(WACV).IEEE,2016:1-10.
[31]CHUNG J,GULCEHRE C,CHO K H,et al.Empirical evaluation of gated recurrent neural networks on sequence modeling[J].arXiv:1412.3555,2014.
[32]KRISHNA K,MURTY M N.Genetic K-means algorithm[J].IEEE Transactions on Systems,Man,and Cybernetics,Part B(Cybernetics),1999,29(3):433-439.
[33]MADDISON C J,MNIH A,TEH Y W.The concrete distribution:A continuous relaxation of discrete random variables[J].arXiv:1611.00712,2016.
[34]ZADEH A,ZELLERS R,PINCUS E,et al.MOSI:Multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J].arXiv:1606.06259,2016.
[35]ZADEH A A B,LIANG P P,PORIA S,et al.Multimodal language analysis in the wild:CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics(Volume 1:Long Papers).2018:2236-2246.
[36]ZADEH A,LIANG P P,PORIA S,et al.Multi-attention recurrent network for human communication comprehension[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2018.
[37]TSAI Y H H,BAI S,LIANG P P,et al.Multimodal transformer for unaligned multimodal language sequences[J].arXiv:1906.00295,2019.
[38]CHEN M,LI X.SWAFN:Sentimental words aware fusion network for multimodal sentiment analysis[C]//Proceedings of the 28th International Conference on Computational Linguistics.2020:1067-1077.
[39]WU J,MAI S,HU H.Graph capsule aggregation for unaligned multimodal sequences[C]//Proceedings of the 2021 International Conference on Multimodal Interaction.2021:521-529.
[40]MAI S,XING S,HE J,et al.Multimodal graph for unaligned multimodal sequence analysis via graph convolution and graph pooling[J].ACM Transactions on Multimedia Computing,Communications and Applications,2023,19(2):1-24.
[41]LI Y,WANG Y,CUI Z.Decoupled multimodal distilling for emotion recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:6631-6640.
[42]PHAM H,LIANG P P,MANZINI T,et al.Found in translation:Learning robust joint representations by cyclic translations between modalities[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2019:6892-6899.