Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241100071-8. doi: 10.11896/jsjkx.241100071

• Artificial Intelligence •

Multimodal Entity-Relation Joint Extraction Method Based on Quantum Transformer

LI Daiyi, KONG Delong, WU Huaiguang, ZHANG Jiahui, HAN Yucan   

  1. College of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450000, China
  • Online: 2025-11-15  Published: 2025-11-10
  • Supported by:
    National Natural Science Foundation of China (61672470), Major Science and Technology Research Projects in Henan Province (221100210400), Major Public Welfare Projects in Henan Province, China (201300210200), and Doctoral Research Fund of Zhengzhou University of Light Industry (2024BSJJ014).

Abstract: Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE) are two key technologies in the construction of multimodal knowledge graphs. However, existing MNER and MRE methods still have limitations in extracting and fusing features from high-dimensional data. To address these issues, this paper proposes a multimodal entity-relation joint extraction method based on a quantum Transformer. First, a parameterized quantum circuit for text processing is designed; it exploits the superposition and entanglement properties of quantum mechanics and is combined with the Transformer model to extract deep textual features. Second, a pyramid visual feature extraction model is designed to obtain hierarchical features from high to low resolution, fully exploiting the multi-scale information of the image. Finally, a hierarchical visual prefix network aligns and fuses the hierarchical multi-scale image features with the text features, yielding a highly robust text representation. Experimental results on three public benchmark datasets show that the proposed quantum Transformer-based multimodal entity-relation extraction method is effective and stable, providing a new research approach for multimodal entity-relation joint extraction.
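The abstract only names the building blocks, so the following minimal Python sketch (PyTorch + PennyLane) illustrates one way a parameterized quantum circuit could sit in front of a Transformer encoder for text features. It is not the authors' implementation; the qubit count, circuit depth, ansatz (AngleEmbedding plus StronglyEntanglingLayers), and all module names are illustrative assumptions.

# Hypothetical sketch (not the paper's code): a parameterized quantum circuit
# used as a token-wise feature layer in front of a standard Transformer encoder.
import pennylane as qml
import torch
from torch import nn

N_QUBITS, Q_DEPTH = 4, 2          # illustrative choices, not taken from the paper
dev = qml.device("default.qubit", wires=N_QUBITS)

@qml.qnode(dev, interface="torch")
def quantum_circuit(inputs, weights):
    # Encode classical features as rotation angles, entangle the qubits with a
    # layered variational ansatz, then read out Pauli-Z expectation values.
    qml.AngleEmbedding(inputs, wires=range(N_QUBITS))
    qml.StronglyEntanglingLayers(weights, wires=range(N_QUBITS))
    return [qml.expval(qml.PauliZ(w)) for w in range(N_QUBITS)]

class QuantumTextEncoder(nn.Module):
    """Project token embeddings to qubit angles, run the circuit per token,
    project back, and feed the result to a vanilla Transformer encoder."""
    def __init__(self, d_model=768, n_heads=8, n_layers=2):
        super().__init__()
        self.down = nn.Linear(d_model, N_QUBITS)
        self.qlayer = qml.qnn.TorchLayer(
            quantum_circuit, {"weights": (Q_DEPTH, N_QUBITS, 3)})
        self.up = nn.Linear(N_QUBITS, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)

    def forward(self, token_embeds):                 # (batch, seq_len, d_model)
        b, s, _ = token_embeds.shape
        angles = torch.tanh(self.down(token_embeds)).reshape(b * s, N_QUBITS)
        q_feats = self.qlayer(angles).reshape(b, s, N_QUBITS).float()
        fused = token_embeds + self.up(q_feats)      # residual quantum correction
        return self.encoder(fused)

# Toy usage: 2 sentences, 16 tokens, 768-d embeddings (e.g. from a BERT encoder).
out = QuantumTextEncoder()(torch.randn(2, 16, 768))
print(out.shape)                                     # torch.Size([2, 16, 768])

The residual projection keeps the quantum layer as a lightweight correction on top of the classical token embeddings, so the sketch remains trainable end-to-end with ordinary back-propagation; how the paper actually couples the circuit to the Transformer and to the hierarchical visual prefix is described only in the full text.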

Key words: MNER, MRE, pyramid visual feature, Transformer, feature fusion

CLC Number: TP391