Computer Science, 2025, Vol. 52, Issue (11A): 241100071-8. doi: 10.11896/jsjkx.241100071
李代祎, 孔德龙, 吴怀广, 张佳慧, 韩宇璨
LI Daiyi, KONG Delong, WU Huaiguang, ZHANG Jiahui, HAN Yucan
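Abstract: Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) are two key techniques in multimodal knowledge graph construction. However, existing MNER and MRE methods still have limitations when extracting and fusing features from high-dimensional data. To address these problems, a joint multimodal entity and relation extraction method based on a quantum Transformer is proposed. First, a parameterized quantum circuit is designed for processing text data; the circuit exploits the superposition and entanglement properties of quantum mechanics and is combined with a Transformer model to extract deep text features. Second, a pyramid visual feature extraction model is designed to obtain pyramid-shaped hierarchical features ranging from high level to low level, so that the multi-scale information of the image is fully taken into account. Finally, a hierarchical visual prefix network is designed to align and fuse the hierarchical multi-scale image features with the text features, yielding a highly robust text representation. This study offers a new research direction for multimodal entity and relation extraction, and experimental results on three public benchmark datasets show that the proposed quantum-Transformer-based multimodal entity and relation extraction method is effective and stable.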
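The abstract does not give the gate layout of the parameterized quantum circuit, so the following is only a minimal sketch of the general idea, not the authors' circuit. It assumes angle encoding of input features with RY rotations, a chain of CNOT gates for entanglement, and one trainable RY layer, simulated classically in plain Python. All function names (`ry`, `apply_single`, `apply_cnot`, `expect_z`, `pqc_features`) are hypothetical and chosen for illustration.

```python
import math

def ry(theta):
    """2x2 matrix of a single-qubit RY(theta) rotation (real-valued)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply_single(state, gate, qubit, n_qubits):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    new_state = [0.0] * len(state)
    shift = n_qubits - 1 - qubit  # qubit 0 is the most significant bit
    for idx, amp in enumerate(state):
        bit = (idx >> shift) & 1
        for out_bit in (0, 1):
            target = (idx & ~(1 << shift)) | (out_bit << shift)
            new_state[target] += gate[out_bit][bit] * amp
    return new_state

def apply_cnot(state, control, target, n_qubits):
    """Flip the target qubit wherever the control qubit is 1."""
    new_state = [0.0] * len(state)
    c_shift = n_qubits - 1 - control
    t_shift = n_qubits - 1 - target
    for idx, amp in enumerate(state):
        if (idx >> c_shift) & 1:
            new_state[idx ^ (1 << t_shift)] += amp
        else:
            new_state[idx] += amp
    return new_state

def expect_z(state, qubit, n_qubits):
    """Expectation value of Pauli-Z on one qubit (amplitudes are real here)."""
    shift = n_qubits - 1 - qubit
    return sum((1 - 2 * ((idx >> shift) & 1)) * amp * amp
               for idx, amp in enumerate(state))

def pqc_features(inputs, weights):
    """Assumed layout: RY angle encoding, CNOT chain, trainable RY layer."""
    n = len(inputs)
    state = [0.0] * (2 ** n)
    state[0] = 1.0  # start in |00...0>
    for q in range(n):
        state = apply_single(state, ry(inputs[q]), q, n)   # encode inputs (superposition)
    for q in range(n - 1):
        state = apply_cnot(state, q, q + 1, n)             # entangle neighbouring qubits
    for q in range(n):
        state = apply_single(state, ry(weights[q]), q, n)  # trainable rotations
    return [expect_z(state, q, n) for q in range(n)]
```

In a hybrid model of the kind the abstract describes, the per-qubit expectation values returned by a circuit like this could serve as features passed to a classical Transformer layer; how the paper actually couples the two is not stated in the abstract.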