Multimodal Information Extraction Method Based on DS Theory

doi:10.11896/jsjkx.240200081

Computer Science, 2025, Vol. 52, Issue (10): 208-216. doi: 10.11896/jsjkx.240200081

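WANG Jian¹, WANG Jingling², ZHANG Ge¹, WANG Zhangquan¹, GUO Shiyuan², YU Guiming¹

1 School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000
2 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450000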

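Publication date: 2025-10-15  Release date: 2025-10-14
Corresponding author: WANG Jian (iejwang@zzu.edu.cn)
Funding:
National Natural Science Foundation of China (61972133); Key Research and Development Program of Henan Province (241111212700); Open Project of the Key Laboratory of Information Network Security, Ministry of Public Security (C23600-04)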

Multimodal Information Extraction Fusion Method Based on Dempster-Shafer Theory

WANG Jian¹, WANG Jingling², ZHANG Ge¹, WANG Zhangquan¹, GUO Shiyuan², YU Guiming¹

1 School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
2 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450000, China

Online: 2025-10-15  Published: 2025-10-14
About author: WANG Jian, born in 1978, Ph.D., professor, is a member of CCF (No. 28300S). Her main research interests include social computing, cybersecurity, and artificial intelligence.
Supported by:
National Natural Science Foundation of China (61972133), Key Research and Development Program of Henan Province (241111212700) and Open Subjects of Key Laboratory of the Ministry of Public Security for Information Network Security (C23600-04).

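Abstract

Abstract: In previous multimodal information extraction (MIE) tasks, researchers have typically trained neural network models using data-level fusion. However, because different modalities are heterogeneous, this kind of fusion is prone to feature redundancy, feature incompatibility, and a lack of interpretability, which in turn degrade model training. To address this problem, a decision-level fusion method based on Dempster-Shafer (DS) theory is proposed. The method processes the features of each modality with neural networks and a Dirichlet function to generate evidence; after evidence correction and weight assignment, it applies the Shafer fusion rule to reach the final decision, effectively improving the accuracy of feature processing and the interpretability of the model. With precision, recall, and F1 score as evaluation metrics, experimental results on public and private datasets show that the proposed method improves overall performance by 0.22 to 1.87 percentage points over existing methods.

Keywords: information extraction, multimodal, Dempster-Shafer theory, deep learning, evidence correction, decision fusion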

Abstract: In past MIE tasks, researchers have usually trained neural network models with data-level fusion. However, due to the heterogeneity among modalities, this fusion approach can lead to feature redundancy, feature incompatibility, and a lack of interpretability, which in turn reduce the effectiveness of model training. To address these problems, this paper proposes a decision-level fusion method based on DS theory. Evidence is generated by processing the features of each modality through neural networks and Dirichlet functions; after evidence correction and weight assignment, the Shafer fusion rule is used to arrive at the final decision. The method effectively improves the accuracy of feature processing and the interpretability of the model. Using precision, recall, and F1 score as evaluation metrics, experiments on public and private datasets show an overall performance improvement of 0.22 to 1.87 percentage points over existing methods.

Key words: Key information extraction, Multimodality, Dempster-Shafer theory, Deep learning, Evidence correction, Decision fusion
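
To illustrate the kind of decision-level fusion the abstract describes, the following minimal Python sketch turns per-class evidence into a Dirichlet-based opinion and fuses the opinions of two modalities. It rests on several assumptions that the abstract does not state: the evidence is already non-negative (for example after a ReLU or softplus layer), the Dirichlet parameters are the evidence plus one, the "Shafer fusion rule" is taken to be Dempster's rule of combination restricted to singleton classes plus one overall uncertainty mass, and the evidence-correction and weight-assignment steps are omitted. The function names and the example numbers are hypothetical.

```python
from typing import List, Tuple

# An opinion is (belief mass per class, uncertainty mass); the masses sum to 1.
Opinion = Tuple[List[float], float]


def opinion_from_evidence(evidence: List[float]) -> Opinion:
    """Build an opinion from non-negative evidence (assumed form).

    Assumption: Dirichlet parameters are alpha_k = e_k + 1, belief is
    b_k = e_k / S and uncertainty is u = K / S, where S = sum(alpha).
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)  # Dirichlet strength S
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty


def dempster_combine(first: Opinion, second: Opinion) -> Opinion:
    """Combine two opinions with Dempster's rule of combination.

    Masses live only on singleton classes and on the whole frame
    (the uncertainty mass), so the rule reduces to a closed form.
    """
    (b1, u1), (b2, u2) = first, second
    if len(b1) != len(b2):
        raise ValueError("opinions must cover the same classes")
    # Conflict: mass assigned by the two sources to different classes.
    conflict = sum(
        b1[i] * b2[j]
        for i in range(len(b1))
        for j in range(len(b2))
        if i != j
    )
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    scale = 1.0 / (1.0 - conflict)
    belief = [scale * (p * q + p * u2 + q * u1) for p, q in zip(b1, b2)]
    uncertainty = scale * u1 * u2
    return belief, uncertainty


# Hypothetical evidence for three candidate labels from two modalities.
text_opinion = opinion_from_evidence([9.0, 1.0, 0.0])
image_opinion = opinion_from_evidence([4.0, 0.0, 1.0])
fused_belief, fused_uncertainty = dempster_combine(text_opinion, image_opinion)

print("fused belief:", fused_belief)
print("fused uncertainty:", fused_uncertainty)
print("decision:", fused_belief.index(max(fused_belief)))
```

In this toy case both modalities favour the first label, so the fused opinion gives that label more belief and carries less uncertainty than either input on its own. The uncertainty mass also indicates how much each modality contributed to the final decision, which is one way such a scheme can support interpretability.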

CLC number:

TP391

WANG Jian, WANG Jingling, ZHANG Ge, WANG Zhangquan, GUO Shiyuan, YU Guiming. Multimodal Information Extraction Fusion Method Based on Dempster-Shafer Theory[J]. Computer Science, 2025, 52(10): 208-216. https://doi.org/10.11896/jsjkx.240200081
