Multimodal Information Extraction Fusion Method Based on Dempster-Shafer Theory

doi:10.11896/jsjkx.240200081

Abstract

In previous multimodal information extraction (MIE) work, researchers have usually trained neural network models with data-level fusion. However, because the modalities are heterogeneous, this fusion approach can lead to feature redundancy, incompatibility, and a lack of interpretability, which in turn degrade model training. To address these problems, this paper proposes a decision-level fusion method based on Dempster-Shafer (DS) theory. Evidence is generated by processing the features of each modality through neural networks and Dirichlet functions; after evidence correction and weight assignment, the Dempster-Shafer fusion rule is applied to reach the final decision. The method improves both the accuracy of feature processing and the interpretability of the model. With accuracy, recall, and F1 score as evaluation metrics, experiments on public and private datasets show an overall improvement of 0.22 to 1.87 percentage points over existing methods.
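The decision-level fusion step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the common evidential formulation in which non-negative per-class evidence e_k defines Dirichlet parameters alpha_k = e_k + 1, belief masses b_k = e_k / S, and an uncertainty mass u = K / S with S = sum(alpha_k), and it fuses two modalities with the reduced form of Dempster's combination rule. The function names and the example evidence values are hypothetical, and the paper's evidence correction and weight assignment steps are omitted because the abstract does not specify them.

```python
from typing import List, Tuple

# An opinion is (belief mass per class, uncertainty mass); the masses sum to 1.
Opinion = Tuple[List[float], float]


def evidence_to_opinion(evidence: List[float]) -> Opinion:
    """Map non-negative per-class evidence to belief and uncertainty masses.

    Assumes alpha_k = e_k + 1 and Dirichlet strength S = sum(alpha_k).
    """
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def combine(op1: Opinion, op2: Opinion) -> Opinion:
    """Fuse two opinions with the reduced Dempster combination rule."""
    b1, u1 = op1
    b2, u2 = op2
    n = len(b1)
    # Conflict: mass the two sources assign to different classes.
    conflict = sum(b1[i] * b2[j] for i in range(n) for j in range(n) if i != j)
    scale = 1.0 - conflict
    if scale <= 0.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    beliefs = [(b1[i] * b2[i] + b1[i] * u2 + b2[i] * u1) / scale for i in range(n)]
    uncertainty = (u1 * u2) / scale
    return beliefs, uncertainty


# Hypothetical evidence from a text branch and an image branch over 3 classes.
text_opinion = evidence_to_opinion([8.0, 1.0, 1.0])
image_opinion = evidence_to_opinion([3.0, 0.0, 0.0])
fused_beliefs, fused_uncertainty = combine(text_opinion, image_opinion)
predicted_class = max(range(len(fused_beliefs)), key=fused_beliefs.__getitem__)
```

Because the fused uncertainty is the product of the two input uncertainties divided by the normalization term, two sources that agree yield a fused opinion that is more certain than either source alone, which is one reason decision-level fusion of this kind is considered more interpretable than concatenating raw features.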

Key words: Key information extraction, Multimodality, Dempster-Shafer theory, Deep learning, Evidence correction, Decision fusion

CLC Number: TP391

WANG Jian, WANG Jingling, ZHANG Ge, WANG Zhangquan, GUO Shiyuan, YU Guiming. Multimodal Information Extraction Fusion Method Based on Dempster-Shafer Theory [J]. Computer Science, 2025, 52(10): 208-216.

