Computer Science ›› 2025, Vol. 52 ›› Issue (8): 277-287. doi: 10.11896/jsjkx.240600050

• Artificial Intelligence •

Application of Decoupled Knowledge Distillation Method in Document-level Relation Extraction

LIU Le, XIAO Rong, YANG Xiao   

  School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
  • Received: 2024-06-05 Revised: 2024-11-16 Online: 2025-08-15 Published: 2025-08-08
  • About author: LIU Le, born in 2003, bachelor. His main research interests include natural language processing and information extraction.
    XIAO Rong, born in 1980, Ph.D, lecturer. Her main research interests include natural language processing and information extraction.
  • Supported by:
    Hubei Provincial Natural Science Foundation (E1KF291005) and Yunnan Provincial Natural Science Foundation (2022KZ00125).

Abstract: Document-level relation extraction (DocRE) is an important research direction in natural language processing that aims to extract semantic relations between entities from unstructured or semi-structured documents. This paper proposes a solution to the DocRE task that combines decoupled knowledge distillation with a cross multi-head attention mechanism. First, the cross multi-head attention mechanism not only attends simultaneously to elements in different attention heads, enabling the model to exchange and integrate information at different granularities and levels, but also allows the model to consider the correlation between head entities, tail entities, and their relations when computing attention, thereby deepening its understanding of complex relationships and improving the learned entity feature representations. Second, to further optimize performance, a decoupled knowledge distillation method is introduced to accommodate distantly supervised data. This method decouples the original Kullback-Leibler divergence loss into a target-class knowledge distillation loss (TCKDL) and a non-target-class knowledge distillation loss (NCKDL), whose relative weights can be tuned through hyperparameters, increasing the flexibility and effectiveness of the distillation process. In particular, it enables more precise knowledge transfer when dealing with the noise in DocRED's distantly supervised annotations. Experimental results show that the proposed model extracts relations between entity pairs on the DocRED dataset more effectively.
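To make the cross multi-head attention idea concrete, the following is a minimal PyTorch sketch in which head-entity representations attend over tail-entity representations through a shared multi-head attention layer. The class name, the dimensions, and the residual-plus-normalization wiring are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class CrossMultiHeadAttention(nn.Module):
    # Illustrative sketch: head-entity embeddings (queries) attend over
    # tail-entity embeddings (keys/values). All dimensions are assumptions.
    def __init__(self, hidden_dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, head_repr: torch.Tensor, tail_repr: torch.Tensor) -> torch.Tensor:
        # head_repr, tail_repr: (batch, n_pairs, hidden)
        out, _ = self.attn(query=head_repr, key=tail_repr, value=tail_repr)
        return self.norm(head_repr + out)  # residual connection + layer norm

A symmetric pass with the roles of head and tail entities swapped, or attention over the full document context, could be wired in the same way; the fused output would then feed the relation classifier.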
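The decoupled knowledge distillation loss follows the formulation of Zhao et al., which splits the softened KL divergence into a target-class term (TCKD) and a non-target-class term (NCKD) weighted by hyperparameters alpha and beta. Below is a hedged PyTorch sketch of the standard single-label formulation; the default hyperparameter values are assumptions, and the paper's adaptation to the multi-label, distantly supervised DocRE setting is not reproduced here.

import torch
import torch.nn.functional as F

def dkd_loss(student_logits, teacher_logits, target,
             alpha: float = 1.0, beta: float = 8.0, temperature: float = 4.0):
    # One-hot mask of the target class for each example.
    gt_mask = torch.zeros_like(student_logits).scatter_(1, target.unsqueeze(1), 1.0)
    other_mask = 1.0 - gt_mask

    p_s = F.softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)

    # TCKD: KL divergence between binary (target vs. non-target) distributions.
    b_s = torch.stack([(p_s * gt_mask).sum(1), (p_s * other_mask).sum(1)], dim=1)
    b_t = torch.stack([(p_t * gt_mask).sum(1), (p_t * other_mask).sum(1)], dim=1)
    tckd = F.kl_div(b_s.log(), b_t, reduction="batchmean") * temperature ** 2

    # NCKD: KL divergence over non-target classes only; the target logit is
    # suppressed before the softmax so it receives near-zero probability.
    log_q_s = F.log_softmax(student_logits / temperature - 1000.0 * gt_mask, dim=1)
    q_t = F.softmax(teacher_logits / temperature - 1000.0 * gt_mask, dim=1)
    nckd = F.kl_div(log_q_s, q_t, reduction="batchmean") * temperature ** 2

    return alpha * tckd + beta * nckd

With illustrative shapes (4 examples, 97 relation classes including the no-relation class), dkd_loss(torch.randn(4, 97), torch.randn(4, 97), torch.randint(0, 97, (4,))) returns a scalar loss. Setting beta larger than alpha emphasizes the non-target knowledge, which Zhao et al. identify as the suppressed but valuable component of classic knowledge distillation.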

Key words: Natural language processing, Document-level relation extraction, DocRED, Cross multi-head attention, Decoupled knowledge distillation, Distantly supervised data, Kullback-Leibler divergence

CLC Number: TP391