Computer Science, 2025, Vol. 52, Issue 8: 277-287. doi: 10.11896/jsjkx.240600050

• Artificial Intelligence •

Application of Decoupled Knowledge Distillation Method in Document-level Relation Extraction

LIU Le, XIAO Rong, YANG Xiao   

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
  • Received: 2024-06-05  Revised: 2024-11-16  Online: 2025-08-15  Published: 2025-08-08
  • Corresponding author: XIAO Rong (20040363@hubu.edu.cn)
  • About author: LIU Le, born in 2003, bachelor (202131116020073@stu.hubu.edu.cn). His main research interests include natural language processing and information extraction.
    XIAO Rong, born in 1980, Ph.D., lecturer. Her main research interests include natural language processing and information extraction.
  • Supported by:
    Hubei Provincial Natural Science Foundation (E1KF291005) and Yunnan Provincial Natural Science Foundation (2022KZ00125).

Abstract: Document-level relation extraction (DocRE) is an important research direction in the field of natural language processing, aiming to extract semantic relationships between entities from unstructured or semi-structured natural language documents. This paper proposes a solution that combines decoupled knowledge distillation with a cross multi-head attention mechanism to address the DocRE task. First, the cross multi-head attention mechanism not only attends in parallel to elements in different attention heads, enabling the model to exchange and integrate information at different granularities and levels, but also allows the model to consider the correlation between head and tail entities and their relations when computing attention, thereby enhancing the model's understanding of complex relationships and improving the learning of entity feature representations. In addition, to further optimize model performance, this paper introduces a decoupled knowledge distillation method to adapt to distantly supervised data. This method decouples the original KL divergence loss into a target-class knowledge distillation loss (TCKDL) and a non-target-class knowledge distillation loss (NCKDL), whose relative importance can be adjusted through hyperparameters, increasing the flexibility and effectiveness of the knowledge distillation process. In particular, it enables more precise knowledge transfer and learning when dealing with the noise in DocRED's distantly supervised data. Experimental results show that the proposed model extracts relationships between entity pairs on the DocRED dataset more effectively.
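
The decoupled loss described above can be made concrete with a short sketch. The following PyTorch snippet is a minimal single-label illustration of the general decoupling idea, not the authors' implementation: the function name dkd_loss, the weights alpha and beta, and the temperature T are illustrative assumptions, and the adaptations needed for multi-label DocRED relation labels and noisy distantly supervised data are omitted.

import torch
import torch.nn.functional as F

def dkd_loss(student_logits, teacher_logits, labels, alpha=1.0, beta=1.0, T=1.0):
    """Decoupled KD loss: alpha * TCKDL + beta * NCKDL (single-label sketch)."""
    num_classes = student_logits.size(-1)
    # Boolean mask marking the target (gold) class of each example.
    gt_mask = F.one_hot(labels, num_classes=num_classes).bool()

    p_s = F.softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)

    # TCKDL: KL divergence over the binary split {target class, all other classes}.
    ps_t = (p_s * gt_mask).sum(-1, keepdim=True)
    pt_t = (p_t * gt_mask).sum(-1, keepdim=True)
    ps_bin = torch.cat([ps_t, 1.0 - ps_t], dim=-1).clamp_min(1e-8)
    pt_bin = torch.cat([pt_t, 1.0 - pt_t], dim=-1)
    tckdl = F.kl_div(ps_bin.log(), pt_bin, reduction="batchmean") * (T ** 2)

    # NCKDL: KL divergence over the non-target classes only, re-normalised
    # after masking the target class out of both logit vectors.
    log_ps_nt = F.log_softmax(student_logits.masked_fill(gt_mask, -1e9) / T, dim=-1)
    pt_nt = F.softmax(teacher_logits.masked_fill(gt_mask, -1e9) / T, dim=-1)
    nckdl = F.kl_div(log_ps_nt, pt_nt, reduction="batchmean") * (T ** 2)

    # alpha and beta let the two terms be re-weighted independently.
    return alpha * tckdl + beta * nckdl

Weighting the two terms separately is what allows, for example, the non-target-class signal distilled from noisy distantly supervised DocRED data to be down-weighted relative to the target-class term.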

Key words: Natural language processing, Document-level relation extraction, DocRED, Cross multi-head attention, Decoupled knowledge distillation, Distantly supervised data, Kullback-Leibler divergence

CLC number: TP391