Review of Document-level Relation Extraction Techniques

doi:10.11896/jsjkx.220400252

Abstract

Abstract: Relation extraction (RE) is an important direction in information extraction research and has gradually expanded from the sentence level to the document level. Compared with sentences, documents usually contain more relational facts and can therefore provide richer information for knowledge base construction, information retrieval, and semantic analysis. However, document-level relation extraction is more complex and more difficult, and a systematic and comprehensive survey of the field is still lacking. To promote further research on document-level relation extraction, this paper gives a comprehensive, in-depth analysis of existing techniques and methods. From the perspective of data preprocessing methods and core algorithms, it divides existing work into three broad categories: tree-based, sequence-based, and graph-based. For each category, it analyzes and describes some typical methods, recent progress, and remaining shortcomings. It also introduces some commonly used datasets and performance evaluation metrics, and lists the reported performance of some typical methods. Finally, it analyzes and summarizes the open problems in current document-level relation extraction research and points out possible future trends and research directions that merit further attention.

Key words: Information extraction, Document-level relation extraction, Data preprocessing, Datasets, Performance evaluation

CLC Number: TP391

ZHU Taojie, LU Jicang, ZHOU Gang, DING Xiaoyao, WANG Ling, ZHU Xiubao. Review of Document-level Relation Extraction Techniques[J]. Computer Science, 2023, 50(5): 189-200. https://doi.org/10.11896/jsjkx.220400252

