Computer Science ›› 2023, Vol. 50 ›› Issue (10): 126-134. doi: 10.11896/jsjkx.230300079

• Artificial Intelligence •

Active Learning-based Text Entity and Relation Joint Extraction Method

DING Hongxin1,2, ZOU Peinie1,3, ZHAO Junfeng1,2, WANG Yasha1,2   

  1 School of Computer Science, Peking University, Beijing 100871, China
  2 Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China
  3 School of Software & Microelectronics, Peking University, Beijing 102600, China
  • Received: 2023-03-09  Revised: 2023-06-23  Online: 2023-10-10  Published: 2023-10-10
  • About author: DING Hongxin, born in 2000, postgraduate. Her main research interests include knowledge graphs and natural language processing. ZHAO Junfeng, born in 1974, Ph.D., research professor, is a member of the China Computer Federation. Her main research interests include big data analysis, knowledge graphs, and urban computing.
  • Supported by:
    National Natural Science Foundation of China (62172011) and the Fundamental Research Funds for the Central Universities of the Ministry of Education of China.

Abstract: Unstructured text contains a large amount of valuable knowledge. Entities and relations extracted from it form structured knowledge that can be used to build knowledge graphs and support downstream tasks, so entity and relation extraction has broad application prospects. Entity and relation extraction is currently dominated by deep learning methods. However, training deep learning models requires large annotated datasets, which incurs high labor costs, so reducing the manual annotation workload is a major research focus. Active learning is a subfield of machine learning that aims to maximize a model's performance gain while annotating as few samples as possible, by selecting the most valuable samples to be labeled and passed to the model for training. Its potential to reduce the amount of training data complements the data-hungry nature of deep learning. Deep active learning, which applies active learning within deep learning, has therefore become a hot research topic in entity and relation extraction. In this context, this paper applies deep active learning to joint entity and relation extraction, integrating active learning into the training process of the deep learning model to minimize the manually labeled data required for training while maintaining model performance. A deep learning model for joint entity and relation extraction based on a unified label space and matrix annotation is implemented, and on top of it a variety of active learning query strategies are designed and implemented. The method is validated on text datasets and commonly used joint entity and relation extraction datasets in the medical field. Several methods are proposed for choosing when to stop model training, based on the model's training loss curve, the model's performance on the training set, and prediction stability on reserved data. How to choose the stopping time in practical application scenarios is studied experimentally. An intelligent text annotation tool based on active learning for joint entity and relation extraction is designed and implemented, which allows users to annotate entities and relations in text. The tool integrates a deep learning model for entity and relation extraction with active learning methods to minimize users' annotation workload.
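
The two ideas named in the abstract, a query strategy that picks the most valuable samples and a stopping rule based on prediction stability on reserved data, can be illustrated with a minimal sketch. The abstract does not specify the paper's actual strategies, so everything below is an assumption made for illustration: entropy is used as the uncertainty measure, each sample is assumed to be a list of per-cell label distributions (as a matrix-annotation model might produce), and the names `select_queries`, `prediction_stability`, `should_stop`, and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical types: one label distribution per table cell, several cells per sample.
CellProbs = Sequence[float]
SampleProbs = Sequence[CellProbs]


def entropy(probs: CellProbs) -> float:
    """Shannon entropy (in nats) of one predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sample_uncertainty(cells: SampleProbs) -> float:
    """Mean cell entropy of a sample; 0.0 for a sample with no cells (assumed aggregation)."""
    if not cells:
        return 0.0
    return sum(entropy(c) for c in cells) / len(cells)


def select_queries(pool: Sequence[SampleProbs], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` most uncertain unlabeled samples."""
    scores = [sample_uncertainty(s) for s in pool]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


def prediction_stability(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Fraction of reserved items whose predicted label is unchanged between two rounds."""
    if len(prev) != len(curr):
        raise ValueError("both rounds must predict on the same reserved items")
    if not prev:
        return 1.0
    return sum(a == b for a, b in zip(prev, curr)) / len(prev)


def should_stop(prev: Sequence[int], curr: Sequence[int], threshold: float = 0.99) -> bool:
    """Stop once predictions on reserved data have stabilized (threshold is an assumption)."""
    return prediction_stability(prev, curr) >= threshold


# Toy pool of three unlabeled samples, two cells each, three labels per cell.
pool = [
    [[0.98, 0.01, 0.01], [0.90, 0.05, 0.05]],  # confident
    [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]],  # very uncertain
    [[0.70, 0.20, 0.10], [0.98, 0.01, 0.01]],  # in between
]
print(select_queries(pool, batch_size=2))                  # [1, 2]
print(prediction_stability([0, 1, 1, 2], [0, 1, 2, 2]))    # 0.75
```

In each active learning round, the selected indices would be sent to the annotator, the model retrained on the enlarged labeled set, and the stability check run on reserved data to decide whether another round is worthwhile.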

Key words: Active learning, Knowledge extraction, Named entity recognition, Relation extraction, Human-machine interaction

CLC Number: TP311