Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 250600198-10. doi: 10.11896/jsjkx.250600198

• Artificial Intelligence •

Review of Application of Information Extraction Technology in Digital Humanities

WEI Hao1,2,3, ZHANG Zongyu1, DIAO Hongyue1,2,3, DENG Yaochen2,3   

  1 School of Software, Dalian University of Foreign Languages, Dalian, Liaoning 116044, China
  2 China Research Center for Northeast Asian Languages, Dalian University of Foreign Languages, Dalian, Liaoning 116044, China
  3 Liaoning New Lab for Innovations in Digital Humanities, Dalian University of Foreign Languages, Dalian, Liaoning 116044, China
  • Online: 2025-11-15 Published: 2025-11-10
  • Supported by:
    General Projects of the 14th Five-Year Plan of the Language Commission, China (YB145-82), Liaoning Provincial Department of Education project, China (LJKQZ20222451, JYTQN2023149) and Natural Science Foundation of Liaoning Province, China (2024-BS-203).

Abstract: Digital humanities, as an emerging interdisciplinary field integrating computer science and humanities, aims to address research challenges in humanities through digital technologies, thereby advancing disciplinary development, cultural heritage preservation, and cultural dissemination. Information extraction, a core task in natural language processing, enables the automatic extraction of structured knowledge from unstructured texts, providing valuable data support for digital humanities research. This review systematically examines the applications of information extraction technologies in digital humanities, focusing on three key subtasks: named entity recognition, relation extraction, and event extraction. The study traces the evolution of these tasks from early rule-based and dictionary methods to traditional machine learning approaches, and further to current mainstream techniques based on deep learning and pre-trained language models, analyzing the trajectory of technological advancements. Furthermore, the review discusses the unique challenges of information extraction in digital humanities, including data scarcity, complex text structures, ambiguous entity boundaries, and implicit relationship expressions, while critically evaluating the applicability and limitations of existing methods. Finally, future research directions are outlined, such as multimodal information extraction, cross-lingual processing, optimization for low-resource scenarios, knowledge graph construction, and language generation technologies. The review offers theoretical insights and practical guidance for further research and applications of information extraction in digital humanities.

Key words: Digital humanities, Natural language processing, Information extraction, Named entity recognition, Relation extraction, Event extraction, Deep learning

CLC Number: 

  • TP391