Computer Science, 2026, Vol. 53, Issue (6A): 250500052-10. doi: 10.11896/jsjkx.250500052

• Artificial Intelligence •

Construction and Application of Dataset Knowledge Graph Based on Metadata Semantic Enhancement

SHEN Jianwei1,2, CHEN Jiawen1,2, CHEN Hanlin1,2, MA Xinjian3,4, CHEN Xing1,2   

    1 College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China
    2 Fujian Key Laboratory of Network Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350116, China
    3 National Key Laboratory of Data Space Technology and System, Beijing 100195, China
    4 Advanced Institute of Big Data, Beijing.AIBD, Beijing 100195, China
  • Online: 2026-06-16 Published: 2026-06-12
  • About author: SHEN Jianwei, born in 2001, postgraduate. His main research interests include large language models and knowledge graphs.
    MA Xinjian, born in 1987, Ph.D. His main research interests include information security and distributed systems.
  • Supported by:
    National Natural Science Foundation of China (62072108), Special Funds for Promoting High-quality Development of Marine and Fishery Industries in Fujian Province (FJHYF-ZH-2023-02), Fujian Key Technological Innovation and Industrialization Projects (2024XQ004) and National Key Laboratory of Data Space Technology and System (QZQC2024007).

Abstract: The rapid expansion of data resources has made the effective organization, discovery, and utilization of datasets a central concern in data management. Conventional approaches that rely on metadata matching or statistical retrieval often fail to capture intricate semantic relationships, which reduces the accuracy and interpretability of dataset retrieval. To address this, this study proposes a method for constructing and searching dataset knowledge graphs based on metadata semantic enhancement, with the aim of improving the semantic retrieval of datasets. First, dataset metadata is standardized according to the W3C DCAT specification to build a foundational knowledge graph covering essential attributes such as titles, keywords, subject categories, and data items. Second, to compensate for the limited semantic descriptions in metadata, the Wikidata general knowledge graph is incorporated to enrich entity semantics through cross-domain semantic expansion. In the retrieval phase, a BERT-BiLSTM-CRF model extracts key entities from user queries and constructs semantic relationship subgraphs. Combined with entity vector representations generated by Wikipedia2vec, structured semantic retrieval ranking is performed using cosine similarity. Experiments on the government open data platforms of Fuzhou and Shenzhen show that the proposed method achieves Top-10 hit rates of 97.92% and 98.25%, respectively, representing improvements of 8.92% to 12.04% over traditional BM25 and 5.72% to 8.96% over Word2Vec-enhanced methods. The results indicate that semantic enhancement via Wikidata and structured graph matching significantly boost retrieval accuracy by explicitly modeling entity relationships and enriching metadata semantics. This study provides a feasible technical solution for enhancing dataset discovery in scenarios such as open data platforms and research data management, and demonstrates the effectiveness of integrating semantic enrichment with knowledge graph structures.
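
The ranking and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for Wikipedia2vec embeddings, averaging entity vectors into one vector per query or dataset is an assumption made here for brevity (the paper matches semantic relationship subgraphs), and the names `rank_datasets` and `hit_rate_at_k` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors (an assumed pooling step)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_datasets(query_entities, dataset_entities, entity_vecs):
    """Return dataset ids ordered by cosine similarity to the query, best first."""
    known = [entity_vecs[e] for e in query_entities if e in entity_vecs]
    if not known:
        return []
    query_vec = mean_vector(known)
    scored = []
    for dataset_id, entities in dataset_entities.items():
        vecs = [entity_vecs[e] for e in entities if e in entity_vecs]
        score = cosine(query_vec, mean_vector(vecs)) if vecs else 0.0
        scored.append((dataset_id, score))
    # Stable sort: datasets with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [dataset_id for dataset_id, _ in scored]


def hit_rate_at_k(rankings, relevant_ids, k):
    """Fraction of queries whose relevant dataset appears in the top k results."""
    hits = sum(1 for ranking, target in zip(rankings, relevant_ids)
               if target in ranking[:k])
    return hits / len(rankings)


# Toy data (hypothetical): entity vectors and the entities linked to each dataset.
entity_vecs = {
    "bus": [1.0, 0.0, 0.0],
    "metro": [0.9, 0.1, 0.0],
    "hospital": [0.0, 1.0, 0.0],
    "school": [0.0, 0.0, 1.0],
}
datasets = {
    "d_transit": ["bus", "metro"],
    "d_health": ["hospital"],
    "d_edu": ["school"],
}

ranking = rank_datasets(["bus"], datasets, entity_vecs)
print(ranking[0])
```

With these toy vectors, the query entity "bus" ranks the transit dataset first. A Top-10 hit rate such as the one reported would correspond to calling `hit_rate_at_k` with `k=10` over a set of test queries, each paired with its known relevant dataset.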

Key words: Metadata semantic enhancement, Dataset knowledge graph, Knowledge graph construction, Semantic retrieval, Open data platforms

CLC Number: TP391