Computer Science ›› 2023, Vol. 50 ›› Issue (6): 142-150. doi: 10.11896/jsjkx.230300071

• Database & Big Data & Data Science •

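Knowledge Graph Instance Construction and Evolution Method Based on Intelligent Mapping Recommendation

ZHANG Yaqing1,2, SHAN Zhongyuan1,2, ZHAO Junfeng1,2,3, WANG Yasha1,2,3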

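  • 1 School of Computer Science, Peking University, Beijing 100871
    2 Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871
    3 Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450
  • Received: 2023-03-08 Revised: 2023-04-13 Publication date: 2023-06-15 Release date: 2023-06-06
  • Corresponding author: WANG Yasha (wangyasha@pku.edu.cn)
  • About author: (yaqing_zhang@stu.pku.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (62172011); Fundamental Research Funds for the Central Universities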

Intelligent Mapping Recommendation-based Knowledge Graph Instance Construction and Evolution Method

ZHANG Yaqing1,2, SHAN Zhongyuan1,2, ZHAO Junfeng1,2,3, WANG Yasha1,2,3   

  • 1 School of Computer Science, Peking University, Beijing 100871, China
    2 Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China
    3 Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450, China
  • Received: 2023-03-08 Revised: 2023-04-13 Online: 2023-06-15 Published: 2023-06-06
  • About author: ZHANG Yaqing, born in 1999, postgraduate, is a member of China Computer Federation. Her main research interest is knowledge graphs. WANG Yasha, born in 1975, Ph.D, professor. His main research interests include big data analysis, artificial intelligence, and urban computing.
  • Supported by:
    National Natural Science Foundation of China(62172011) and Fundamental Research Funds for the Central Universities.

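Abstract (translated from the Chinese): With the continued development of big data technology, massive amounts of heterogeneous data have been generated across domains, and building knowledge graphs is an important means of achieving semantic interoperability among heterogeneous data. Generating an instance model by mapping structured data to an ontology model is a common way to construct the instance layer of a graph. However, for complex and heterogeneous domain data, most existing mapping-based instance construction methods require users to complete all mapping matches manually: the mapping operations are cumbersome, intelligent matching is not possible, and the process is time-consuming, laborious, and error-prone. In addition, existing methods provide insufficient support for incremental updates after instances are imported. To address the cumbersome mapping operations of existing schema matching and instance construction methods, an instance construction and evolution method based on intelligent mapping recommendation is proposed. In this method, the intelligent mapping reuse recommendation mechanism performs data schema matching before the user maps manually: it combines element-level similarity, table-level similarity, and inter-table propagation similarity in a multilevel similarity computation, and generates recommended mappings after arbitrating and ranking candidates by schema matching degree. The incremental discovery mechanism automatically discovers redundant and conflicting instances and generates system background tasks to process them, enabling efficient, duplicate-free import of instances. Experiments were conducted on the Shandong government open dataset and the Shenzhen medical emergency dataset. With the help of the mapping reuse recommendation module, interaction time was reduced to about 26% of that of the traditional mode, and the field recommendation matching accuracy reached 98.1%. In the incremental discovery experiment, the time required to import 13.94 million instance nodes and 21.58 million relationship edges was reduced from 31.21 h to 2.23 h. These results verify the usability and matching accuracy of intelligent mapping reuse recommendation and show improved efficiency in instance-layer construction and evolution.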
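The multilevel similarity computation named in the abstract (element-level, table-level, and inter-table propagation similarity, followed by ranking) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the string-based element similarity, the averaging used for table similarity, the `WEIGHTS` values, the externally supplied propagation scores, and all function names.

```python
from difflib import SequenceMatcher

# Hypothetical weights; the abstract does not state how the levels are combined.
WEIGHTS = {"element": 0.5, "table": 0.3, "propagation": 0.2}


def element_similarity(field_name: str, prop_name: str) -> float:
    """Name-based similarity between a source field and an ontology property."""
    return SequenceMatcher(None, field_name.lower(), prop_name.lower()).ratio()


def table_similarity(fields, props) -> float:
    """Mean of each field's best element-level match against a class's properties."""
    if not fields or not props:
        return 0.0
    best = [max(element_similarity(f, p) for p in props) for f in fields]
    return sum(best) / len(best)


def combined_score(element: float, table: float, propagation: float) -> float:
    """Weighted sum of the three similarity levels."""
    return (WEIGHTS["element"] * element
            + WEIGHTS["table"] * table
            + WEIGHTS["propagation"] * propagation)


def recommend(field_name, table_fields, ontology, propagation=None, top_k=3):
    """Rank (score, class, property) candidates for one source field.

    ontology: dict mapping class name -> list of property names.
    propagation: optional dict mapping class name -> score carried over from
    already-matched related tables (assumed to be computed elsewhere).
    """
    propagation = propagation or {}
    candidates = []
    for cls, props in ontology.items():
        t_score = table_similarity(table_fields, props)
        p_score = propagation.get(cls, 0.0)
        for prop in props:
            e_score = element_similarity(field_name, prop)
            candidates.append((combined_score(e_score, t_score, p_score), cls, prop))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]


ontology = {
    "Hospital": ["name", "address", "phone"],
    "Ambulance": ["plate_number", "station"],
}
fields = ["name", "address", "telephone"]
top = recommend("telephone", fields, ontology)
print(top[0][1], top[0][2])
```

In this toy setup the field `telephone` is ranked highest against the `phone` property of the `Hospital` class, because both its name similarity and the similarity of its whole table favor that class. A user would then confirm or correct the recommended mapping.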

Keywords (translated from the Chinese): knowledge graph, schema matching, mapping reuse, instance construction, graph evolution

Abstract: With the development of big data technology, a large amount of heterogeneous data has been generated in various fields. Constructing knowledge graphs is an important means of achieving semantic interoperability among heterogeneous data. A common way to build the instance layer is to generate an instance model by mapping structured data to an ontology model. However, most existing construction methods require users to complete all mapping matches manually; the mapping operation is time-consuming and error-prone, and intelligent matching is not available. In addition, existing methods provide insufficient support for incremental updates after instances are imported. This paper analyzes existing instance construction methods and proposes an instance construction and evolution method based on intelligent mapping recommendation to solve the problem of cumbersome manual mapping. Before the user maps manually, the mapping reuse recommendation mechanism performs a multilevel similarity calculation, covering element-level similarity, table-level similarity, and inter-table propagation similarity, and generates recommended mappings from the ranked matching results. In addition, the incremental discovery mechanism automatically discovers redundant and conflicting instances and generates system background tasks to process them, enabling efficient, duplicate-free import of instances. Experiments are carried out on the Shandong government open dataset and the Shenzhen medical emergency dataset. With the help of the mapping reuse recommendation module, the interaction time is reduced to about 26% of that of the traditional mode, and the matching accuracy of field recommendation reaches 98.1%. In the experiment on the incremental discovery mechanism, the time required to import 13.94 million instance nodes and 21.58 million relationship edges is reduced from 31.21 h to 2.23 h. These results demonstrate the usability and matching accuracy of intelligent mapping reuse recommendation and show that the method improves the efficiency of instance-layer construction and evolution.
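As a rough illustration of the incremental discovery idea, the sketch below sorts an incoming batch into new, redundant, and conflicting instances before import. The definitions used here are assumptions made for the example, not the paper's: an instance is a plain dictionary, it is identified by a hypothetical `id` key, "redundant" means an identical instance already exists, and "conflicting" means the key exists with different attribute values.

```python
from dataclasses import dataclass, field


@dataclass
class ImportPlan:
    """Outcome of comparing an incoming batch with existing instances."""
    new: list = field(default_factory=list)
    redundant: list = field(default_factory=list)
    conflicting: list = field(default_factory=list)


def discover_increment(existing, incoming, key="id"):
    """Classify incoming instances; `key` is a hypothetical identifying attribute."""
    plan = ImportPlan()
    index = {inst[key]: inst for inst in existing}
    for inst in incoming:
        current = index.get(inst[key])
        if current is None:
            plan.new.append(inst)
            # Register it so later duplicates within the same batch are caught.
            index[inst[key]] = inst
        elif current == inst:
            plan.redundant.append(inst)
        else:
            # Keep both versions so a background task can resolve the conflict.
            plan.conflicting.append((current, inst))
    return plan


existing = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
incoming = [
    {"id": 1, "name": "A"},   # already present, identical
    {"id": 2, "name": "B2"},  # same key, different attributes
    {"id": 3, "name": "C"},   # not present yet
    {"id": 3, "name": "C"},   # repeated inside the batch
]
plan = discover_increment(existing, incoming)
print(len(plan.new), len(plan.redundant), len(plan.conflicting))
```

Only the instances in `plan.new` would be written to the graph. The redundant ones are skipped, and the conflicting pairs could be handed to a background task, which is the role the abstract assigns to system background tasks.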

Key words: Knowledge graph, Schema matching, Mapping reusing, Instance construction, Graph evolution

CLC Number:

  • TP311