Computer Science ›› 2010, Vol. 37 ›› Issue (6): 179-185.

• Database and Data Mining •

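Novel Approach for Extracting XML Schema Definition Based on Content Model Graph

NING Jing, LIU Jie, YE Dan

  1. (Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190); (Graduate University of Chinese Academy of Sciences, Beijing 100190); (Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230026)
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National 863 High Technology Research and Development Program of China (2007AA01Z149, 2007AA04Z148).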

Abstract: An XML Schema can be used to validate XML documents and to optimize operations on them such as querying and transformation, yet many XML documents in real applications have no associated schema. This paper presents XSDInfer, an approach that automatically extracts an XML Schema Definition (XSD) from XML documents. First, the schema information gathered during XML parsing is recorded in content model graphs according to a set of merging rules. Then the graphs are transformed into content model expressions according to a set of generation rules, from which the XSD is produced. XSDInfer quickly processes very large, deeply nested XML documents with relatively low memory consumption, supports the context-sensitive content models of XSD, and produces XSDs that are more human-readable. Experiments show that XSDInfer offers better scalability and expressiveness than comparable methods.

Key words: XML, XML Schema Definition, Schema extraction, Content model
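
The abstract describes a two-phase pipeline: record what the parser sees in content model graphs, then turn each graph into a content model expression. The abstract does not state the actual merging or generation rules, so the following Python sketch is only a simplified stand-in for that idea and not the XSDInfer algorithm. All names (`build_content_model_graphs`, `to_expression`, `SAMPLE`) are hypothetical. It assumes one graph per element name (so it is context-insensitive), emits a DTD-like expression rather than an XSD, and needs Python 3.9+ for `graphlib`.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from graphlib import TopologicalSorter, CycleError

START, END = "^", "$"  # artificial entry and exit nodes of every graph


def build_content_model_graphs(xml_source):
    """Stream the document and record, per element name, which child follows which."""
    graphs = defaultdict(set)  # element name -> set of (from, to) edges
    stack = []                 # one list of child names per currently open element
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            if stack:
                stack[-1].append(elem.tag)
            stack.append([])
        else:
            path = [START] + stack.pop() + [END]
            # Assumed merge rule: an edge seen again is simply merged (set union).
            graphs[elem.tag].update(zip(path, path[1:]))
            elem.clear()  # drop parsed content so memory use stays small
    return graphs


def to_expression(edges):
    """Assumed generation rule: order children topologically, then add ?, + or *."""
    nodes = {n for edge in edges for n in edge}
    preds = {n: set() for n in nodes}
    for a, b in edges:
        if a != b:  # self-loops mean repetition, not ordering
            preds[b].add(a)
    try:
        order = list(TopologicalSorter(preds).static_order())
    except CycleError:
        # No consistent child order: fall back to a repeated choice.
        return "(" + " | ".join(sorted(nodes - {START, END})) + ")*"
    pos = {n: i for i, n in enumerate(order)}
    suffixes = {(False, False): "", (True, False): "?",
                (False, True): "+", (True, True): "*"}
    parts = []
    for name in order:
        if name in (START, END):
            continue
        optional = any(pos[a] < pos[name] < pos[b] for a, b in edges)  # some edge skips it
        repeated = (name, name) in edges
        parts.append(name + suffixes[(optional, repeated)])
    return ", ".join(parts) if parts else "EMPTY"


SAMPLE = b"""<library>
  <book><title/><author/><author/><year/></book>
  <book><title/><year/></book>
</library>"""

graphs = build_content_model_graphs(io.BytesIO(SAMPLE))
print(to_expression(graphs["library"]))  # book+
print(to_expression(graphs["book"]))     # title, author*, year
```

Because only the edge sets are kept and each element is cleared once it closes, memory grows with the number of distinct element names and child transitions rather than with document size, which is the kind of property the abstract claims for XSDInfer. Supporting context-sensitive content models would require keying the graphs by more than the element name, which this sketch does not attempt.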
