计算机科学 ›› 2014, Vol. 41 ›› Issue (4): 178-183.

• 软件与数据库技术 • 上一篇    下一篇

一类受限正则表达式的推断算法

冯晓强,郑黎晓,陈海明   

  1. 中国科学院软件研究所计算机科学国家重点实验室 北京100190;华侨大学计算机科学与技术学院 厦门361021;中国科学院软件研究所计算机科学国家重点实验室 北京100190
  • 出版日期:2018-11-14 发布日期:2018-11-14
  • 基金资助:
    本文受国家自然科学基金项目(61070038),华侨大学人才引进科研启动基金项目(12BS215)资助

Inferring Algorithm for a Subclass of Restricted Regular Expressions

FENG Xiao-qiang,ZHENG Li-xiao and CHEN Hai-ming   

  • Online:2018-11-14 Published:2018-11-14

摘要: XML模式推断问题的主要任务可以归约为从一个句子集合中推断出对应的确定型正则表达式。提出了一类在XML模式中大量出现的受限正则表达式,给出了该类正则表达式的推断算法。该算法首先根据给定的句子集合构造自动机,然后根据自动机和句子集合推断出对应的正则表达式。该算法的时间复杂度为max(O(|V|+|E|),O(L)),其中V和E分别表示自动机的节点集合和边集合,L表示句子集合中所有句子的长度之和。对算法的终止性和正确性进行了证明。

关键词: XML模式,模式推断,正则表达式,自动机,算法

Abstract: The problem of inferring XML schemas reduces to inferring deterministic regular expressions from a set of sentences.A subclass of restricted regular expressions which commonly occur in practical XML schemas was proposed.An algorithm for inferring this kind of regular expressions was described.The algorithm first constructs the corresponding automata according to the sentence set,then infers the regular expression from the automata and the sentence set.The complexity of the algorithm is max(O(|V|+|E|),O(L)) where V and E are the set of states and the set of edges of the constructed automata respectively,and L is the total length of sentences.The termination and correctness of the algorithm were proved.

Key words: XML schema,Schema inference,Regular expression,Automata,Algorithm

[1] Abiteboul S,Buneman P,Suciu D.Data on the Web:From Relations to Semistructured Data and XML [M].Burlington,USA:Morgan Kaufmann,1999
[2] Bray T,Paoli J,Sperberg-McQueen C M,et al.Extensible Mark-up Language (XML) 1.0(Fifth Edition) [EB/OL].http://www.w3.org/TR/2008/REC-xml-20081126/,2008
[3] Gao Shu-di,Sperberg-McQueen C M,Thompson H S.XMLSchema Definition Language (XSD) 1.1Part 1:Structures [EB/OL].http://www.w3.org/TR/2012/REC-xmlschema11-1-20120405/,2012
[4] Benedikt M,Fan Wen-fei,Geerts F.XPath satisfiability in thepresence of DTDs [J].Journal of the ACM,2008,55(2):1-79
[5] Rahm E,Bernstein P A.A survey of approaches to automatic schema matching [J].The VLDB Journal,2001,10(4):334-350
[6] Barbosa D,Mignet L,Veltri P.Studying the XML Web:gathering statistics from an XML sample [J].World Wide Web:Internet and Web Information Systems,2005,8(4):413-438
[7] Bex G J,Neven F,Van den Bussche J.DTDs versus XML Schema:a practical study [C]∥WebDB.2004:79-84
[8] Martens W,Neven F,Schwentick T,et al.Expressiveness andComplexity of XML Schema [J].ACM Transactions on Database Systems,2006,31(3):770-813
[9] Sahuguet A.Everything you ever wanted to know about DTDs,but were afraid to ask [C]∥WebDB.2000:171-183
[10] Bex G J,Gelade W,Neven F,et al.Learning deterministic regular expressions for the inference of schemas from XML data [J].ACM Transactions on The Web,2010,4(4):1-32
[11] Bex G J,Neven F,Schwentick T,et al.Inference of concise regular expresssions and DTDs [J].ACM Transactions on Database Systems,2010,35(2):1-47
[12] Chen Hai-ming,Lu Ping.Assisting the design of XML Schema:diagnosing nondeterministic content models [C]∥APWeb.2011:301-312
[13] Brüggemann-Klein A,Wood D.One-unambiguous regular lan-guage [J].Information and Computation,1998,142(2):182-206
[14] García P J,Vidal E.Inference of k-testable languages in thestrict sense and application to syntactic pattern recognition [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1990,12(9):920-925
[15] Freydenberger D D,Ktzing T.Fast learning of restricted regular expressions and DTDs [C]∥ICDT.2013:45-56
[16] Cormen T H,Leiserson C E,Rivest R L,et al.Introduction to Algorithms [M].Cambridge,USA:The MIT Press,2001
[17] Garofalakis M,Gionis A,Rastogi R.XTRACT:Learning document type descriptors from XML document collections [J].Data Mining and Knowledge Discovery,2003,7(1):23-56
[18] Rissanen J.Modeling by shortest data description [J].Automa-tica,1978,14(5):465-471
[19] Min J-K,Ahn J-Y,Chung C-W.Efficient extraction of schemas for XML documents [J].Information Processing Letters,2003,85(1):7-12

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!