Computer Science ›› 2014, Vol. 41 ›› Issue (4): 178-183.

Previous Articles     Next Articles

Inferring Algorithm for a Subclass of Restricted Regular Expressions

FENG Xiao-qiang,ZHENG Li-xiao and CHEN Hai-ming   

  • Online:2018-11-14 Published:2018-11-14

Abstract: The problem of inferring XML schemas reduces to inferring deterministic regular expressions from a set of sentences.A subclass of restricted regular expressions which commonly occur in practical XML schemas was proposed.An algorithm for inferring this kind of regular expressions was described.The algorithm first constructs the corresponding automata according to the sentence set,then infers the regular expression from the automata and the sentence set.The complexity of the algorithm is max(O(|V|+|E|),O(L)) where V and E are the set of states and the set of edges of the constructed automata respectively,and L is the total length of sentences.The termination and correctness of the algorithm were proved.

Key words: XML schema,Schema inference,Regular expression,Automata,Algorithm

[1] Abiteboul S,Buneman P,Suciu D.Data on the Web:From Relations to Semistructured Data and XML [M].Burlington,USA:Morgan Kaufmann,1999
[2] Bray T,Paoli J,Sperberg-McQueen C M,et al.Extensible Mark-up Language (XML) 1.0(Fifth Edition) [EB/OL].http://www.w3.org/TR/2008/REC-xml-20081126/,2008
[3] Gao Shu-di,Sperberg-McQueen C M,Thompson H S.XMLSchema Definition Language (XSD) 1.1Part 1:Structures [EB/OL].http://www.w3.org/TR/2012/REC-xmlschema11-1-20120405/,2012
[4] Benedikt M,Fan Wen-fei,Geerts F.XPath satisfiability in thepresence of DTDs [J].Journal of the ACM,2008,55(2):1-79
[5] Rahm E,Bernstein P A.A survey of approaches to automatic schema matching [J].The VLDB Journal,2001,10(4):334-350
[6] Barbosa D,Mignet L,Veltri P.Studying the XML Web:gathering statistics from an XML sample [J].World Wide Web:Internet and Web Information Systems,2005,8(4):413-438
[7] Bex G J,Neven F,Van den Bussche J.DTDs versus XML Schema:a practical study [C]∥WebDB.2004:79-84
[8] Martens W,Neven F,Schwentick T,et al.Expressiveness andComplexity of XML Schema [J].ACM Transactions on Database Systems,2006,31(3):770-813
[9] Sahuguet A.Everything you ever wanted to know about DTDs,but were afraid to ask [C]∥WebDB.2000:171-183
[10] Bex G J,Gelade W,Neven F,et al.Learning deterministic regular expressions for the inference of schemas from XML data [J].ACM Transactions on The Web,2010,4(4):1-32
[11] Bex G J,Neven F,Schwentick T,et al.Inference of concise regular expresssions and DTDs [J].ACM Transactions on Database Systems,2010,35(2):1-47
[12] Chen Hai-ming,Lu Ping.Assisting the design of XML Schema:diagnosing nondeterministic content models [C]∥APWeb.2011:301-312
[13] Brüggemann-Klein A,Wood D.One-unambiguous regular lan-guage [J].Information and Computation,1998,142(2):182-206
[14] García P J,Vidal E.Inference of k-testable languages in thestrict sense and application to syntactic pattern recognition [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1990,12(9):920-925
[15] Freydenberger D D,Ktzing T.Fast learning of restricted regular expressions and DTDs [C]∥ICDT.2013:45-56
[16] Cormen T H,Leiserson C E,Rivest R L,et al.Introduction to Algorithms [M].Cambridge,USA:The MIT Press,2001
[17] Garofalakis M,Gionis A,Rastogi R.XTRACT:Learning document type descriptors from XML document collections [J].Data Mining and Knowledge Discovery,2003,7(1):23-56
[18] Rissanen J.Modeling by shortest data description [J].Automa-tica,1978,14(5):465-471
[19] Min J-K,Ahn J-Y,Chung C-W.Efficient extraction of schemas for XML documents [J].Information Processing Letters,2003,85(1):7-12

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!