Computer Science, 2017, Vol. 44, Issue (Z11): 414-417. doi: 10.11896/j.issn.1002-137X.2017.11A.088


Method for Unstructured Data Transformation Based on XML Technology

YANG Jing and ZHOU Shuang-e   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: XML, as a semi-structured language, is widely used to convert unstructured information into structured information because of its predefined markup. In this work, complex unstructured data from the network is first converted into semi-structured XML data using POI, and the XML files are then parsed with SAX to produce structured data, which makes it easier for users to search for information. In addition, the parsing efficiency of SAX and DOM on XML files is compared for the first time. The results show that SAX parses the same file more efficiently than DOM, and that the gap widens as the size of the XML file increases.

Key words: Big data, Unstructured data, Extensible markup language, Document resolution technology
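
The second stage described in the abstract, turning semi-structured XML into structured rows with either an event-driven (SAX) or a tree-based (DOM) parser, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses Python's standard-library `xml.sax` and `xml.dom.minidom` modules rather than the Java stack that POI implies, and the sample document, the element names (`records`, `record`, `name`, `city`) and the helper names (`RecordHandler`, `parse_with_sax`, `parse_with_dom`) are all hypothetical.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document; the element names are illustrative only.
SAMPLE_XML = """<?xml version="1.0"?>
<records>
  <record><name>Alice</name><city>Wuhan</city></record>
  <record><name>Bob</name><city>Beijing</city></record>
</records>"""


class RecordHandler(xml.sax.ContentHandler):
    """Collects each <record> element into a flat dict (one row per record)."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows
        self._row = None     # row currently being built, or None
        self._field = None   # child element currently being read, or None
        self._text = []      # text chunks of the current child element

    def startElement(self, name, attrs):
        if name == "record":
            self._row = {}
        elif self._row is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate them.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "record":
            self.rows.append(self._row)
            self._row = None
        elif name == self._field:
            self._row[name] = "".join(self._text).strip()
            self._field = None


def parse_with_sax(xml_text):
    """Event-driven parse: rows are built while the document streams by."""
    handler = RecordHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.rows


def parse_with_dom(xml_text):
    """Tree-based parse: the whole document is loaded before rows are read."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    rows = []
    for record in doc.getElementsByTagName("record"):
        row = {}
        for child in record.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                row[child.tagName] = "".join(
                    n.data for n in child.childNodes
                    if n.nodeType == n.TEXT_NODE
                ).strip()
        rows.append(row)
    return rows
```

Both functions return the same list of dictionaries for the same input, so they can be swapped freely. Timing each one (for example with `timeit`) on generated files of increasing size would be one way to explore the kind of SAX versus DOM comparison the abstract reports, although results from this sketch would not necessarily match the paper's.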

