Computer Science, 2015, Vol. 42, Issue (7): 5-11. doi: 10.11896/j.issn.1002-137X.2015.07.002


Detecting Snapshots for Web Tables

WANG Ning, REN Hong-wei

Online: 2018-11-14  Published: 2018-11-14

Abstract: In recent years, a large amount of structured tabular data has emerged on the Internet. However, the value of Web tables depends not only on the data itself but also on the relatedness between the data. Only when the latent relatedness is detected can these structured data be fully utilized. We propose a new type of relatedness between Web tables, called the snapshot relationship, and a framework for capturing snapshots that satisfy a certain matching condition with respect to a given table. Snapshots are beneficial for query optimization and also help return partial results quickly when querying big data. The relatedness between an original Web table and its snapshot is computed based on entity consistency and schema consistency. To assign high weights to tables that provide more fresh entities, the concept of entity freshness is introduced into the scoring method. In addition, the content consistency of Web tables is enhanced by applying Bayesian analysis within the relatedness-capturing framework, which improves the accuracy of finding snapshots. Extensive experiments demonstrate that the algorithms capture high-quality snapshots and perform well in query precision and recall.

Key words: Web tables, Relatedness, Snapshot, Data integration, Query optimization
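
The abstract states that relatedness between an original table and a candidate snapshot is computed from entity consistency and schema consistency, but it does not give the formulas. The following is only a minimal sketch of that idea under stated assumptions: both consistency measures are approximated here by set containment, and they are combined with a hypothetical mixing weight `alpha`. The names `WebTable`, `entity_consistency`, `schema_consistency`, and `snapshot_score` are illustrative and not taken from the paper. Entity freshness and the Bayesian analysis mentioned in the abstract are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class WebTable:
    """Hypothetical simplified table: a set of column names and a set of entity keys."""
    columns: FrozenSet[str]
    entities: FrozenSet[str]


def _containment(part: FrozenSet[str], whole: FrozenSet[str]) -> float:
    """Fraction of `part` that also appears in `whole` (0.0 if `part` is empty)."""
    if not part:
        return 0.0
    return len(part & whole) / len(part)


def entity_consistency(original: WebTable, candidate: WebTable) -> float:
    # Assumption: measured as the share of candidate entities found in the original.
    return _containment(candidate.entities, original.entities)


def schema_consistency(original: WebTable, candidate: WebTable) -> float:
    # Assumption: measured as the share of candidate columns found in the original.
    return _containment(candidate.columns, original.columns)


def snapshot_score(original: WebTable, candidate: WebTable, alpha: float = 0.5) -> float:
    """Linear combination of the two measures; `alpha` is an assumed mixing weight."""
    return (alpha * entity_consistency(original, candidate)
            + (1.0 - alpha) * schema_consistency(original, candidate))


original = WebTable(
    columns=frozenset({"country", "capital", "population"}),
    entities=frozenset({"France", "Germany", "Italy", "Spain"}),
)
subset = WebTable(
    columns=frozenset({"country", "capital"}),
    entities=frozenset({"France", "Germany"}),
)
partial = WebTable(
    columns=frozenset({"country", "gdp"}),
    entities=frozenset({"France", "Japan"}),
)

print(snapshot_score(original, subset))
print(snapshot_score(original, partial))
```

Containment is used here instead of a symmetric overlap measure because a snapshot may be smaller than the table it is compared against; a table whose entities and columns all appear in the original scores highest under this assumption.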

