Computer Science, 2015, Vol. 42, Issue (7): 5-11. doi: 10.11896/j.issn.1002-137X.2015.07.002
WANG Ning, REN Hong-wei
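Abstract: In recent years, a large amount of structured tabular data has appeared on the Internet. The value of Web tables lies not only in the data itself but also in the relationships among the data. These structured data can be exploited well only once the latent relationships between tables have been detected.

We therefore propose discovering snapshot relationships between Web tables. We present a framework for snapshot-relationship discovery, together with an algorithm that detects snapshot tables satisfying a specified matching relationship with a given table. Snapshot tables can be used to optimize queries and to return partial query results in real time in big-data settings.

We propose a scoring method based on the degree of entity and attribute overlap. We also introduce the notion of entity freshness, so that during discovery the algorithm favors tables that contribute fresh entities. In addition, a table content enhancement algorithm based on a Bayes model judges the consistency of values in attribute columns more accurately, which improves the accuracy of snapshot-relationship discovery.

Extensive experiments show that the scoring model finds high-quality snapshot tables and achieves strong precision and recall on snapshot queries.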
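The abstract does not give the scoring formula. The following is therefore only a minimal sketch of how an overlap-plus-freshness score might drive the selection of snapshot tables.

Every concrete choice below is an assumption made for illustration and is not the authors' method:

- the `WebTable` structure (an entity set plus an attribute set);
- the two overlap ratios and the freshness ratio;
- the linear combination with hypothetical weights `w_ent`, `w_attr`, `w_fresh`;
- the greedy loop in `select_snapshots`.

The Bayes-based content enhancement step is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class WebTable:
    """Hypothetical table summary: entity-column values and column headers."""
    name: str
    entities: FrozenSet[str]    # values of the entity (subject) column
    attributes: FrozenSet[str]  # normalized column headers


def ratio(part: int, whole: int) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def snapshot_score(cand: WebTable, query: WebTable, covered: Set[str],
                   w_ent: float = 0.4, w_attr: float = 0.3,
                   w_fresh: float = 0.3) -> float:
    """Hypothetical linear score; the weights are illustrative assumptions."""
    shared = cand.entities & query.entities
    # Share of the candidate's entities that also appear in the query table.
    ent = ratio(len(shared), len(cand.entities))
    # Share of the query table's attributes that the candidate also has.
    attr = ratio(len(cand.attributes & query.attributes), len(query.attributes))
    # Freshness: share of the shared entities not yet covered by earlier picks.
    fresh = ratio(len(shared - covered), len(shared))
    return w_ent * ent + w_attr * attr + w_fresh * fresh


def select_snapshots(query: WebTable, candidates: List[WebTable],
                     k: int) -> List[Tuple[str, float]]:
    """Greedily pick up to k candidates, re-scoring freshness after each pick."""
    covered: Set[str] = set()
    remaining = list(candidates)
    picked: List[Tuple[str, float]] = []
    while remaining and len(picked) < k:
        best = max(remaining, key=lambda t: snapshot_score(t, query, covered))
        picked.append((best.name, snapshot_score(best, query, covered)))
        # Entities contributed by this pick are no longer "fresh".
        covered |= best.entities & query.entities
        remaining.remove(best)
    return picked


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    q = WebTable("Q", frozenset({"a", "b", "c", "d"}),
                 frozenset({"name", "pop", "area"}))
    t1 = WebTable("T1", frozenset({"a", "b"}), frozenset({"name", "pop", "area"}))
    t2 = WebTable("T2", frozenset({"a", "b"}), frozenset({"name", "pop"}))
    t3 = WebTable("T3", frozenset({"c", "d"}), frozenset({"name", "pop"}))
    for name, score in select_snapshots(q, [t1, t2, t3], k=2):
        print(name, round(score, 3))  # prints "T1 1.0", then "T3 0.9"
```

In this toy run, T2 and T3 tie on entity and attribute overlap. Once T1 has been picked, however, T2 contributes no fresh entities. The freshness term therefore steers the second pick to T3, which mirrors the abstract's stated goal of favoring tables that supply fresh entities.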