Computer Science, 2017, Vol. 44, Issue (9): 208-215. doi: 10.11896/j.issn.1002-137X.2017.09.039

• Software and Database Technology •

Consistent Web Table Augmentation Based on Column Overlapping

QI Fei, WANG Ning, ZHANG Li-fang and SUN Wei-juan

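  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044 (affiliation of all four authors)
  • Published: 2018-11-13 Online: 2018-11-13
  • Funding:
    This work was supported by the General Program of the National Natural Science Foundation of China (61370060).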

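Abstract: Web table augmentation extends other attribute columns related to the main column based on known information, meeting people's need to obtain information of interest through tables. Current research mainly targets entity-attribute binary tables consisting of a main column and a column to be extended, and treats the main column as the sole basis for extending other attribute columns. However, when this technique is applied to web tables with multiple columns to be extended, the result table stitched together from several binary tables is prone to entity inconsistency. Taking into account the relationships among attribute columns as well as among tuple rows, this paper proposes the concept of consistency support degree and designs and implements CCA, a consistent table augmentation system based on column overlapping. CCA both guarantees high matching scores for candidate values and minimizes the number of source tables used to fill values in the result table, which effectively avoids entity inconsistency. Experiments show that, compared with existing methods, CCA achieves higher accuracy, coverage, and consistency, as well as lower query time cost.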

Keywords: Web table augmentation, column overlapping degree, column mapping, consistency support degree
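The abstract names column overlapping as the basis for matching but does not define it on this page. The Python sketch below shows one plausible reading under a hypothetical definition: the overlap degree of a candidate source table is the fraction of query rows for which some source row agrees on the main column and on every other known attribute value, under a given column mapping. The function `column_overlap`, its parameters, the exact-match normalization, and the example data are illustrative assumptions, not the paper's CCA implementation.

```python
from typing import Dict, List

Row = Dict[str, str]


def _norm(value: str) -> str:
    # Naive normalization (assumption); real systems would use fuzzier matching.
    return value.strip().lower()


def column_overlap(query: List[Row], source: List[Row],
                   mapping: Dict[str, str], key: str) -> float:
    """Hypothetical column-overlap degree of `source` with respect to `query`.

    mapping: query column -> source column (an assumed column mapping).
    key:     the query's main (entity) column.
    A query row counts if some source row agrees with it on the main column
    and on every other known (non-empty) mapped column.
    """
    if not query:
        return 0.0
    matched = 0
    for q in query:
        # Known, comparable values of this query row.
        known = {c: _norm(v) for c, v in q.items() if v and c in mapping}
        if key not in known:
            continue  # a row without its entity name cannot be matched
        if any(all(_norm(s.get(mapping[c], "")) == v for c, v in known.items())
               for s in source):
            matched += 1
    return matched / len(query)


# Hypothetical example: the query table knows each city's country.
query = [
    {"city": "Paris", "country": "France", "population": ""},
    {"city": "Lyon", "country": "France", "population": ""},
    {"city": "Berlin", "country": "Germany", "population": ""},
]
source = [
    {"name": "Paris", "nation": "USA", "pop": "25k"},  # Paris, Texas
    {"name": "Berlin", "nation": "Germany", "pop": "3.6M"},
]
full_mapping = {"city": "name", "country": "nation", "population": "pop"}

print(column_overlap(query, source, full_mapping, "city"))     # all known columns
print(column_overlap(query, source, {"city": "name"}, "city"))  # main column only
```

Comparing only the main column would accept the Paris, Texas row as a match (overlap 2/3), while requiring agreement on the known country column rejects it (overlap 1/3). Filling the query's Paris row from that source is exactly the kind of entity inconsistency the abstract describes.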

Abstract: Web table augmentation refers to extending table content based on the main column or other known information, which helps people obtain the information they are interested in. Current research focuses on entity-attribute binary tables made of a main column and a column to be extended, in which the main column is the only basis for augmentation. When this approach is applied to a table with multiple columns to be extended, the result table consolidated from several binary tables suffers from entity inconsistency. We propose the concept of consistency support degree, based on the relationships between columns as well as between tuples in the table, and implement CCA, a system for consistent table augmentation based on column overlapping. Our method keeps the matching scores of candidate values high while using as few source tables as possible, which avoids entity inconsistency. Experimental results show that, compared with existing methods, CCA achieves higher accuracy, coverage, and consistency with lower query time cost.

Keywords: Web table augmentation, overlapping degree of columns, column mapping, consistency support degree
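The abstract states that CCA keeps candidate matching scores high while using as few source tables as possible. The paper's actual algorithm is not given on this page, so the sketch below swaps in a generic greedy set-cover heuristic to illustrate why fewer sources help consistency. Everything in it is a hypothetical assumption: the function `greedy_fill`, the `min_score` threshold, and the representation of each candidate table as a map from empty cells to (value, match score) pairs.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, str]  # (row index, column name) of an empty cell
# table id -> cell it can fill -> (candidate value, match score)
Candidates = Dict[str, Dict[Cell, Tuple[str, float]]]


def greedy_fill(empty: Set[Cell], candidates: Candidates,
                min_score: float = 0.5) -> Tuple[Dict[Cell, Tuple[str, str]], List[str]]:
    """Greedy set cover (not the paper's algorithm): repeatedly take the unused
    source table that can fill the most remaining cells, breaking ties by the
    summed match score. Returns cell -> (value, table id) and the tables used."""
    remaining = set(empty)
    filled: Dict[Cell, Tuple[str, str]] = {}
    used: List[str] = []
    while remaining:
        best: Optional[str] = None
        best_cells: Set[Cell] = set()
        best_score = 0.0
        for tid, cells in candidates.items():
            if tid in used:
                continue
            # Cells this table can still fill with an acceptable match score.
            usable = {c for c, (_, s) in cells.items()
                      if c in remaining and s >= min_score}
            total = sum(cells[c][1] for c in usable)
            if (len(usable), total) > (len(best_cells), best_score):
                best, best_cells, best_score = tid, usable, total
        if best is None:
            break  # no unused table can fill any remaining cell
        used.append(best)
        for c in best_cells:
            filled[c] = (candidates[best][c][0], best)
        remaining -= best_cells
    return filled, used


# Hypothetical example: two rows, two empty columns each.
empty = {(0, "country"), (0, "population"), (1, "country"), (1, "population")}
candidates: Candidates = {
    "T1": {(0, "country"): ("France", 0.9), (1, "country"): ("Germany", 0.9)},
    "T2": {(0, "population"): ("2.1M", 0.8), (1, "population"): ("3.6M", 0.8)},
    "T3": {(0, "country"): ("France", 0.7), (0, "population"): ("2.1M", 0.7),
           (1, "country"): ("Germany", 0.7), (1, "population"): ("3.6M", 0.4)},
}
filled, used = greedy_fill(empty, candidates)
print(used)  # ['T3', 'T2']
```

T3 alone covers three of the four cells, so it is chosen first even though T1 and T2 score each of their cells higher; its row-1 population value (score 0.4) falls below `min_score`, so that single cell is filled from T2. This count-first rule is a deliberate simplification; a real system would balance the number of sources against the matching scores more carefully.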
