Computer Science ›› 2019, Vol. 46 ›› Issue (10): 195-201.doi: 10.11896/jsjkx.180901748

• Software & Database Technology • Previous Articles     Next Articles

Foreign Key Detection Algorithm for Web Tables Based on Conflict Dependency Elimination

WANG Jia-min, WANG Ning   

  1. (School of Computer and Information Technology,Beijing Jiaotong University,Beijing 100044,China)
  • Received:2018-09-16 Revised:2019-03-06 Online:2019-10-15 Published:2019-10-21

Abstract: As one of the most important constraints in databases,foreign key relationships between tables are crucial for data integration and analysis.For large amount of web tables,foreign keys are not specified in most cases.Therefore,detection of foreign key becomes a significant step in understanding and utilizing the web tables.Current researches mainly focus on the search for inclusion dependencies between attributes,and foreign key detection methods on traditional relational tables cannot solve the problem about large number of conflicting foreign keys due to the heterogeneity of web tables.Considering the conflict dependency between web tables,a foreign key detection algorithm for web tables based on conflict dependency elimination was proposed.Firstly,the concept of conflict dependency is proposed,and an inclusion dependency graph is established for the candidate foreign key relationship.Subsequently,the layer structure of the inclusion dependency graph is constructed,and the definition for the strength of candidates is given.Finally,based on the layer-by-layer elimination of conflict dependency,true foreign key relationships are obtained.To verify the effectiveness of the algorithm,this paper selected 3 true web table datasets as experimental datasets,which are the WIKI dataset with complete schema specification,the DWTC and WDC datasets without schema information.The proposed algorithm has been compared with the other two foreign key detection methods on above datasets in terms of accuracy,recall and F-measure.The experimental results show that the accuracy,recall andF-measure of the proposed algorithm on WIKI and DWTC are more superior than existing algorithms.The proposed algorithm outperforms other algorithms especially on WDC,the largest and the most up-to-data web table corpus,it’s accuracy,recall and F-measure to 0.89,0.88 and 0.89.Therefore,the proposed foreign key detection algorithm is more suitable for web tables.Compared with the existing methods,it has higher accuracy,recall and F-measure.

Key words: Conflict dependency, Constraint, Foreign key, Web table

CLC Number: 

  • TP391
[1]VENETIS P,HALEVY A,MADHAVAN J,et al.Recovering semantics of tables on the web[J].In Proceedings of the VLDB Endowment,2011,4(9):528-538.
[2]Marco D,ADELFIO,HANAN S.Schema extraction for tabular data on the web[J].In Proceedings of the VLDB Endowment,2013,6(6):421-432.
[3]WANG J J,WANG H X,WANG Z Y,et al.Understanding tables on the web[C]//Proceedings of 31st International Confe-rence ER.2012:141-155.
[4]DENG D,JIANG Y,LI G L,et al.Scalable column concept determination for web tables using large konwledge bases[J].In Proceedings of the VLDB Endowment,2013,6(3):1606-1617.
[5]ZHANG M H,MARIOS H,BENG C O,et al.Automatic discovery of attributes in relational databases[C]//ACM SIGMOD Conference.2011:109-120.
[6]VARISH M,FININ T,JOSHI A.Generating linked data by inferring the semantics of tables[C]//Proc.1st Int.Workshop on Searching and Integrating New Web Data Sources--Very Large Data Search.2011:17-22.
[7]SARMA A D,FANG L,GUPTA N,et al.Finding related tables[C]// Proceedings of the 2012 ACM SIGMOD Int.Conf.on Management of Data.New York:ACM,2012:817-828.
[8]CASANOVA M A,FAGIN R,PAPADIMITRIOU C H.Inclusion dependencies and their interaction with functional depen-dencies[C]//PODS,Los Angeles,United States.1982:171-176.
[9]MARCHI F D,PETIT J M.Zigzag:a new algorithm for mining large inclusion dependencies in databases[C]//ICDM,Melbourne,Florida,USA,2003:27-34.
[10]BAUCKMANN J,LESER U,NAUMANN F,et al.Efficiently detecting inclusion dependencies[C]// International Conference on Data Engineering.Istanbul,Twrkey,2007:1448-1450.
[11]FABIAN T,THORSTEN P,FELIX N.Detecting inclusion dependencies on very many tables[J].ACM Transaction on Database Systems,2017,42(3):1-29.
[12]MARCHI F D,LOPES S,PETIT J M.Unary and n-ary inclusion dependency discovery in relational databases[J].Journal of Intelligent Information Systems,2009,32(1):53-73.
[13]PAPENBROCK T,KRUSE S,NAUMANN F.Divide & con-quer-based inclusion dependency discovery[J].Proceedings of the VLDB Endowment,2015,8(7):774-785.
[14]ROSTIN A,ALBRECHT O,BAUCKMANN J,et al.A ma-chince learning approach to foreign key discovery[C]//12th International Workshop on the Web and Databases.2009.
[15]CHEN Z M,NARASAYYA V,CHAUDHURI S.Fast foreign-key detection in microsoft sql server powerpivot for excel[J].Proceedings of the VLDB Endowment,2014,7(13):1417-1428.
[16]ZHANG M H,HADJIELEFTHERIOU M,OOI B C,et al.On multi-column foreign key discovery[J].Proceedings of the VLDB Endowment,2010,3(1):805-814.
[17]PELEG S,WERMAN M,ROM H.A unified approach to the change of resolution:space and gray-level[J].TPAMI,1989,11(7):739-742.
[18]Web Data Commons-Web Table Corpora[EB/OL].[2018-08-01].http://www.webdatacommons.org/webtables/index.html.
[19]DWTC-Dresden Web Table Corpus[EB/OL].[2018-08-01].https://wwwdb.inf.tu-dresden.de/misc/dwtc/.
[20]WikiTables:Public Site[EB/OL].[2018-08-01].http://dow-ney-n1.cs.northwestern.edu/public.
[1] FU Yan-ming, ZHU Jie-fu, JIANG Kan, HUANG Bao-hua, MENG Qing-wen, ZHOU Xing. Incentive Mechanism Based on Multi-constrained Worker Selection in Mobile Crowdsourcing [J]. Computer Science, 2022, 49(9): 275-282.
[2] YANG Hui, TAO Li-hong, ZHU Jian-yong, NIE Fei-ping. Fast Unsupervised Graph Embedding Based on Anchors [J]. Computer Science, 2022, 49(4): 116-123.
[3] GUAN Zheng, DENG Yang-lin, NIE Ren-can. Non-negative Matrix Factorization Based on Spectral Reconstruction Constraint for Hyperspectral and Panchromatic Image Fusion [J]. Computer Science, 2021, 48(9): 153-159.
[4] ZHANG Fan, GONG Ao-yu, DENG Lei, LIU Fang, LIN Yan, ZHANG Yi-jin. Wireless Downlink Scheduling with Deadline Constraint for Realistic Channel Observation Environment [J]. Computer Science, 2021, 48(9): 264-270.
[5] XU Yi-fei, XIONG Shu-hua, SUN Wei-heng, HE Xiao-hai, CHEN Hong-gang. HEVC Post-processing Algorithm Based on Non-local Low-rank and Adaptive Quantization Constraint Prior [J]. Computer Science, 2021, 48(5): 155-162.
[6] LI Li, LI Guang-peng, CHANG Liang, GU Tian-long. Survey of Constrained Evolutionary Algorithms and Their Applications [J]. Computer Science, 2021, 48(4): 1-13.
[7] ZHOU Qiu-yan, XIAO Man-sheng, ZHANG Long-xin, ZHANG Xiao-li, YANG Wen-li. Intelligent Optimization Technology of Production Scheduling Under Multiple Constraints [J]. Computer Science, 2021, 48(3): 239-245.
[8] ZHENG Jian-yun, PANG Jian-min, ZHOU Xin, WANG Jun. Enhanced Binary Vulnerability Mining Based on Constraint Derivation [J]. Computer Science, 2021, 48(3): 320-326.
[9] CAO Bo, CHEN Feng, CHENG Jing, LI Hua, LI Yong-le. Route Planning of Unstructured Road Including Repeat Node Based on Bidirectional Search [J]. Computer Science, 2021, 48(11A): 77-80.
[10] WANG Wen-bo, LUO Heng-li. Complete Graph Face Clustering Based on Graph Convolution Network [J]. Computer Science, 2021, 48(11A): 275-277.
[11] YE Ya-nan, CHI Jing, YU Zhi-ping, ZHAN Yu-liand ZHANG Cai-ming. Expression Animation Synthesis Based on Improved CycleGan Model and Region Segmentation [J]. Computer Science, 2020, 47(9): 142-149.
[12] YANG Fan, WANG Jun-bin, BAI Liang. Extended Algorithm of Pairwise Constraints Based on Security [J]. Computer Science, 2020, 47(9): 324-329.
[13] ZHANG Long-xin, ZHOU Li-qian, WEN Hong, XIAO Man-sheng, DENG Xiao-jun. Energy Efficient Scheduling Algorithm of Workflows with Cost Constraint in Heterogeneous Cloud Computing Systems [J]. Computer Science, 2020, 47(8): 112-118.
[14] GAO Fang-yuan, WANG Xiu-mei. Subspace Clustering Method Based on Block Diagonal Representation and Neighbor Constraint [J]. Computer Science, 2020, 47(7): 66-70.
[15] WANG Hui-yan, XU Jing-wei, XU Chang. Survey on Runtime Input Validation for Context-aware Adaptive Software [J]. Computer Science, 2020, 47(6): 1-7.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!