Computer Science, 2018, Vol. 45, Issue (3): 311-316. doi: 10.11896/j.issn.1002-137X.2018.03.051


Rapid Decision Method for Repairing Sequence Based on CFDs

WANG Huan, ZHANG Yun-feng and ZHANG Yan   

  • Online: 2018-03-15  Published: 2018-11-13

Abstract: Data consistency is a central issue in big data quality management. Conditional functional dependencies (CFDs) are an effective technique for maintaining data consistency. In practice, different repairing sequences can affect both the precision and the efficiency of data repairing, so selecting an appropriate repairing sequence is critical. To address this problem, this paper presents a rapid decision method for determining the repairing sequence based on CFDs. First, a framework for consistency repairing is designed. Then, by analyzing the associations between constraints, the concept of a repairing sequence graph is introduced to determine the repairing sequence over the CFDs. This helps avoid some incorrect and unnecessary repairs, which improves repair accuracy. Moreover, deciding the repairing sequence from the rules runs faster than deciding it from the real data. Furthermore, repairing-deadlock detection is performed during the sequence decision to ensure that repairing terminates. Finally, an empirical evaluation on two real-life datasets shows that the proposed method is more accurate and efficient than the existing method.

Key words: Data consistency, Conditional functional dependencies (CFDs), Repairing sequence
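
The abstract does not spell out how the repairing sequence graph is built, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that each CFD can be reduced to a set of left-hand-side attributes and one right-hand-side attribute, that rule A should run before rule B whenever A repairs an attribute that B reads, and that a cycle among rules is what a repairing deadlock looks like. The names `CFD`, `build_sequence_graph`, and `repairing_sequence` are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class CFD(NamedTuple):
    """Simplified rule (assumption): pattern tableaux are ignored."""
    name: str
    lhs: frozenset  # attributes the rule reads
    rhs: str        # attribute the rule may repair


def build_sequence_graph(rules):
    """Add an edge a -> b when a repairs an attribute that b reads."""
    graph = {r.name: set() for r in rules}
    for a in rules:
        for b in rules:
            if a.name != b.name and a.rhs in b.lhs:
                graph[a.name].add(b.name)
    return graph


def repairing_sequence(rules):
    """Order rules with Kahn's topological sort.

    Returns (order, deadlocked). Rules that never reach in-degree zero
    lie on a cycle or depend on one; here they are reported as
    deadlocked rather than ordered (an assumption of this sketch).
    """
    graph = build_sequence_graph(rules)
    indegree = {name: 0 for name in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1

    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for t in sorted(graph[n]):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)

    deadlocked = sorted(set(graph) - set(order))
    return order, deadlocked


# Acyclic example: r1 and r3 both repair "city", which r2 reads.
acyclic = [
    CFD("r1", frozenset({"zip"}), "city"),
    CFD("r2", frozenset({"city"}), "state"),
    CFD("r3", frozenset({"country_code", "area_code"}), "city"),
]
print(repairing_sequence(acyclic))  # (['r1', 'r3', 'r2'], [])

# Cyclic example: c1 repairs what c2 reads and vice versa.
cyclic = [
    CFD("c1", frozenset({"a"}), "b"),
    CFD("c2", frozenset({"b"}), "a"),
]
print(repairing_sequence(cyclic))  # ([], ['c1', 'c2'])
```

The sketch works on the rules alone and never touches the data, which is consistent with the abstract's remark that deciding the sequence from rules is faster than deciding it from real data. How the paper actually resolves a detected deadlock is not stated in the abstract.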

