Computer Science ›› 2012, Vol. 39 ›› Issue (Z11): 174-176.

• Software Engineering •

Implementation of a Data Repair Algorithm Based on Cleaning Rules and Master Data

Lin Yinhua (林印华), Zhang Chunhai (张春海), Liu Jie (刘洁)

  1. (College of Information Science and Engineering, Ocean University of China, Qingdao 266100)
  • Publication date: 2018-11-16 Release date: 2018-11-16

Realization of Data Cleaning Based on Editing Rules and Master Data

  • Online:2018-11-16 Published:2018-11-16

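Abstract: Many integrity constraints, such as conditional functional dependencies and conditional inclusion dependencies, have been proposed to clean data effectively. Although these constraints can detect the presence of errors, they cannot effectively guide users in correcting them. In fact, constraint-based data repair may fail to produce a certain fix and may instead introduce new errors, which greatly reduces the efficiency of data repair. To address these shortcomings, an effective data cleaning framework is proposed. First, the data are cleaned using Editing Rules and Master Data, which yields certain fixes. Then, conditional functional dependencies are used to repair the errors that remain; these fixes are not certain. By comparison, however, the framework not only effectively guarantees the accuracy and uniqueness of data repair but also improves its efficiency.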

Key words: Conditional functional dependency, Cleaning rules, Data cleaning, Data quality

Abstract: To clean dirty data in databases effectively, a variety of integrity constraints have been proposed, such as Conditional Functional Dependencies (CFDs) and Conditional Inclusion Dependencies (CIDs). Although these constraints are able to detect the existence of errors, they cannot effectively guide users in correcting them. In fact, data repairing based on these constraints may fail to find certain fixes that are guaranteed to be correct and, moreover, may introduce new errors, which reduces the efficiency of data repairing. To address these shortcomings, this paper proposes a data repairing framework. First, fixes based on Editing Rules and Master Data are guaranteed to be certain, and an algorithm is provided to repair the dirty data automatically. Second, because the first step may not repair all attributes of the relation, CFDs are employed to correct the remaining dirty data; unfortunately, these are only possible fixes that may not be entirely correct. Even so, compared with other approaches, the framework not only improves the efficiency and uniqueness of data repairing but also ensures its precision.

Key words: Conditional functional dependency, Cleaning rules, Data cleaning, Data quality
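
The two-stage framework described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the names `EditingRule`, `apply_editing_rules`, and `apply_constant_cfds`, the sample attributes, and the sample data are all hypothetical, and the sketch assumes that some attributes of the input record (here `phone`) are already known to be correct. Stage 1 copies values from a uniquely matching master tuple and marks them as certain; stage 2 applies constant CFDs to the remaining attributes and marks those changes as merely possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingRule:
    """Hypothetical rule: if the input agrees with a master tuple on `match`,
    copy the `fix` attributes from that master tuple."""
    match: tuple  # pairs (input_attr, master_attr) that must agree
    fix: tuple    # pairs (input_attr, master_attr) to copy from master data


def apply_editing_rules(record, master, rules, assured):
    """Stage 1: certain fixes. Only attributes in `assured` may trigger a rule."""
    fixed, assured = dict(record), set(assured)
    changed = True
    while changed:  # repeat until no rule can fix anything new
        changed = False
        for rule in rules:
            if not all(a in assured for a, _ in rule.match):
                continue  # the rule's evidence is not yet trusted
            hits = [m for m in master
                    if all(fixed.get(a) == m.get(b) for a, b in rule.match)]
            candidates = {tuple(m.get(b) for _, b in rule.fix) for m in hits}
            if len(candidates) != 1:
                continue  # no match or ambiguous match: no certain fix
            for (a, _), value in zip(rule.fix, next(iter(candidates))):
                if a not in assured:
                    fixed[a] = value
                    assured.add(a)
                    changed = True
    return fixed, assured


def apply_constant_cfds(record, cfds, assured):
    """Stage 2: possible fixes from constant CFDs (condition, attr, value).
    Attributes already fixed with certainty are never overwritten."""
    fixed, possible = dict(record), set()
    for condition, attr, value in cfds:
        if attr in assured:
            continue
        if all(fixed.get(k) == v for k, v in condition.items()) \
                and fixed.get(attr) != value:
            fixed[attr] = value
            possible.add(attr)
    return fixed, possible


# Hypothetical sample data
master = [{"phone": "555-0100", "zip": "EH8 9AB", "city": "Edinburgh"}]
record = {"phone": "555-0100", "zip": "EH8 9XX", "city": "Edi", "country": "US"}
rules = [EditingRule(match=(("phone", "phone"),),
                     fix=(("zip", "zip"), ("city", "city")))]
cfds = [({"city": "Edinburgh"}, "country", "UK")]

stage1, certain = apply_editing_rules(record, master, rules, assured={"phone"})
repaired, possible = apply_constant_cfds(stage1, cfds, certain)
print(repaired, sorted(certain), sorted(possible))
```

In this sketch `zip` and `city` are corrected from the master tuple and reported as certain, while `country` is changed by the CFD and reported only as a possible fix, mirroring the distinction the abstract draws between the two stages.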
