Computer Science, 2012, Vol. 39, Issue (Z11): 207-211.


Data Cleaning and its General System Framework

  

Online: 2018-11-16   Published: 2018-11-16

Abstract: Data cleaning is one of the important methods for improving data quality. This paper studies data cleaning and its system framework by comparing data products with physical products and software products. Data quality research began with data cleaning. The status and function of data cleaning are identified from the viewpoint of the development of data quality, and data cleaning is compared to fault diagnosis and servicing. Ten explanations of data cleaning are given, and its basic meaning is elucidated comprehensively. We compare data cleaning with data integration and point out that they are two coequal concepts within data quality. A general system framework for data cleaning is constructed. The framework consists of five phases: preparation, detection, location, modification, and validation. It can be applied to different data cleaning tasks and offers good flexibility, extensibility, interactivity, and loose coupling.

Key words: Data quality, Data cleaning, Approximate duplicate records, Incomplete records, Framework
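
The five phases named in the abstract can be pictured as a small pipeline. The Python sketch below is an illustration only: the abstract does not say what each phase does internally, so the function names, the `Issue` type, the case-insensitive stand-in for approximate duplicate matching, and the default-filling repair for incomplete records are all assumptions made here, not the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, Optional[str]]


@dataclass(frozen=True)
class Issue:
    """Hypothetical description of one located problem."""
    index: int  # position of the affected record
    kind: str   # "incomplete" or "duplicate"


def _key(record: Record) -> tuple:
    # Assumed, crude stand-in for approximate matching: ignore letter case.
    return tuple((k, (v or "").lower()) for k, v in sorted(record.items()))


def prepare(records: List[Record]) -> List[Record]:
    # Preparation: trim whitespace and turn empty strings into None.
    return [
        {k: (v.strip() or None) if isinstance(v, str) else v for k, v in r.items()}
        for r in records
    ]


def detect(records: List[Record]) -> Set[str]:
    # Detection: report which kinds of problem are present at all.
    kinds = set()
    if any(v is None for r in records for v in r.values()):
        kinds.add("incomplete")
    if len({_key(r) for r in records}) < len(records):
        kinds.add("duplicate")
    return kinds


def locate(records: List[Record]) -> List[Issue]:
    # Location: pinpoint the records that carry each problem.
    issues, seen = [], set()
    for i, r in enumerate(records):
        if any(v is None for v in r.values()):
            issues.append(Issue(i, "incomplete"))
        k = _key(r)
        if k in seen:
            issues.append(Issue(i, "duplicate"))
        else:
            seen.add(k)
    return issues


def modify(records: List[Record], issues: List[Issue],
           defaults: Dict[str, str]) -> List[Record]:
    # Modification: drop later duplicates, fill missing fields from defaults.
    drop = {i.index for i in issues if i.kind == "duplicate"}
    fill = {i.index for i in issues if i.kind == "incomplete"}
    result = []
    for i, r in enumerate(records):
        if i in drop:
            continue
        if i in fill:
            r = {k: (defaults.get(k) if v is None else v) for k, v in r.items()}
        result.append(r)
    return result


def validate(records: List[Record]) -> bool:
    # Validation: re-run detection and require that nothing is left.
    return not detect(records)


def clean(records: List[Record],
          defaults: Dict[str, str]) -> Tuple[List[Record], bool]:
    data = prepare(records)
    if detect(data):
        data = modify(data, locate(data), defaults)
    return data, validate(data)
```

Keeping each phase as a separate function with plain data passed between them is one way to reflect the loose coupling and extensibility the abstract mentions: a different detection rule or repair strategy can be swapped in without touching the other phases.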
