Computer Science ›› 2018, Vol. 45 ›› Issue (4): 1-10. doi: 10.11896/j.issn.1002-137X.2018.04.001
• Review •
蔡莉,梁宇,朱扬勇,何婧
CAI Li, LIANG Yu, ZHU Yang-yong and HE Jing
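Abstract: In the Internet era, data has become a new factor of production, a fundamental and strategic resource, and an important productive force. Big data services are being rolled out widely across the country, and data exchanges are being established one after another. Against this background, data quality has gradually become a key issue constraining the development of the data industry. First, this paper divides research on data quality into three stages in chronological order and comprehensively reviews and summarizes the representative results of each stage, including theories, methods, techniques, tools and frameworks. Then, it analyzes the challenges and opportunities that data quality research faces in the environments of the Internet of Things, cloud computing and big data. Finally, it discusses research hotspots and future directions of data quality from six aspects: data quality models, big data quality management, big data quality technologies, crowdsourcing, the Internet of Things, and open data.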