Computer Science ›› 2026, Vol. 53 ›› Issue (6A): 250600204-10. doi: 10.11896/jsjkx.250600204

• Big Data & Data Science •

Data Quality Measurement Method Based on Metadata

CHEN Lianyong1, SONG Jinyu2, LI Zhixia1, SI Changzhe1, YANG Wenkai1, WANG Jing1   

  1 Xichang Satellite Launch Center, Xichang, Sichuan 615000, China
  2 College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210000, China
  • Online: 2026-06-16 Published: 2026-06-12
  • About author: CHEN Lianyong, born in 1999, master, assistant engineer. His main research interests include communication engineering and data engineering.
    SONG Jinyu, born in 1967, master, professor, member of CCF (No. 99004M). Her main research interests include data engineering and data quality.
  • Supported by:
    National Natural Science Foundation of China (62207031).

Abstract: Data quality is a practical prerequisite for unlocking the potential of data elements and realizing the value of data. To identify problematic data and improve data quality, this paper studies metadata-based data quality measurement methods, constructs a metadata-based set of data quality measurement indexes, and specifies the generation elements and formal composition of measurement rules, so that data quality can be measured in terms of consistency, integrity, and validity. A corresponding data quality measurement tool is also developed. The feasibility and effectiveness of the proposed measurement method and tool are verified on the teaching evaluation dataset of a university.
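
As a rough illustration of the general idea of deriving measurement rules from metadata, the following Python sketch generates integrity, consistency, and validity checks from field descriptions and reports a pass ratio per dimension. It is not the paper's index set, rule formalism, or tool: the names FieldMeta, Rule, rules_from_metadata, and measure, the mapping of checks to dimensions, and the sample records are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldMeta:
    """Hypothetical metadata for one field (not the paper's schema)."""
    name: str
    dtype: type
    required: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None


@dataclass
class Rule:
    """A measurement rule: a quality dimension plus a per-record predicate."""
    dimension: str
    field: str
    check: Callable[[dict], bool]


def _in_range(value, meta: FieldMeta) -> bool:
    # Missing values are left to the integrity rule.
    if value is None:
        return True
    # Non-numeric values cannot satisfy a numeric range.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if meta.min_value is not None and value < meta.min_value:
        return False
    if meta.max_value is not None and value > meta.max_value:
        return False
    return True


def rules_from_metadata(fields: list[FieldMeta]) -> list[Rule]:
    """Generate rules from metadata (assumed mapping of checks to dimensions)."""
    rules = []
    for f in fields:
        if f.required:
            # Integrity: a required field must be present.
            rules.append(Rule("integrity", f.name,
                              lambda r, f=f: r.get(f.name) is not None))
        # Consistency: a present value must match the declared type.
        rules.append(Rule("consistency", f.name,
                          lambda r, f=f: r.get(f.name) is None
                          or isinstance(r[f.name], f.dtype)))
        if f.min_value is not None or f.max_value is not None:
            # Validity: a present value must fall within the declared range.
            rules.append(Rule("validity", f.name,
                              lambda r, f=f: _in_range(r.get(f.name), f)))
    return rules


def measure(records: list[dict], rules: list[Rule]) -> dict[str, float]:
    """Return the fraction of passed checks for each quality dimension."""
    passed: dict[str, int] = {}
    total: dict[str, int] = {}
    for rule in rules:
        for record in records:
            total[rule.dimension] = total.get(rule.dimension, 0) + 1
            if rule.check(record):
                passed[rule.dimension] = passed.get(rule.dimension, 0) + 1
    return {dim: passed.get(dim, 0) / total[dim] for dim in total}


# Made-up teaching evaluation records, for illustration only.
fields = [
    FieldMeta("course_id", str),
    FieldMeta("score", int, min_value=0, max_value=100),
]
records = [
    {"course_id": "C101", "score": 92},
    {"course_id": "C102", "score": 130},   # out of range
    {"course_id": None, "score": 75},      # missing required field
    {"course_id": "C104", "score": "88"},  # wrong type
]
scores = measure(records, rules_from_metadata(fields))
print(scores)
```

Keeping each rule as a (dimension, field, predicate) triple means new metadata attributes only require a new branch in rules_from_metadata, while measure stays unchanged.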

Key words: Data quality, Metadata, Data quality measurement

CLC Number: 

  • TP 274+.3