Computer Science, 2022, Vol. 49, Issue (5): 120-128. doi: 10.11896/jsjkx.210300092
杨斐斐, 沈思妤, 申德荣, 聂铁铮, 寇月
YANG Fei-fei, SHEN Si-yu, SHEN De-rong, NIE Tie-zheng, KOU Yue
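Abstract: As data volumes grow and data become increasingly interrelated and overlapping, data fusion is needed to maximize the value of data. Because the data fusion process is complex, a provenance (backtracking) mechanism for data fusion is needed to explain that process clearly. Although data provenance has been studied extensively, most existing work addresses query provenance and workflow provenance, and little addresses provenance for data fusion. This paper studies provenance for data fusion and proposes a method that supports multi-granularity data provenance. First, the data fusion process is abstracted: an entity-centred semantic graph over schemas, entities and attributes is constructed to give the fusion process an explicit semantic representation, and an optimized storage schema for provenance information is proposed. Then, based on the semantic graph, entity-level and attribute-level provenance query algorithms are proposed, together with corresponding query optimization strategies. Finally, experiments demonstrate the effectiveness of the proposed data provenance method.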
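The abstract does not describe the semantic graph, the storage schema or the query algorithms in detail. The Python sketch below only illustrates the general idea of recording provenance while fusing and then answering entity-level and attribute-level provenance queries over an entity-centred graph with a schema level. All names (`SourceRecord`, `FusionProvenanceGraph`, `fuse`, `entity_provenance`, `attribute_provenance`, `schema_map`) are hypothetical, and per-attribute majority voting is an assumed stand-in for whatever conflict-resolution rule a real fusion system uses; this is not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """One record from one source, assumed to be already matched to an entity."""
    source: str     # source name, e.g. "S1" (hypothetical)
    record_id: str  # record identifier within that source
    attrs: tuple    # ((local_attribute, value), ...) as stored in the source


class FusionProvenanceGraph:
    """Hypothetical entity-centred provenance graph with three levels:
    schema (local attribute -> global attribute), entity (fused entity ->
    source records) and attribute ((entity, attribute) -> supporting records)."""

    def __init__(self, schema_map=None):
        # Schema level: (source, local_attribute) -> global_attribute.
        self.schema_map = dict(schema_map or {})
        self.entity_edges = defaultdict(set)  # entity_id -> {SourceRecord}
        self.attr_edges = defaultdict(set)    # (entity_id, attr) -> {SourceRecord}
        self.fused = {}                       # entity_id -> {attr: fused value}

    def fuse(self, entity_id, records):
        """Fuse one entity's records by majority vote per global attribute,
        recording provenance edges as a side effect of fusion."""
        votes = defaultdict(lambda: defaultdict(list))  # attr -> value -> [records]
        for rec in records:
            self.entity_edges[entity_id].add(rec)
            for local_attr, value in rec.attrs:
                attr = self.schema_map.get((rec.source, local_attr), local_attr)
                votes[attr][value].append(rec)
        result = {}
        for attr, by_value in votes.items():
            # Most-supported value wins; ties go to the smallest str(value).
            best = min(by_value, key=lambda v: (-len(by_value[v]), str(v)))
            result[attr] = best
            self.attr_edges[(entity_id, attr)].update(by_value[best])
        self.fused[entity_id] = result
        return result

    def entity_provenance(self, entity_id):
        """Entity-level query: every source record fused into this entity."""
        return sorted((r.source, r.record_id)
                      for r in self.entity_edges.get(entity_id, ()))

    def attribute_provenance(self, entity_id, attr):
        """Attribute-level query: source records supporting the fused value."""
        return sorted((r.source, r.record_id)
                      for r in self.attr_edges.get((entity_id, attr), ()))


if __name__ == "__main__":
    g = FusionProvenanceGraph(schema_map={("S2", "addr"): "address"})
    recs = [
        SourceRecord("S1", "r1", (("name", "Alice"), ("address", "Shenyang"))),
        SourceRecord("S2", "r7", (("name", "Alice"), ("addr", "Shenyang"))),
        SourceRecord("S3", "r4", (("name", "Alicia"), ("address", "Beijing"))),
    ]
    print(g.fuse("e1", recs))                        # {'name': 'Alice', 'address': 'Shenyang'}
    print(g.entity_provenance("e1"))                 # [('S1', 'r1'), ('S2', 'r7'), ('S3', 'r4')]
    print(g.attribute_provenance("e1", "address"))   # [('S1', 'r1'), ('S2', 'r7')]
```

Because provenance edges are written at fusion time, both queries in this sketch reduce to dictionary lookups. A real system would also need persistent provenance storage and the query optimizations the abstract mentions, which this sketch does not attempt.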