Computer Science, 2022, Vol. 49, Issue (5): 120-128. doi: 10.11896/jsjkx.210300092

• Database & Big Data & Data Science •

Method on Multi-granularity Data Provenance for Data Fusion

YANG Fei-fei, SHEN Si-yu, SHEN De-rong, NIE Tie-zheng, KOU Yue   

  1. College of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
  • Received: 2021-03-09 Revised: 2021-10-22 Online: 2022-05-15 Published: 2022-05-06
  • About author: YANG Fei-fei, born in 1998, postgraduate. Her main research interests include data integration and data provenance.
    SHEN De-rong, born in 1964, professor, Ph.D. supervisor, is a senior member of China Computer Federation. Her main research interests include Web data processing and distributed databases.
  • Supported by:
    National Natural Science Foundation of China (62072084, 62072086) and National Key R&D Program of China (2018YFB1003404).

Abstract: As data volumes grow and datasets become increasingly correlated and overlapping, data fusion is needed to maximize the value of data. Because the fusion process is complex, explaining its results clearly requires a backtracking mechanism for data fusion. Although data provenance has been studied extensively, most existing work targets queries and workflows, and little of it addresses data fusion. This paper focuses on provenance for data fusion and proposes a method that supports multi-granularity provenance. First, the data fusion process is abstracted, semantic graphs of schemas, entities and attributes are constructed with the entity as the core, and an optimized model for storing provenance information is proposed. Second, based on the semantic graphs, data provenance query algorithms at the entity level and the attribute level are proposed, together with corresponding query optimization strategies. Finally, experiments demonstrate the effectiveness of the proposed data provenance method.
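
To make the two provenance granularities concrete, the following is a minimal illustrative sketch and not the paper's semantic-graph model, storage model or query algorithms, which the abstract does not detail. It assumes that fusion resolves conflicting attribute values by majority vote, and all names (`SourceRecord`, `FusedEntity`, `fuse`, `entity_provenance`, `attribute_provenance`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRecord:
    """One record from an input data source (hypothetical structure)."""
    source: str   # name of the data source
    key: str      # record identifier within that source
    attrs: tuple  # (attribute, value) pairs


@dataclass
class FusedEntity:
    """A fused entity plus the provenance links recorded while fusing."""
    entity_id: str
    values: dict = field(default_factory=dict)       # attribute -> fused value
    entity_prov: list = field(default_factory=list)  # contributing SourceRecords
    attr_prov: dict = field(default_factory=dict)    # attribute -> [(source, key)]


def fuse(entity_id, records):
    """Fuse records assumed to describe the same entity.

    Conflict resolution is a simple majority vote (an assumption made for
    this sketch); ties go to the value seen first.
    """
    fused = FusedEntity(entity_id, entity_prov=list(records))
    candidates = {}  # attribute -> value -> [(source, key)]
    for rec in records:
        for attr, value in rec.attrs:
            candidates.setdefault(attr, {}).setdefault(value, []).append(
                (rec.source, rec.key))
    for attr, by_value in candidates.items():
        # max() returns the first maximal item; dicts keep insertion order.
        value, supporters = max(by_value.items(), key=lambda kv: len(kv[1]))
        fused.values[attr] = value
        fused.attr_prov[attr] = supporters
    return fused


def entity_provenance(fused):
    """Entity-level query: every source record merged into the entity."""
    return [(r.source, r.key) for r in fused.entity_prov]


def attribute_provenance(fused, attr):
    """Attribute-level query: only the records that supplied the fused value."""
    return list(fused.attr_prov.get(attr, []))


records = [
    SourceRecord("A", "a1", (("name", "Northeastern University"), ("city", "Shenyang"))),
    SourceRecord("B", "b7", (("name", "Northeastern University"), ("city", "Shenyang"))),
    SourceRecord("C", "c3", (("name", "NEU"), ("city", "Shenyang"))),
]
entity = fuse("e1", records)
```

In this sketch, `entity_provenance(entity)` traces back to all three source records, whereas `attribute_provenance(entity, "name")` traces back only to the two records whose value won the vote. This is the coarse-versus-fine distinction that multi-granularity provenance refers to.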

Key words: Data fusion, Data provenance, Multi-granularity

CLC Number: TP311.13