Computer Science, 2019, Vol. 46, Issue (4): 8-13. doi: 10.11896/j.issn.1002-137X.2019.04.002

Big Data & Data Science

Quality Control Agent Based on Probability Inference

XU Yao-li, LI Zhan-huai   

  1. School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an 710072, China
  2. Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi’an 710129, China
Received: 2018-12-04; Online: 2019-04-15; Published: 2019-04-23

Abstract: Entity resolution (ER) is a fundamental problem in data integration and cleaning, and inconsistency reconciliation (IR) further improves resolution performance by reconciling the pairs on which existing ER approaches disagree. However, previous IR approaches share a limitation: the reconciliation result carries no quality guarantee. To address this problem, this paper proposes a quality control agent based on probability inference, denoted QCAgent. QCAgent requires no manually labeled pairs and automatically outputs the reconciliation result with the highest recall while satisfying a given precision threshold. Its core idea is as follows. First, an outlier detection model estimates the matching probability of each inconsistent pair, and the precision and recall estimated from these probabilities serve as environmental feedback. Next, binary search selects a flipping solution as QCAgent's next action, chosen so that the flipped reconciliation result meets the precision requirement with the highest recall. The outlier detection model is then retrained on the new consistent pairs, and the precision and recall of the flipped reconciliation result are re-estimated. The iteration terminates once the latest estimated precision satisfies the constraint. Experimental results on a real data set show that QCAgent effectively solves the quality control problem for reconciliation results.
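The selection step described in the abstract (estimate precision and recall from matching probabilities, then use binary search to pick the flip with the highest recall that still meets the precision threshold) can be illustrated with a minimal sketch. It rests on our own simplifying assumptions and is not QCAgent's actual implementation: the matching probabilities are hard-coded hypothetical values (in the paper they come from the outlier detection model), precision and recall are estimated over the inconsistent pairs only, and a "flipping solution" is restricted to labeling the k most probable pairs as matches. Under that restriction the estimated precision is non-increasing in k, so binary search finds the largest feasible k, which is also the one with the highest estimated recall. The function names `estimate_quality` and `best_flip` and the parameter `min_precision` are hypothetical, and the retraining loop is omitted.

```python
from typing import List, Tuple


def estimate_quality(probs: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Estimate precision and recall of a labeling of inconsistent pairs.

    probs[i] is read as P(pair i is a true match). Expected precision is exact
    by linearity because the number of predicted matches is fixed; recall uses
    a ratio of expectations, a simple plug-in approximation.
    """
    expected_true_matches = sum(probs)
    predicted = [p for p, is_match in zip(probs, labels) if is_match]
    expected_tp = sum(predicted)
    # Convention (assumption): an empty match set is treated as perfectly precise.
    precision = expected_tp / len(predicted) if predicted else 1.0
    recall = expected_tp / expected_true_matches if expected_true_matches > 0 else 0.0
    return precision, recall


def best_flip(probs: List[float], min_precision: float) -> List[bool]:
    """Label the k most probable pairs as matches, using the largest k whose
    estimated precision still meets min_precision (largest k = highest recall).

    With probabilities sorted in descending order, the estimated precision of
    the top-k prefix is the mean of the k largest values, which never increases
    as k grows. The feasible k therefore form a prefix 0..k*, and binary search
    finds k* in O(log n) feasibility checks after an O(n log n) sort.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    prefix = [0.0]  # prefix[k] = sum of the k largest probabilities
    for i in order:
        prefix.append(prefix[-1] + probs[i])

    def feasible(k: int) -> bool:
        return k == 0 or prefix[k] / k >= min_precision

    lo, hi = 0, len(probs)  # invariant: feasible(lo) is True and k* lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1

    labels = [False] * len(probs)
    for i in order[:lo]:
        labels[i] = True
    return labels


if __name__ == "__main__":
    probs = [0.95, 0.2, 0.9, 0.6, 0.1]  # hypothetical matching probabilities
    labels = best_flip(probs, min_precision=0.8)
    print(labels)  # [True, False, True, True, False]
    print(estimate_quality(probs, labels))
```

With a threshold of 0.8, the three most probable pairs are labeled as matches (estimated precision about 0.817); adding the fourth would lower it to about 0.66, so the search stops at k = 3.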

Key words: Agent, Entity resolution, Inconsistency reconciliation, Precision, Quality control

CLC Number: TP391