Computer Science, 2015, Vol. 42, Issue (3): 8-12. doi: 10.11896/j.issn.1002-137X.2015.03.002


Summary and Prospect on Entity Resolution

ZHU Can and CAO Jian   

  • Online: 2018-11-14  Published: 2018-11-14

Abstract: Entity resolution (ER) is a key step in data cleaning, data integration, data mining, and data quality assurance. This paper lists and explains classic algorithms from the development of entity resolution, including pair-wise entity resolution, collective entity resolution, entity resolution on big data, and entity resolution on complex data. It also describes the characteristics and limitations of these algorithms and presents state-of-the-art algorithms that new application environments and their differing requirements have given rise to. Finally, it discusses the research hotspots and development directions of the field.

Key words: Entity resolution, Record linkage, Collective data, Complex data, Big data

