Computer Science, 2017, Vol. 44, Issue (3): 215-219. doi: 10.11896/j.issn.1002-137X.2017.03.045

• Software and Database Technology •

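Time-aware Query-time Entity Resolution and Data Fusion in Heterogeneous Information Spaces

YANG Dan, CHEN Mo, WANG Gang and SUN Liang-xu

  1. School of Software, University of Science and Technology Liaoning, Anshan 114051; Computing Center, Northeastern University, Shenyang 110004; School of Software, University of Science and Technology Liaoning, Anshan 114051; School of Software, University of Science and Technology Liaoning, Anshan 114051
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61402213, 61402093)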

Abstract: Most existing entity resolution (ER) techniques run offline, in a non-real-time manner, on static data sets, and on large data sets they usually require substantial time and system resources. For heterogeneous entities in heterogeneous information spaces that carry temporal information and evolve continuously, time-aware query-time ER and data fusion are increasingly needed to ensure data quality and meet user requirements. Targeting entity search based on keyword queries with temporal context in heterogeneous information spaces, this paper proposes TQ-ER, a time-aware query-time ER and data fusion approach that provides users with accurate entity profiles. An iterative, time-aware algorithm for generating the entity candidate set (described in the original English abstract as a time-aware iterative query expansion algorithm) is also proposed. TQ-ER leverages the temporal context of the query and the temporal information of entities to identify the minimum set of entity data that must be resolved and fused for a given query to be answered correctly. Extensive experiments on real data sets demonstrate the effectiveness and correctness of TQ-ER.

Key words: Time-aware, Query-time entity resolution, Data fusion, Heterogeneous information spaces

