Computer Science ›› 2014, Vol. 41 ›› Issue (8): 85-89. doi: 10.11896/j.issn.1002-137X.2014.08.018

• 2013 National Conference on Theoretical Computer Science •

Study on Schema Matching with Uncertain Column Names and Data Values

HUANG Dong-mei, FENG Kai, ZHAO Dan-feng and GUO Ying-xin

  1. College of Information Technology, Shanghai Ocean University, Shanghai 201306
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China.

Abstract: Schema matching is an important research topic in data integration, and uncertainty in column names and data values is a common situation in practice. The prevailing approach to this case is based on mutual information and Euclidean distance, but it does not resolve the mismatches that arise when attribute similarities are identical or very close. To address this problem, this paper proposes a multiple iterative screening method. It first identifies the attribute pairs in the two relational schemas that can be matched correctly in a single pass and selects the optimal attribute pair among them. It then applies a matching method based on conditional mutual information: the optimal attribute pair is used to compute the conditional mutual information of the unmatched attributes, the Euclidean distances between attributes are computed from these values, and the final matching result is obtained, which resolves the mismatch problem. Experimental results indicate that the proposed algorithm is correct and effective.

Key words: Uncertainty, Schema matching, Conditional mutual information
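
The quantities named in the abstract (mutual information, conditional mutual information, Euclidean distance) can be illustrated with a minimal Python sketch. This is not the paper's algorithm. The function names, the two-component "anchor profile", and the greedy nearest-profile matching are assumptions introduced only for illustration. The sketch assumes discrete-valued columns of equal length and an already matched attribute pair supplied as one anchor column per schema.

```python
from collections import Counter
from math import log2, sqrt


def mutual_information(x, y):
    """Estimate I(X;Y) in bits from two equal-length discrete columns."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def conditional_mutual_information(x, y, z):
    """Estimate I(X;Y|Z) in bits, conditioning on the discrete column z."""
    n = len(x)
    pxyz = Counter(zip(x, y, z))
    pxz, pyz, pz = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    return sum((c / n) * log2(c * pz[k] / (pxz[(a, k)] * pyz[(b, k)]))
               for (a, b, k), c in pxyz.items())


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def anchor_profile(col, anchor):
    """Hypothetical descriptor of a column relative to an anchor column:
    (I(col; anchor), I(col; col | anchor)); the second term equals H(col | anchor)."""
    return (mutual_information(col, anchor),
            conditional_mutual_information(col, col, anchor))


def match_by_distance(source, target, src_anchor, tgt_anchor):
    """Assumed greedy rule: map each source column to the target column
    whose anchor profile is nearest in Euclidean distance."""
    tgt = {name: anchor_profile(col, tgt_anchor) for name, col in target.items()}
    result = {}
    for name, col in source.items():
        p = anchor_profile(col, src_anchor)
        result[name] = min(tgt, key=lambda t: euclidean_distance(p, tgt[t]))
    return result


# Toy schemas with opaque names and values; the anchors stand for an
# attribute pair that is assumed to be matched already.
source = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
target = {"p": ["m", "n", "m", "n"], "q": ["k", "k", "l", "l"]}
matches = match_by_distance(source, target,
                            src_anchor=[0, 0, 1, 1],
                            tgt_anchor=["u", "u", "v", "v"])
```

Because the profiles depend only on the statistical dependence between a column and the anchor, they can be compared across schemas even when column names and value encodings differ. A greedy rule like this one can still assign two source columns to the same target column, so it does not by itself resolve ties between equally similar attributes.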

