Computer Science (计算机科学), 2014, Vol. 41, Issue (Z11): 411-418.

• Software Engineering and Database Technology •

Entities Expansion and Attribute Values Discovery Method Based on Web

LI Gui, CHEN Shao-gang, HAN Zi-yang, LI Zheng-yu, SUN Ping and SUN Huan-liang

  1. School of Information and Control Engineering, Shenyang Jianzhu University, Shenyang 110168 (all authors)
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61070024) and the Natural Science Foundation of Liaoning Province (2014020068).

Abstract: Entity expansion and attribute value discovery are an important research topic in the field of Web data extraction and integration. In this paper, Web data lists and domain entities are modeled as a bipartite graph. Based on the quality scores of the expanded entities, the expansion set is updated iteratively until its quality score reaches a local maximum and the set no longer changes, which completes the entity expansion. To complete the attribute information of the expanded entities, structured numerical or discrete attributes are extracted, and an attribute value filling method based on integer linear programming (ILP) is proposed. Experimental results show that, compared with previous methods, the proposed approach handles Web pages containing noisy data better and improves both the precision and the recall of extraction.

Key words: Entity expansion, Attribute value filling, Integer linear programming
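
The iterative expansion over a bipartite graph summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the paper's quality-score definition, so everything concrete here is an assumption made for illustration only: the function name `expand_entities`, the list score (fraction of a list's members already in the reference set), the entity score (sum of the scores of the lists containing the entity), and the fixed expansion-set size. The ILP-based attribute value step is not covered.

```python
from collections import defaultdict


def expand_entities(lists, seeds, size, max_iters=20):
    """Iteratively expand a seed set over a list/entity bipartite graph.

    lists: dict mapping a (hypothetical) Web list id to its set of entities.
    seeds: set of seed entities.
    size:  number of entities kept in the expansion set (an assumption).
    """
    # Bipartite adjacency: entity -> ids of the lists that contain it.
    entity_to_lists = defaultdict(set)
    for list_id, members in lists.items():
        for entity in members:
            entity_to_lists[entity].add(list_id)

    def list_score(list_id, reference):
        # Assumed list quality: share of members already in the reference set.
        members = lists[list_id]
        return len(members & reference) / len(members)

    def entity_score(entity, reference):
        # Assumed entity quality: sum of the scores of the lists containing it.
        return sum(list_score(l, reference) for l in entity_to_lists[entity])

    expansion = set(seeds)
    for _ in range(max_iters):
        reference = set(seeds) | expansion
        # Rank all entities by score (ties broken alphabetically).
        ranked = sorted(
            entity_to_lists,
            key=lambda e: (-entity_score(e, reference), e),
        )
        new_expansion = set(ranked[:size])
        if new_expansion == expansion:
            break  # the expansion set no longer changes
        expansion = new_expansion
    return expansion


# Toy data: two city lists plus two noisy navigation-style lists.
web_lists = {
    "t1": {"Beijing", "Shanghai", "Shenyang"},
    "t2": {"Shanghai", "Shenyang", "Dalian"},
    "t3": {"Login", "Home", "Shenyang"},
    "t4": {"Login", "Home", "About"},
}
result = expand_entities(web_lists, {"Beijing", "Shanghai"}, size=4)
print(sorted(result))
```

In this toy input the navigation items score low because the lists containing them share little with the reference set, so the loop settles on the four city names after two passes.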

