Computer Science (计算机科学) ›› 2014, Vol. 41 ›› Issue (Z11): 411-418.
李贵,陈韶刚,韩子扬,李征宇,孙平,孙焕良
LI Gui, CHEN Shao-gang, HAN Zi-yang, LI Zheng-yu, SUN Ping and SUN Huan-liang
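Abstract: Instance expansion and attribute-value augmentation are important research topics in Web data extraction and integration. This paper models Web data lists and instances as a bipartite graph and, guided by the quality scores of the expanded instances, iteratively updates the expansion set until its quality score is maximal and the set no longer changes, which yields the expanded instances. To complete the attribute information of the expanded instances, structured numeric or discrete attributes are extracted, and an attribute-value augmentation method based on integer linear programming is proposed. Experiments show that, compared with previous methods, the proposed method handles noisy Web pages better and improves the precision and recall of extraction.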
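The iterative expansion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual quality-score formula, so the scoring below is an assumption chosen only for illustration: a list is scored by the fraction of its items already in the expansion set, and an instance by the average score of the lists that contain it. The function name `expand_seeds` and the parameters `threshold` and `max_iters` are hypothetical.

```python
def expand_seeds(lists, seeds, threshold=0.5, max_iters=20):
    """Iteratively expand a seed set over a list/instance bipartite graph.

    lists: dict mapping a list id to the set of instance strings it contains
           (the edges of the bipartite graph).
    seeds: iterable of known-good instances.
    NOTE: the scoring rule is an illustrative assumption, not the paper's.
    """
    seeds = set(seeds)
    expanded = set(seeds)
    for _ in range(max_iters):
        # Score each non-empty list by the share of its items already accepted.
        list_score = {
            lid: len(items & expanded) / len(items)
            for lid, items in lists.items()
            if items
        }
        # Score each instance by the mean score of the lists containing it.
        totals, counts = {}, {}
        for lid, score in list_score.items():
            for inst in lists[lid]:
                totals[inst] = totals.get(inst, 0.0) + score
                counts[inst] = counts.get(inst, 0) + 1
        updated = seeds | {
            inst for inst in totals if totals[inst] / counts[inst] >= threshold
        }
        # Stop once the expansion set no longer changes.
        if updated == expanded:
            break
        expanded = updated
    return expanded
```

With two clean lists of city names and one noisy navigation list that happens to contain a seed, this sketch accepts the extra cities and rejects the navigation entries, because the noisy list receives a low score. The integer-linear-programming step for attribute values is not sketched here, since the abstract does not state its objective or constraints.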