Adaptive Web Information Extraction Based on the DOM Tree


Computer Science, 2009, Vol. 36, Issue (7): 202-203. doi: 10.11896/j.issn.1002-137X.2009.07.048


Adaptive Web Information Extraction Based on Tree

LI Zhao, PENG Hong, YE Su-nan, ZHANG Huan, YANG Qin-yao

Online: 2018-11-16 Published: 2018-11-16

Abstract

Many Web information extraction methods rely on wrapper induction, which extracts items using rules learned from a set of training Web pages. Although wrapper induction can extract information accurately, it is hard to maintain: whenever the template of a Web site changes, the rules must be learned again. We propose a new adaptive Web information extraction method. Working on the DOM tree structure of a page, it uses the keywords of a given topic to locate the block that contains all the information about a merchandise item. Experiments on a large number of Web pages show that the method extracts information efficiently and does not depend on the site structure, so it can be applied to many different Web information extraction tasks.

Key words: DOM tree, Information extraction, Adaptive
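
The abstract does not say how the keyword-bearing block is selected, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the target block is the deepest DOM node whose subtree text contains every topic keyword. The names `Node`, `TreeBuilder`, `find_block`, and `SAMPLE` are hypothetical and chosen for illustration; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A very small DOM-like node: tag, attributes, children, direct text."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text_parts = []  # text found directly inside this element

    def text(self):
        """Lower-cased text of the whole subtree (document order is not preserved)."""
        parts = list(self.text_parts)
        parts.extend(child.text() for child in self.children)
        return " ".join(parts).lower()


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed or stray tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text_parts.append(data.strip())


def find_block(html, keywords):
    """Return the deepest node whose subtree mentions every keyword, or None.

    Assumption of this sketch: "contains all keywords" is the block criterion.
    """
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    wanted = [k.lower() for k in keywords]
    node = builder.root
    if not all(k in node.text() for k in wanted):
        return None
    while True:
        # Descend into the first child that still covers every keyword.
        for child in node.children:
            child_text = child.text()
            if all(k in child_text for k in wanted):
                node = child
                break
        else:
            return node  # no child covers all keywords: this is the block


SAMPLE = """
<html><body>
  <div id="nav"><a>Home</a> <a>Price list</a></div>
  <div id="item">
    <h1>Acme Kettle</h1>
    <p>Price: $25</p>
    <p>Brand: Acme</p>
  </div>
  <div id="footer">Contact us</div>
</body></html>
"""

if __name__ == "__main__":
    block = find_block(SAMPLE, ["price", "brand"])
    print(block.tag, block.attrs)
```

Because the search relies only on keyword coverage and tree depth, it needs no per-site rules, which is the property the abstract contrasts with wrapper induction. In the sample page, the navigation block mentions "price" but not "brand", so the search settles on the `div` with `id="item"`. A practical system would likely need a softer criterion (for example, partial keyword coverage or text density), which this sketch leaves out.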

LI Zhao, PENG Hong, YE Su-nan, ZHANG Huan, YANG Qin-yao. Adaptive Web Information Extraction Based on Tree[J]. Computer Science, 2009, 36(7): 202-203.