Computer Science ›› 2009, Vol. 36 ›› Issue (11): 235-237.
GU Yun-hua, TIAN Wei
Abstract: A method for extracting information from Web pages based on an extended DOM tree is presented. A Web page is first transformed into a DOM tree; the tree is then extended by adding a semantic expression to each node, and an influence degree is calculated for every node. The DOM tree is pruned according to the nodes' influence degrees, so that the useful, relevant content of the page can be extracted automatically. The approach is general: it does not require prior knowledge of the page's structure. The extraction results can be used not only for browsing but also for further Web information processing, such as Internet data mining and topic-based search engines.
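The abstract outlines a pipeline (parse the page into a DOM tree, extend each node, score it with an influence degree, prune low-scoring subtrees) but does not give the formulas. The Python sketch below illustrates that pipeline under clearly stated assumptions: the "semantic expression" is reduced to a hypothetical tag-based label, the influence degree is assumed to be the amount of non-link text in a node's subtree, and the pruning threshold of 20 characters is arbitrary. `Node`, `TreeBuilder`, `annotate`, `prune`, `collect_text` and `extract` are illustrative names, not the authors' code; only `html.parser.HTMLParser` from the Python standard library is used.

```python
from html.parser import HTMLParser

# Hypothetical stand-ins: the paper's semantic expression and
# influence-degree formula are not given, so both are illustrative.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A DOM node extended with a semantic label and an influence degree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""        # text directly inside this element
        self.semantic = None  # hypothetical label: "content", "link" or "noise"
        self.influence = 0.0  # hypothetical influence degree


class TreeBuilder(HTMLParser):
    """Builds a simple DOM tree with the standard-library HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.cur = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore unmatched end tags.
        node = self.cur
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.cur = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.cur.text = f"{self.cur.text} {text}".strip()


def annotate(node):
    """Post-order pass: label each node and compute its influence degree.

    Returns (text_len, link_len) for the subtree; noise subtrees count as 0.
    """
    text_len, link_len = len(node.text), 0
    for child in node.children:
        t, l = annotate(child)
        text_len += t
        link_len += l
    if node.tag in NOISE_TAGS:
        node.semantic, node.influence = "noise", 0.0
        return 0, 0
    if node.tag == "a":
        link_len = text_len  # all text under a link counts as link text
    node.semantic = "link" if node.tag == "a" else "content"
    # Hypothetical influence degree: amount of non-link text in the subtree.
    node.influence = float(text_len - link_len)
    return text_len, link_len


def prune(node, threshold):
    """Drop child subtrees whose influence degree is below the threshold."""
    node.children = [c for c in node.children if c.influence >= threshold]
    for child in node.children:
        prune(child, threshold)


def collect_text(node):
    """Gather the direct text of every node left after pruning."""
    parts = [node.text] if node.text else []
    for child in node.children:
        parts.extend(collect_text(child))
    return parts


def extract(html, threshold=20.0):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    annotate(builder.root)
    prune(builder.root, threshold)
    return collect_text(builder.root)
```

For instance, on a page containing a menu made only of links, one paragraph of prose and a `<script>` element, `extract` keeps just the paragraph's text: the menu's text is entirely link text (influence 0) and the script is labelled as noise. A faithful implementation would substitute the paper's own semantic expression and influence-degree definition for these placeholders.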
Key words: DOM, Extraction of information from Web pages, Influence degree, Extended DOM tree
GU Yun-hua, TIAN Wei. Extraction of Information from Web Pages Based on Extended DOM Tree[J]. Computer Science, 2009, 36(11): 235-237.
URL: https://www.jsjkx.com/EN/Y2009/V36/I11/235