Computer Science ›› 2016, Vol. 43 ›› Issue (6): 316-320. doi: 10.11896/j.issn.1002-137X.2016.06.063

Research on Recognition Algorithm for Subject Web Pages Based on Tag Tree Adjacency Matrix

SONG Jun, YANG Xiao-fu, LI Yi-cai and WANG Jia-wei   

  • Online:2018-12-01 Published:2018-12-01

Abstract: With the development of Web programming technology, subject pages of the same type can present the same visual features using different HTML tags. As a result, existing Web structure similarity algorithms, which measure the structural similarity of Web pages by matching HTML tag names, cannot accurately recognize subject pages of the same type. We therefore propose a recognition algorithm for same-type subject pages based on the tag tree adjacency matrix. The algorithm constructs the adjacency matrix of each Web page's tag tree and recognizes same-type subject pages by computing the structural similarity between pages from these matrices. Experimental results indicate that the algorithm reaches 100% recall and 96% precision at its best, and 97% recall and 89% precision on average.

Key words: Web page structure, HTML tag, Tag tree adjacency matrix
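
The idea in the abstract, comparing pages by the shape of their tag trees rather than by tag names, can be illustrated with a minimal sketch. The abstract does not give the paper's matrix construction or similarity formula, so everything below is an assumption made for illustration: nodes are numbered in document order, the matrix is a plain symmetric parent-child adjacency matrix, and similarity is a Dice coefficient over shared edges. The names TagTreeBuilder, tag_tree_adjacency and structure_similarity are hypothetical, and the input is assumed to be well-formed HTML.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they never become parents.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagTreeBuilder(HTMLParser):
    """Records each element's parent index; tag names are not kept in the result."""

    def __init__(self):
        super().__init__()
        self.parents = []  # parents[i] = index of node i's parent, -1 for a root
        self._stack = []   # (tag, index) of the elements that are currently open

    def handle_starttag(self, tag, attrs):
        index = len(self.parents)
        self.parents.append(self._stack[-1][1] if self._stack else -1)
        if tag not in VOID_TAGS:
            self._stack.append((tag, index))

    def handle_endtag(self, tag):
        # Assumes well-formed input: only close the innermost matching element.
        if tag not in VOID_TAGS and self._stack and self._stack[-1][0] == tag:
            self._stack.pop()


def tag_tree_adjacency(html):
    """Build a symmetric 0/1 adjacency matrix of the page's tag tree."""
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    n = len(builder.parents)
    matrix = [[0] * n for _ in range(n)]
    for child, parent in enumerate(builder.parents):
        if parent >= 0:
            matrix[parent][child] = matrix[child][parent] = 1
    return matrix


def structure_similarity(a, b):
    """Dice coefficient over edges (an assumed measure, not the paper's)."""
    n = min(len(a), len(b))
    shared = sum(1 for i in range(n) for j in range(i + 1, n)
                 if a[i][j] and b[i][j])
    total = sum(map(sum, a)) // 2 + sum(map(sum, b)) // 2
    return 1.0 if total == 0 else 2 * shared / total


page_a = "<div><h1>Title</h1><p>text</p></div>"
page_b = "<section><h2>Title</h2><span>text</span></section>"
page_c = "<div><p><span>text</span></p></div>"

same_shape = structure_similarity(tag_tree_adjacency(page_a),
                                  tag_tree_adjacency(page_b))
other_shape = structure_similarity(tag_tree_adjacency(page_a),
                                   tag_tree_adjacency(page_c))
print(same_shape)   # 1.0: same tree shape, different tag names
print(other_shape)  # 0.5: only one of the two edges is shared
```

Because the matrices carry no tag names, page_a and page_b compare as identical even though they share no tags, which is the case that name-matching methods miss. Comparing entries by document-order index is a simplification: inserting a single node shifts every later index, so a practical measure would need to be more tolerant of such changes.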

