Computer Science (计算机科学) ›› 2016, Vol. 43 ›› Issue (6): 316-320. doi: 10.11896/j.issn.1002-137X.2016.06.063
SONG Jun, YANG Xiao-fu, LI Yi-cai and WANG Jia-wei (宋军, 杨晓夫, 李益才, 王家伟)
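Abstract: As Web programming techniques have developed, pages on the same topic can use different HTML tags to present information with identical visual features. Existing page-structure similarity algorithms, which depend on matching HTML tag names, therefore cannot reliably identify same-topic pages. This paper proposes a tag-tree adjacency-matrix recognition algorithm for topic web pages: it constructs the adjacency matrix of each page's tag tree and uses the structural features of that matrix to compute the structural similarity between pages, and so identifies pages on the same topic. Experimental results show that the algorithm reaches a best performance of 100% recall and 96% precision, and an average performance of 97% recall and 89% precision.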
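The idea in the abstract, comparing pages by the shape of their tag trees rather than by tag names, can be illustrated with a minimal sketch. The abstract does not say which structural features of the adjacency matrix the paper uses, so the degree histogram and the cosine similarity below are stand-in assumptions, not the authors' method. All names (`TagTreeBuilder`, `adjacency_matrix`, `degree_histogram`, `similarity`) are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot have children.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagTreeBuilder(HTMLParser):
    """Records parent-child edges of the tag tree.

    Tag names are used only to match closing tags while parsing;
    they play no part in the later comparison.
    """

    def __init__(self):
        super().__init__()
        self.edges = []   # (parent_index, child_index) pairs
        self.stack = []   # node indices of the currently open elements
        self.tags = []    # tag names of the open elements, parallel to stack
        self.count = 0    # number of nodes seen so far

    def handle_starttag(self, tag, attrs):
        node = self.count
        self.count += 1
        if self.stack:
            self.edges.append((self.stack[-1], node))
        if tag not in VOID_TAGS:
            self.stack.append(node)
            self.tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.tags:
            # Close the matching element and any unclosed descendants.
            while self.tags:
                self.stack.pop()
                if self.tags.pop() == tag:
                    break


def adjacency_matrix(html):
    """Return the symmetric 0/1 adjacency matrix of the page's tag tree."""
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    n = builder.count
    matrix = [[0] * n for _ in range(n)]
    for parent, child in builder.edges:
        matrix[parent][child] = 1
        matrix[child][parent] = 1
    return matrix


def degree_histogram(matrix):
    """Assumed structural feature: how many nodes have each degree (row sum)."""
    return Counter(sum(row) for row in matrix)


def similarity(html_a, html_b):
    """Assumed similarity measure: cosine similarity of the degree histograms."""
    a = degree_histogram(adjacency_matrix(html_a))
    b = degree_histogram(adjacency_matrix(html_b))
    dot = sum(a[k] * b[k] for k in a)
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


# Same layout, different tag names.
page_a = "<div><p>x</p><p>y</p></div>"
page_b = "<section><span>x</span><span>y</span></section>"
# Different layout: a chain of nested elements.
page_c = "<div><div><div><p>x</p></div></div></div>"
```

Under these assumptions, `page_a` and `page_b` produce the same adjacency matrix even though they share no tag names, so `similarity(page_a, page_b)` is 1 up to floating-point rounding, while the differently shaped `page_c` scores lower against either of them. A degree histogram is a coarse feature and different trees can share one, so a real system would need a more discriminating comparison.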