Computer Science, 2015, Vol. 42, Issue (11): 284-287. doi: 10.11896/j.issn.1002-137X.2015.11.058

Web Page Optimal Segmentation Algorithm Based on Visual Features

LI Wen-hao, PENG Hong-chao, TONG Ming-wen and SHI Jun-jie   

Online: 2018-11-14; Published: 2018-11-14

Abstract: Web page segmentation is a key technique for adaptive Web page presentation. To overcome the shortcomings of the classical segmentation algorithm VIPS (Vision-based Page Segmentation), namely fragmented content and only semi-automatic operation, a novel segmentation algorithm, VWOS (Vision-based Web Optimal Segmentation), is proposed based on the optimal division of a graph. The Web page is modeled as a weighted, undirected, connected graph built from its visual features and structure, so the segmentation problem is transformed into the problem of optimally dividing that graph. VWOS is designed by combining Kruskal's algorithm with the Web page segmentation process. Experiments show that VWOS produces better segmentation results than VIPS.
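The abstract does not give the details of VWOS, so the following is only a minimal sketch of the general idea it names: treating page blocks as nodes of a weighted undirected graph and using a Kruskal-style pass to divide it. The function name `kruskal_segments`, the block names, the edge weights (read here as "visual distance" between blocks), and the `max_distance` stopping threshold are all assumptions made for illustration, not the authors' definitions.

```python
from typing import Dict, List, Tuple

# (visual distance, block_a, block_b); the weights here are hypothetical.
Edge = Tuple[float, str, str]


def kruskal_segments(nodes: List[str], edges: List[Edge],
                     max_distance: float) -> List[List[str]]:
    """Group blocks by a Kruskal-style pass over edges in ascending weight.

    Edges are merged with union-find until the next edge is heavier than
    max_distance (an assumed stopping rule); the remaining connected
    components are returned as the segments.
    """
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        # Find the component root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for weight, a, b in sorted(edges, key=lambda e: e[0]):
        if weight > max_distance:
            break  # all remaining edges are at least this heavy
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # join two components, as Kruskal does

    groups: Dict[str, List[str]] = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical page with five blocks and made-up visual distances.
blocks = ["header", "nav", "article", "sidebar", "footer"]
distances = [
    (0.2, "header", "nav"),
    (0.9, "nav", "article"),
    (0.3, "article", "sidebar"),
    (0.8, "sidebar", "footer"),
    (0.95, "header", "footer"),
]
print(kruskal_segments(blocks, distances, max_distance=0.5))
```

Stopping Kruskal's algorithm early is equivalent to building the minimum spanning tree and then cutting its heaviest edges, which is one common way to turn a spanning-tree construction into a graph division; whether VWOS uses this criterion or a different optimality measure is not stated in the abstract.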

Key words: Web page optimal segmentation, Web page vision features, Web page adaptive presentation, Optimal division

