Computer Science, 2015, Vol. 42, Issue (11): 284-287. doi: 10.11896/j.issn.1002-137X.2015.11.058

• Artificial Intelligence •

Web Page Optimal Segmentation Algorithm Based on Visual Features

LI Wen-hao, PENG Hong-chao, TONG Ming-wen and SHI Jun-jie

  • Affiliations: School of Educational Information Technology, Central China Normal University, Wuhan 430079; PLA Unit 63981, Wuhan 430311
  • Published: 2018-11-14  Online: 2018-11-14
  • Funding:
    This work was supported by the special research program on rapid sharing of scientific papers in the network era of the Science and Technology Development Center, Ministry of Education.

Abstract: Web page segmentation is a key technique for the adaptive presentation of Web pages. The classic Vision-based Page Segmentation algorithm (VIPS) tends to over-fragment pages and is only semi-automatic. To address these problems, a novel Vision-based Web Optimal Segmentation algorithm (VWOS) is proposed, based on the idea of optimal graph partitioning. Taking both the visual features and the structure of the page into account, the Web page is modeled as a weighted, undirected, connected graph, which turns page segmentation into an optimal partition of that graph. VWOS is then designed by combining Kruskal's algorithm with the page segmentation process. Experiments show that, compared with VIPS, VWOS produces segments with better semantic integrity and requires no manual intervention.

Key words: Web page optimal segmentation, Web page visual features, Web page adaptive presentation, optimal partition
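The abstract does not spell out VWOS's edge weights or its stopping rule, so the following is only a minimal Python sketch of the general idea it describes: page blocks become vertices of a weighted undirected graph, and Kruskal's algorithm merges the most visually similar neighbouring blocks first. The `Block` fields, the `adjacent` rule, the `visual_distance` weights and the fixed target segment count `k` are all hypothetical choices made for illustration, not the paper's definitions. In particular, VWOS is described as fully automatic, whereas this sketch takes `k` as an input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical rendered block: bounding box plus two style attributes."""
    name: str
    x: float
    y: float
    w: float
    h: float
    font_size: float
    bg_color: str


def adjacent(a: Block, b: Block, gap: float = 10.0) -> bool:
    """Assumed rule: blocks are neighbours if their boxes are within `gap` px."""
    return (a.x <= b.x + b.w + gap and b.x <= a.x + a.w + gap
            and a.y <= b.y + b.h + gap and b.y <= a.y + a.h + gap)


def visual_distance(a: Block, b: Block) -> float:
    """Assumed edge weight: grows with spacing, font-size and colour differences."""
    v_gap = max(0.0, max(a.y, b.y) - min(a.y + a.h, b.y + b.h))
    h_gap = max(0.0, max(a.x, b.x) - min(a.x + a.w, b.x + b.w))
    font_diff = abs(a.font_size - b.font_size)
    color_diff = 0.0 if a.bg_color == b.bg_color else 1.0
    return (v_gap + h_gap) / 10.0 + font_diff / 10.0 + color_diff


def build_graph(blocks: list[Block]) -> list[tuple[float, int, int]]:
    """Weighted undirected graph as an edge list (weight, i, j) over adjacent blocks."""
    edges = []
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            if adjacent(blocks[i], blocks[j]):
                edges.append((visual_distance(blocks[i], blocks[j]), i, j))
    return edges


def kruskal_segments(blocks: list[Block], k: int) -> list[list[str]]:
    """Kruskal's algorithm with union-find, stopped once k components remain."""
    parent = list(range(len(blocks)))

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    components = len(blocks)
    for _weight, i, j in sorted(build_graph(blocks)):  # lightest edges first
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different segments: merge them
            parent[ri] = rj
            components -= 1

    groups: dict[int, list[str]] = {}
    for idx, block in enumerate(blocks):
        groups.setdefault(find(idx), []).append(block.name)
    return sorted(groups.values())


# Toy page (hypothetical): header (logo, nav), content (title, body), footer.
page = [
    Block("logo", 0, 0, 200, 60, 24, "#ffffff"),
    Block("nav", 210, 0, 590, 60, 14, "#ffffff"),
    Block("title", 0, 70, 800, 40, 28, "#eeeeee"),
    Block("body", 0, 115, 800, 400, 14, "#eeeeee"),
    Block("footer", 0, 525, 800, 40, 12, "#333333"),
]

if __name__ == "__main__":
    print(kruskal_segments(page, 3))
```

On this toy page, `kruskal_segments(page, 3)` returns `[['footer'], ['logo', 'nav'], ['title', 'body']]`, that is, the header, the main content and the footer. Stopping Kruskal's algorithm once k components remain is equivalent, on a connected graph, to building the minimum spanning tree and deleting its k-1 heaviest edges, which yields the k-partition with the largest minimum distance between groups.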

