Computer Science, 2013, Vol. 40, Issue (Z11): 379-382.


Rule-based Preprocessing Algorithm for Web Page Segmentation

PENG Hong-chao, TONG Ming-wen, ZOU Jun-hua and HAO Qiu-hong

Online: 2018-11-16   Published: 2018-11-16

Abstract: Because the web pages of National Level Excellent Courses are designed with content and style kept separate, web page segmentation algorithms can hardly run on them. We present a rule-based preprocessing algorithm for web page segmentation that establishes the correlation between tags and style information. The algorithm consists of three steps: first, obtain the style information; second, associate the styles with tags; third, output the HTML and the PerfectNode, which is the associated class list. We randomly selected 100 pages from the National Level Excellent Courses and ran the preprocessing algorithm on them. Experimental results show that the algorithm associates tags with styles efficiently, which resolves the problem that web page segmentation algorithms cannot run on such pages.
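
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the actual rules or the layout of the PerfectNode structure, so the function names (`parse_css`, `preprocess`), the class `StyleAssociator`, and the `node_records` list of (tag, class list, style) tuples are all hypothetical. The sketch assumes the style sheet text has already been fetched and handles only simple tag, `.class`, and `#id` selectors.

```python
import html
import re
from html.parser import HTMLParser


def parse_declarations(body):
    """Turn 'color: red; margin: 0' into {'color': 'red', 'margin': '0'}."""
    decls = {}
    for decl in body.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            decls[prop.strip().lower()] = value.strip()
    return decls


def parse_css(css_text):
    """Step 1 (assumed form): collect styles as {selector: {property: value}}."""
    rules = {}
    for selectors, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css_text):
        for selector in selectors.split(","):
            rules.setdefault(selector.strip(), {}).update(parse_declarations(body))
    return rules


class StyleAssociator(HTMLParser):
    """Step 2 (assumed form): attach matching declarations to each start tag."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.parts = []          # pieces of the rewritten HTML
        self.node_records = []   # hypothetical stand-in for the PerfectNode list

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # Later sources override earlier ones: tag, then class, then id, then inline.
        style = dict(self.rules.get(tag, {}))
        for name in classes:
            style.update(self.rules.get("." + name, {}))
        if attrs.get("id"):
            style.update(self.rules.get("#" + attrs["id"], {}))
        style.update(parse_declarations(attrs.get("style") or ""))
        if style:
            attrs["style"] = "; ".join(f"{k}: {v}" for k, v in style.items())
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{html.escape(v)}"'
            for k, v in attrs.items()
        )
        self.parts.append(f"<{tag}{rendered}>")
        self.node_records.append((tag, classes, style))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(html.escape(data, quote=False))


def preprocess(html_text, css_text):
    """Step 3 (assumed form): return the styled HTML and the node records."""
    associator = StyleAssociator(parse_css(css_text))
    associator.feed(html_text)
    associator.close()
    return "".join(associator.parts), associator.node_records


css = "p { color: red } .note { font-size: 12px; color: blue }"
styled, records = preprocess('<div><p class="note">Hi</p></div>', css)
print(styled)
# <div><p class="note" style="color: blue; font-size: 12px">Hi</p></div>
```

Inlining the resolved declarations into a `style` attribute is one simple way to give a segmentation algorithm tags that carry their own style information. The sketch ignores descendant selectors, specificity beyond the tag/class/id order shown, comments, and doctype declarations.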

Key words: Web page segmentation, Preprocessing algorithm, Cascading style sheets, Style information

