Computer Science, 2013, Vol. 40, Issue (Z11): 379-382.

• Intelligent Systems and Applications •

Rule-based Preprocessing Algorithm for Web Page Segmentation

PENG Hong-chao, TONG Ming-wen, ZOU Jun-hua and HAO Qiu-hong

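  School of Information and Journalism Communication, Central China Normal University, Wuhan 430079; School of Information and Journalism Communication, Central China Normal University, Wuhan 430079; School of Education, Hubei University, Wuhan 430070; School of Information and Journalism Communication, Central China Normal University, Wuhan 430079
  • Published: 2018-11-16  Online: 2018-11-16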
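  • Funding:
    This work was supported by the Humanities and Social Sciences Project of the Ministry of Education (Research on Adaptation Decision Technology and Optimization Strategies for Mobile Learning Services, 10YJC880113), the National Key Technology R&D Program (Research and Application Demonstration of Digital Publishing Technology for Omni-media Online Editing and Adaptive Push, 2013BAH30F01), the Fundamental Research Funds for the Central Universities (Research on Content Adaptation Decision Models and Optimization in Ubiquitous Multimedia Services), and the Fundamental Research Funds for the Central Universities (Typical Applications of Digital Learning Environments and Tools, CCNU10C01003).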


Abstract: On National Level Excellent Courses websites, page content and styles are designed independently, so web page segmentation algorithms can hardly run on these pages. We present a rule-based preprocessing algorithm for web page segmentation that establishes the association between page tags and style information. The algorithm consists of three steps: first, obtain the style information; second, associate the styles with tags; third, output the HTML and the PerfectNode associated class list. We randomly selected 100 pages from National Level Excellent Courses websites and ran the preprocessing algorithm on them. Experimental results show that the algorithm effectively fuses tags with style information, which solves the problem that web page segmentation algorithms could not run on these pages.

Key words: Web page segmentation, preprocessing algorithm, cascading style sheets, style information
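The three preprocessing steps in the abstract can be illustrated with a short Python sketch. It is not the authors' algorithm: their rules are not given here, and every name below (`get_style_rules`, `StyleAssociator`, `preprocess`) is hypothetical, as is the plain list of `(tag, style)` pairs that stands in for the PerfectNode associated class list. The sketch assumes styles come only from `<style>` blocks and `style` attributes, handles only simple `tag`, `.class` and `#id` selectors (no combinators, no `@media`, no external stylesheets), and lets later rules override earlier ones instead of applying full CSS specificity.

```python
import html
import re
from html.parser import HTMLParser

# Hypothetical sketch, not the authors' rules. Assumes styles come only from
# <style> blocks and style="" attributes, selectors are simple (tag, .class,
# #id), later rules override earlier ones, and style="" overrides all rules.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
STYLE_BLOCK_RE = re.compile(r"<style[^>]*>(.*?)</style>", re.S | re.I)


def parse_declarations(text):
    """'color: red; font-size: 12px' -> {'color': 'red', 'font-size': '12px'}"""
    decls = {}
    for decl in text.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            decls[prop.strip().lower()] = value.strip()
    return decls


def get_style_rules(page):
    """Step 1: collect (selector, declarations) pairs from all <style> blocks."""
    css = re.sub(r"/\*.*?\*/", "", "".join(STYLE_BLOCK_RE.findall(page)), flags=re.S)
    return [(sel.strip(), parse_declarations(body))
            for selectors, body in RULE_RE.findall(css)
            for sel in selectors.split(",")]


def matches(selector, tag, attrs):
    """True if a simple selector applies to the element (no combinators)."""
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    return selector.lower() == tag


class StyleAssociator(HTMLParser):
    """Step 2: write each element's computed style into its style attribute."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.out = []           # rewritten HTML fragments
        self.associations = []  # (tag, computed style) per element, document order
        self.in_style = False

    def _open(self, tag, attr_list, self_closing):
        attrs = dict(attr_list)
        style = {}
        for sel, decls in self.rules:
            if matches(sel, tag, attrs):
                style.update(decls)
        style.update(parse_declarations(attrs.get("style") or ""))
        if style:
            attrs["style"] = "; ".join(f"{k}: {v}" for k, v in style.items())
        self.associations.append((tag, style))
        rendered = "".join(f" {k}" if v is None else f' {k}="{html.escape(v)}"'
                           for k, v in attrs.items())
        self.out.append(f"<{tag}{rendered}{' /' if self_closing else ''}>")

    def handle_starttag(self, tag, attrs):
        if tag == "style":  # drop <style>; its rules are inlined instead
            self.in_style = True
        else:
            self._open(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_style:
            self.out.append(html.escape(data, quote=False))


def preprocess(page):
    """Step 3: return the style-annotated HTML and the (tag, style) list."""
    parser = StyleAssociator(get_style_rules(page))
    parser.feed(page)
    parser.close()
    return "".join(parser.out), parser.associations
```

For example, `preprocess('<style>.note { color: red }</style><p class="note">Hi</p>')` returns the markup `<p class="note" style="color: red">Hi</p>` together with `[('p', {'color': 'red'})]`. Inlining the computed style is just one simple way to make style information visible to a segmentation algorithm that reads only tags; a fuller version would also need external stylesheets, combinators and CSS specificity.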
