Computer Science ›› 2018, Vol. 45 ›› Issue (11A): 146-148.

• Intelligent Computing •

Focused Crawling Based on Grey Wolf Algorithms

XIAO Jing-jie, CHEN Zhi-yun

  1. Department of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Online: 2019-02-26 Published: 2019-02-26
  • Corresponding author: CHEN Zhi-yun (born 1967), female, associate professor; main research interests: multimedia technology and educational technology. E-mail: 13611947576@163.com
  • About the author: XIAO Jing-jie (born 1994), female, master's student; main research interest: information retrieval.
  • Supported by:
    This work was supported by the MOOC-based computer course resource construction project.

Abstract: Focused crawlers have difficulty reaching an optimal solution in global search. To address this problem and to improve the accuracy and recall of focused crawlers, this paper designs a focused-crawler search strategy that incorporates the grey wolf algorithm. Experimental results show that, compared with the traditional breadth-first search strategy and with the genetic algorithm, which is also a swarm intelligence algorithm, the focused crawler based on the grey wolf algorithm performs considerably better and crawls more topic-relevant web pages.

Key words: Focused crawler, Grey wolf algorithm, Thematic relevance, Webpage importance

CLC Number:

  • TP301.6