Computer Science ›› 2018, Vol. 45 ›› Issue (11A): 146-148.

• Intelligent Computing •

Focused Crawling Based on Grey Wolf Algorithms

XIAO Jing-jie, CHEN Zhi-yun   

  1. Department of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Online: 2019-02-26 Published: 2019-02-26

Abstract: Focused crawlers have difficulty reaching an optimal solution in a global search. To address this problem and to improve the accuracy and recall rate of topic crawling, this paper designs a focused crawler search strategy that incorporates the grey wolf algorithm. Experimental results show that, compared with the traditional breadth-first search strategy and with the genetic algorithm, another population-based optimization method, the focused crawler based on the grey wolf algorithm performs markedly better and crawls more topic-related web pages.

Key words: Focused crawler, Grey wolf algorithm, Thematic relevance, Webpage importance

CLC Number: 

  • TP301.6