Computer Science, 2016, Vol. 43, Issue (8): 254-257. doi: 10.11896/j.issn.1002-137X.2016.08.051

Research on Adaptive Genetic Algorithm in Application of Focused Crawler Search Strategy

JING Wen-peng, WANG Yu-jian and DONG Wei-wei   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: Designing a search strategy that improves a crawler's coverage and accuracy has become an active research topic for focused crawlers. Most crawlers use a best-first search algorithm, and a focused crawler built on this strategy easily falls into a local optimum. To address this, we combine a genetic algorithm with the focused crawler's search strategy, using a dynamic fitness function and adaptive genetic operators so that the crawler adjusts its search as it runs. In experiments comparing the approach with crawlers that use other search strategies and with a crawler that uses a traditional genetic algorithm, the proposed algorithm improves the crawler's search ability to some extent.

Key words: Focused crawler, Important degree, Genetic algorithm, Genetic operators, Fitness function
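
The abstract does not give the formulas behind its adaptive genetic operators. As a minimal sketch of the general idea, the Python below uses one widely known scheme, the adaptive crossover and mutation probabilities of Srinivas and Patnaik (1994); whether the paper uses exactly this form is an assumption. The function name `adaptive_rates`, the constants `k1` to `k4`, and the reading of fitness as a page-relevance score are all hypothetical choices made for illustration.

```python
def adaptive_rates(f_pair, f_ind, f_max, f_avg, k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Return (crossover_prob, mutation_prob) for one individual.

    f_pair: larger fitness of the two parents selected for crossover
    f_ind:  fitness of the individual considered for mutation
    f_max:  best fitness in the current population
    f_avg:  mean fitness of the current population
    k1..k4: constants in [0, 1] (assumed defaults, not taken from the paper)
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Population has converged: fall back to the maximum rates
        # so that variation is reintroduced.
        return k3, k4
    # Above-average individuals get lower rates (they are preserved);
    # below-average individuals get the full rates (they are disrupted).
    pc = k1 * (f_max - f_pair) / spread if f_pair >= f_avg else k3
    pm = k2 * (f_max - f_ind) / spread if f_ind >= f_avg else k4
    return pc, pm


# Hypothetical relevance scores for three candidate links.
scores = [0.2, 0.4, 0.9]
f_max = max(scores)
f_avg = sum(scores) / len(scores)
rates = [adaptive_rates(s, s, f_max, f_avg) for s in scores]
```

In this sketch the best-scoring link receives zero crossover and mutation probability, while below-average links receive the full rates. That is the mechanism by which an adaptive scheme tries to avoid the local optima that the abstract attributes to best-first search.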

