Computer Science ›› 2016, Vol. 43 ›› Issue (8): 254-257. doi: 10.11896/j.issn.1002-137X.2016.08.051
荆文鹏,王育坚,董伟伟
JING Wen-peng, WANG Yu-jian and DONG Wei-wei
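Abstract: Improving crawl coverage and accuracy is an active research topic for focused crawlers. Most current focused crawlers use a best-first search strategy, which tends to get trapped in local optima. To address this shortcoming, we design a focused-crawler search strategy that incorporates a genetic algorithm, together with a dynamic fitness function and genetic operators that give the crawler a degree of adaptivity. We compare the strategy with other search strategies and with a search strategy based on a non-adaptive genetic algorithm; the results show that the proposed algorithm improves crawler performance to some extent.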
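The abstract does not give the dynamic fitness function or the genetic operators, so the following is only a minimal sketch of what "adaptive" operators can look like. It assumes a widely used scheme from the genetic algorithm literature in which crossover and mutation probabilities shrink for above-average individuals and stay high for below-average ones. The function name `adaptive_rates`, the constants `k1` to `k4`, and the idea of using a link's topic-relevance score as its fitness are all assumptions for illustration, not the authors' method.

```python
def adaptive_rates(f_parent_best, f_individual, f_max, f_avg,
                   k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Return (crossover_prob, mutation_prob) for one candidate.

    f_parent_best: larger fitness of the two parents chosen for crossover
    f_individual:  fitness of the individual considered for mutation
    f_max, f_avg:  maximum and mean fitness of the current population
    k1..k4:        hypothetical tuning constants in [0, 1]
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Population has converged (all fitness equal): keep rates high
        # so the search can escape the current local optimum.
        return k3, k4

    # Above-average candidates get lower rates, which protects good links;
    # below-average candidates keep the full rates to encourage exploration.
    if f_parent_best >= f_avg:
        pc = k1 * (f_max - f_parent_best) / spread
    else:
        pc = k3

    if f_individual >= f_avg:
        pm = k2 * (f_max - f_individual) / spread
    else:
        pm = k4

    return pc, pm


if __name__ == "__main__":
    # Hypothetical relevance scores of three candidate links.
    scores = [0.2, 0.4, 0.9]
    f_max, f_avg = max(scores), sum(scores) / len(scores)
    for s in scores:
        print(s, adaptive_rates(s, s, f_max, f_avg))
```

Under these assumptions the best candidate in the population is left untouched (both probabilities are zero), while weak candidates are recombined and mutated at the full rate. That is one way a crawler could keep exploring instead of committing to the locally best links, as a pure best-first strategy would.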