Computer Science, 2016, Vol. 43, Issue (9): 247-249. doi: 10.11896/j.issn.1002-137X.2016.09.049

• Artificial Intelligence •

Improvement of Lucene Sorting Algorithm Fusing Location-related and Probabilistic Sorting

HU Bo, JIANG Zong-li

  1. College of Computer Science, Beijing University of Technology, Beijing 100124
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the Beijing Municipal Key Discipline Fund for Computer Software and Theory (007000541215042)

Abstract: Ranking of document retrieval results and text classification are core technologies for vertical search, personalized information retrieval, information filtering, and related problems. To improve the performance of retrieval systems, an improved method that fuses location-related and probabilistic sorting is proposed for Lucene's default sorting algorithm. Considering how the positions at which query terms occur in a document, together with probabilistic sorting, affect document relevance, the scoring formula of Lucene's default sorting algorithm is improved using location-related query term weights and a document relevance probability computed with the naive Bayes classification algorithm. Experimental results show that the improved method effectively raises the accuracy of vertical search and gives users a better vertical search experience.

Key words: Location-related, Probabilistic sorting, Lucene, Sorting algorithm, Vertical search
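
The abstract does not state the improved scoring formula itself, so the following is only a minimal sketch of the general idea, under stated assumptions: a base score (standing in for Lucene's default score) is scaled by a position-dependent query term weight and by a naive Bayes relevance probability. The function names, the linear position decay, the multiplicative fusion, and the parameters `alpha` and `beta` are all hypothetical illustrations, not the authors' formula and not part of the Lucene API.

```python
import math
from collections import Counter


def position_weight(tokens, term):
    """Hypothetical position weight: occurrences nearer the start count more."""
    n = len(tokens)
    if n == 0:
        return 0.0
    # An occurrence at index i contributes (1 - i/n), so earlier hits weigh more.
    return sum(1.0 - i / n for i, tok in enumerate(tokens) if tok == term)


def train_naive_bayes(labeled_docs):
    """Collect counts for a multinomial naive Bayes model over token lists."""
    class_counts = Counter()
    term_counts = {}
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        term_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def relevance_probability(tokens, model, target):
    """Posterior P(target | tokens) with add-one smoothing."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        total_terms = sum(term_counts[label].values())
        log_p = math.log(count / total_docs)
        for tok in tokens:
            log_p += math.log(
                (term_counts[label][tok] + 1) / (total_terms + len(vocab))
            )
        log_scores[label] = log_p
    # Normalize in log space for numerical stability.
    peak = max(log_scores.values())
    norm = sum(math.exp(v - peak) for v in log_scores.values())
    return math.exp(log_scores[target] - peak) / norm


def fused_score(base_score, tokens, query_terms, model, target,
                alpha=0.5, beta=0.5):
    """Illustrative fusion only; alpha and beta are made-up tuning knobs."""
    pos = sum(position_weight(tokens, t) for t in query_terms)
    prob = relevance_probability(tokens, model, target)
    return base_score * (1.0 + alpha * pos) * (1.0 + beta * prob)
```

With this sketch, two documents that contain the same tokens and the same base score rank differently when the query term appears earlier in one of them, and a document the classifier assigns to the target vertical category is boosted relative to one it does not.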

