Computer Science (计算机科学) ›› 2015, Vol. 42 ›› Issue (6): 18-22. doi: 10.11896/j.issn.1002-137X.2015.06.004
李维银,石玉龙,陈杰,施重阳
LI Wei-yin, SHI Yu-long, CHEN Jie and SHI Chong-yang
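Abstract: Query expansion, an important component of query optimization, plays a vital role in improving the performance of information retrieval systems. Traditional pseudo-relevance feedback query expansion methods improve retrieval performance to some extent, but the expansion terms they select include some terms unrelated to the original query, which limits the gain in retrieval performance. This paper proposes a query expansion method based on a classification model. The algorithm combines statistical information about the candidate expansion terms with multiple features, and uses a naive Bayes classification model to reclassify the initially obtained candidate expansion terms, further removing those with low relevance to the query terms. Experimental results on the TREC 2013 dataset show that the proposed query expansion method effectively improves both the precision and the recall of user queries.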
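The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which features are used or which naive Bayes variant is applied, so everything concrete here is an assumption: the two features (co-occurrence with the query and a normalized IDF score), the Gaussian likelihood, the toy training data, and the function names `fit_gaussian_nb`, `predict`, and `filter_candidates` are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    total = len(samples)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero on constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def log_posterior(model, x, label):
    """Unnormalized log posterior under the feature-independence assumption."""
    log_prior, means, variances = model[label]
    score = log_prior
    for v, m, var in zip(x, means, variances):
        score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
    return score


def predict(model, x):
    """Return the class with the highest log posterior."""
    return max(model, key=lambda label: log_posterior(model, x, label))


def filter_candidates(model, candidates):
    """Keep only candidate terms classified as good expansion terms (label 1)."""
    return [term for term, feats in candidates if predict(model, feats) == 1]


# Hypothetical training data: [co-occurrence with query, normalized IDF].
# Label 1 = useful expansion term, label 0 = unrelated term.
train_x = [[0.9, 0.8], [0.8, 0.7], [0.85, 0.9],
           [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
train_y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(train_x, train_y)

# Candidates as they might come out of a first pseudo-relevance feedback pass.
candidates = [("retrieval", [0.88, 0.75]), ("weather", [0.12, 0.2])]
kept = filter_candidates(model, candidates)
print(kept)
```

In this toy setup the classifier acts as a second-stage filter: the first retrieval pass proposes candidate terms, and only those labeled as relevant are appended to the query. Real feature sets and training labels would have to come from the paper's own experimental design.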