Computer Science, 2015, Vol. 42, Issue (6): 18-22. doi: 10.11896/j.issn.1002-137X.2015.06.004

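Query Expansion Based on Classification Model

LI Wei-yin, SHI Yu-long, CHEN Jie and SHI Chong-yang

School of Computer Science, Beijing Institute of Technology, Beijing 100081, China (all authors)

Online: 2018-11-14 Published: 2018-11-14
Supported by:
This work was supported by the Open Project of the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (99S9021F4D), the National Natural Science Foundation of China (61472034), the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-13-0041), and the Basic Research Fund of Beijing Institute of Technology.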

Abstract: As a key component of query optimization, query expansion plays an important role in improving the performance of information retrieval systems. Traditional query expansion methods based on pseudo-relevance feedback improve retrieval performance to some extent. However, the selected expansion terms also include some that are unrelated to the original query, which limits the gain in retrieval performance. This paper proposes a query expansion method based on a classification model. Combining statistical information with multiple features of the candidate expansion terms, the method uses a Naive Bayes classification model to reclassify the initially selected candidate expansion terms and filter out those with low relevance to the query. Experimental results on the TREC 2013 dataset show that the proposed method effectively improves the precision and recall of user queries.

Key words: Query expansion, Classification model, Information retrieval, Pseudo-relevance feedback
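
The two-stage idea in the abstract (collect candidate terms from pseudo-relevant documents, then filter them with a Naive Bayes classifier) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the features or the Naive Bayes variant used, so the three features (feedback term frequency, feedback document frequency, co-occurrence with query terms), the Gaussian likelihood, the smoothing constant, and all function and class names below are assumptions made for illustration only.

```python
import math
from collections import Counter

def candidate_terms(query, feedback_docs, top_n=10):
    """Stage 1: most frequent non-query terms in the top-ranked (pseudo-relevant) documents."""
    q = set(query)
    counts = Counter(t for doc in feedback_docs for t in doc if t not in q)
    return [t for t, _ in counts.most_common(top_n)]

def term_features(term, query, feedback_docs):
    """Hypothetical feature vector for one candidate term (not taken from the paper)."""
    tf = sum(doc.count(term) for doc in feedback_docs)        # term frequency in feedback docs
    df = sum(1 for doc in feedback_docs if term in doc)       # feedback docs containing the term
    cooc = sum(1 for doc in feedback_docs
               if term in doc and any(w in doc for w in query))  # docs shared with a query term
    return [float(tf), float(df), float(cooc)]

class TinyGaussianNB:
    """Small Gaussian Naive Bayes; the Gaussian likelihood is an assumption of this sketch."""

    def fit(self, X, y, smoothing=1e-3):
        self.stats = {}
        for label in set(y):
            rows = [x for x, l in zip(X, y) if l == label]
            prior = len(rows) / len(X)
            cols = list(zip(*rows))
            means = [sum(c) / len(rows) for c in cols]
            # Per-feature variance, smoothed so it is never zero.
            variances = [sum((v - m) ** 2 for v in c) / len(rows) + smoothing
                         for c, m in zip(cols, means)]
            self.stats[label] = (prior, means, variances)
        return self

    def log_posterior(self, x, label):
        prior, means, variances = self.stats[label]
        lp = math.log(prior)
        for v, m, s in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
        return lp

    def predict(self, x):
        return max(self.stats, key=lambda label: self.log_posterior(x, label))

def expand_query(query, feedback_docs, clf, top_n=10):
    """Stage 2: keep only the candidates that the classifier labels as good (label 1)."""
    kept = [t for t in candidate_terms(query, feedback_docs, top_n)
            if clf.predict(term_features(t, query, feedback_docs)) == 1]
    return list(query) + kept
```

In practice the classifier would be trained on candidate terms labeled by whether adding them improved retrieval on held-out queries; how the paper builds such labels is not stated in the abstract.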

LI Wei-yin, SHI Yu-long, CHEN Jie and SHI Chong-yang. Query Expansion Based on Classification Model[J]. Computer Science, 2015, 42(6): 18-22. https://doi.org/10.11896/j.issn.1002-137X.2015.06.004
