Computer Science, 2015, Vol. 42, Issue (6): 18-22. doi: 10.11896/j.issn.1002-137X.2015.06.004


Query Expansion Based on Classification Model

LI Wei-yin, SHI Yu-long, CHEN Jie and SHI Chong-yang   

Online: 2018-11-14  Published: 2018-11-14

Abstract: As a key component of query optimization, query expansion plays an important role in improving the performance of information retrieval systems. Traditional query expansion methods based on pseudo-relevance feedback improve retrieval performance to some extent. However, the selected expansion terms also include some irrelevant ones, which has an adverse effect on retrieval. This paper proposes a novel query expansion method based on a classification model. Combining statistical information with various features of the candidate expansion terms, the method employs a Naive Bayes classification model to reselect the candidate expansion terms and thereby further filter out the irrelevant ones. Experimental results on TREC 2013 datasets show that the proposed query expansion method can effectively improve the precision and recall of user queries.

Key words: Query expansion, Classification model, Information retrieval, Pseudo-relevance feedback
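
The reselection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the features or the Naive Bayes variant used, so the two features here (frequency in the feedback documents and co-occurrence with query terms), the Gaussian likelihood, the training values, and all function names are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def train_gaussian_nb(samples, labels):
    """Fit a class prior and per-feature mean/variance for each label."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[y] = (n / len(samples), means, variances)
    return model


def log_posterior(model, x, y):
    """Unnormalized log posterior of label y for feature vector x."""
    prior, means, variances = model[y]
    score = math.log(prior)
    for v, m, var in zip(x, means, variances):
        score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
    return score


def predict(model, x):
    """Return the label with the highest log posterior."""
    return max(model, key=lambda y: log_posterior(model, x, y))


def filter_candidates(model, candidates):
    """Keep only candidate terms classified as relevant (label 1)."""
    return [term for term, feats in candidates if predict(model, feats) == 1]


# Hypothetical labeled terms: [feedback-document frequency, query co-occurrence].
train_x = [[0.9, 0.8], [0.8, 0.7], [0.85, 0.9], [0.2, 0.1], [0.3, 0.2], [0.1, 0.15]]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = relevant expansion term, 0 = irrelevant
model = train_gaussian_nb(train_x, train_y)

# Hypothetical candidates produced by a pseudo-relevance feedback pass.
candidates = [("retrieval", [0.88, 0.75]), ("weather", [0.15, 0.2])]
kept = filter_candidates(model, candidates)
print(kept)
```

In this sketch the classifier acts as a second-stage filter: pseudo-relevance feedback supplies the candidates, and only those predicted as relevant are added to the query.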

