Computer Science ›› 2015, Vol. 42 ›› Issue (7): 265-269. doi: 10.11896/j.issn.1002-137X.2015.07.057


Positional Language Model-based Chinese IR System

CHEN Ya-lan, HU Xiao-hua, TU Xin-hui and HE Ting-ting   

  • Online: 2018-11-14 Published: 2018-11-14

Abstract: Most existing retrieval models overlook the fact that both the proximity of matched query terms within a document and passage-level retrieval evidence can be exploited to improve document scoring. Motivated by this, we propose a Chinese information retrieval system based on the positional language model. First, we define the concept of propagated count and use it to establish a positional language model for each position in a document. Then, by combining the KL-divergence retrieval model with the positional language model, we compute a score for each individual position. Finally, we score the document using a multi-parameter strategy. The experiments also compare the retrieval effectiveness of two Chinese indexing approaches, multi-character-based and dictionary-based, under positional language models. Experiments on the standard NTCIR5 and NTCIR6 test sets show that the system's performance improves substantially with both indexing approaches and that it outperforms the vector space model, the Okapi BM25 model, and the classical language model.
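
The pipeline described in the abstract (propagated counts, a language model per position, KL-divergence scoring per position, then a document-level score) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the kernel, the smoothing method, or what the multi-parameter strategy combines, so the Gaussian kernel, the Dirichlet-style smoothing with `mu`, the best-position selection, and the interpolation over several `sigma` values are all assumptions. All function names are hypothetical, and `char_ngrams` shows only one possible reading of multi-character-based indexing (overlapping character bigrams).

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Overlapping character n-grams (assumed form of multi-character indexing)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def gaussian_kernel(i, j, sigma):
    """Assumed proximity kernel: weight decays with the distance |i - j|."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def propagated_counts(terms, i, sigma):
    """Propagated count of each term at position i.

    Every occurrence at position j contributes kernel(i, j) to position i.
    """
    counts = defaultdict(float)
    for j, w in enumerate(terms):
        counts[w] += gaussian_kernel(i, j, sigma)
    return counts


def position_score(query_terms, doc_terms, i, sigma, collection_prob, mu=1.0):
    """Negative KL divergence between the query model and the model at position i.

    The positional model is smoothed with the collection model (assumed
    Dirichlet-style smoothing); unseen terms get a tiny floor probability.
    """
    counts = propagated_counts(doc_terms, i, sigma)
    total = sum(counts.values())
    query_counts = Counter(query_terms)
    score = 0.0
    for w, c in query_counts.items():
        p_query = c / len(query_terms)
        p_background = collection_prob.get(w, 1e-9)
        p_position = (counts.get(w, 0.0) + mu * p_background) / (total + mu)
        score -= p_query * math.log(p_query / p_position)
    return score


def document_score(query_terms, doc_terms, collection_prob,
                   sigmas=(1.0, 5.0), weights=(0.5, 0.5)):
    """Assumed multi-parameter strategy: best position per sigma, then interpolate."""
    score = 0.0
    for sigma, weight in zip(sigmas, weights):
        best = max(
            position_score(query_terms, doc_terms, i, sigma, collection_prob)
            for i in range(len(doc_terms))
        )
        score += weight * best
    return score
```

Under these assumptions, two documents containing the same terms can receive different scores: one in which the query terms appear next to each other should score higher than one in which they are far apart, which is the proximity effect the abstract refers to.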

Key words: Positional language model, Proximity, Passage retrieval, Propagated count

