Finding XML Pseudo-relevance Document Based on Search Results Clustering

Computer Science ›› 2013, Vol. 40 ›› Issue (10): 172-177.


ZHONG Min-juan, WAN Chang-xuan, LIU De-xi, LIAO Shu-mei

School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013; Key Laboratory of Data and Knowledge Engineering of Jiangxi Provincial Universities, Jiangxi University of Finance and Economics, Nanchang 330013 (same affiliation listed for all four authors)

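Publication date: 2018-11-16 Release date: 2018-11-16
Funding:
This work was supported by the National Natural Science Foundation of China (61173146, 61262035, 1) and the National Social Science Fund of China (12CTQ042).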

Finding XML Pseudo-relevance Document Based on Search Results Clustering

ZHONG Min-juan, WAN Chang-xuan, LIU De-xi and LIAO Shu-mei

Online: 2018-11-16 Published: 2018-11-16

Abstract

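Abstract (translated from the Chinese): Traditional pseudo-relevance feedback is prone to "query topic drift". The first prerequisite for avoiding it effectively is to identify high-quality relevant documents and to form a set of pseudo-relevant documents that matches the user's query need. Building on search results clustering, this paper studies a method for finding XML pseudo-relevant documents. Taking full account of the content and structural features of XML, it proposes a cluster-label extraction method based on equalized weights and, on that basis, a ranking model for candidate clusters and a document ranking model based on the candidate clusters. Experimental data show that, compared with the initial retrieval results, the ranking models achieve better performance and find more XML pseudo-relevant documents.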

Abstract: Recent studies show that traditional pseudo-relevance feedback may cause topic drift. To avoid topic drift effectively, it is essential to identify relevant documents and to form a set of pseudo-relevant documents for the user's query. In this paper, a method based on clustering XML search results is proposed to find good feedback documents. First, a cluster-label extraction method based on equalized weights is introduced, which fully considers the content and structure features of XML documents. Second, a two-stage ranking strategy is presented, consisting of a candidate cluster ranking model and a document ranking model. Finally, experimental data show that, compared with the initial retrieval results, the ranking models obtain better performance and find more relevant XML documents.

Key words: Information retrieval, XML pseudo-relevance feedback, XML search results clustering, Cluster label, Ranking model
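
The two-stage strategy summarized in the abstract (label each cluster, rank the candidate clusters, then rank documents within the best clusters) can be illustrated with a small sketch. This is not the authors' model: the abstract does not give the equalized-weight formula or the ranking functions, so the raw term-frequency labels, the query-overlap cluster score, the flat token lists standing in for XML documents (structure is ignored), and all function names below are assumptions made purely for illustration.

```python
from collections import Counter


def cluster_label(docs, top_k=3):
    """Label a cluster with its top_k most frequent terms (assumed weighting)."""
    counts = Counter(term for doc in docs for term in doc)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def rank_clusters(clusters, query):
    """Stage 1: order clusters by the number of query terms in their label."""
    q = set(query)
    scored = [(len(q & set(cluster_label(docs))), cid)
              for cid, docs in clusters.items()]
    return [cid for _, cid in sorted(scored, key=lambda s: (-s[0], s[1]))]


def rank_documents(clusters, query, n_clusters=1):
    """Stage 2: rank documents of the best clusters by query-term occurrences."""
    q = set(query)
    candidates = []
    for cid in rank_clusters(clusters, query)[:n_clusters]:
        for idx, doc in enumerate(clusters[cid]):
            score = sum(1 for term in doc if term in q)
            candidates.append((score, cid, idx))
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(cid, idx) for _, cid, idx in candidates]


# Hypothetical toy data: each document is a list of tokens.
clusters = {
    "c1": [["xml", "query", "expansion"], ["xml", "retrieval", "feedback"]],
    "c2": [["image", "segmentation"], ["image", "texture"]],
}
query = ["xml", "feedback"]

print(rank_clusters(clusters, query))
print(rank_documents(clusters, query))
```

Documents returned by the second stage would play the role of the pseudo-relevant set used for feedback; a real system would replace both scoring functions with weights that also reflect XML element structure.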

ZHONG Min-juan, WAN Chang-xuan, LIU De-xi and LIAO Shu-mei. Finding XML Pseudo-relevance Document Based on Search Results Clustering[J]. Computer Science, 2013, 40(10): 172-177. https://doi.org/

