Computer Science ›› 2014, Vol. 41 ›› Issue (4): 200-204.

• Software and Database Technology •

XML Query Expansion Based on High Quality Expansion Source and Local Word Co-occurrence Model

ZHONG Min-juan, WAN Chang-xuan, LIU De-xi, LIAO Shu-mei and JIAO Xian-pei

  1. School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013 (all authors)
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    The National Natural Science Foundation of China (61173146, 61262035, 9, 71361012), the National Social Science Foundation of China (12CTQ042), and the Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ11729, GJJ12734)

Abstract: Query expansion must solve two problems: where the expansion terms come from, and how to select expansion terms from that source. To address them, this paper first obtains a relatively high-quality set of relevant documents through XML search result clustering and a ranking model, and uses this set as the expansion source. It then performs query expansion using local co-occurrence features between terms, combined with the structural characteristics of XML documents. The experimental results show two things. On the one hand, the expansion source obtained through search result clustering and ranking is highly relevant to the user query and is of higher quality than the expansion source used in traditional pseudo-relevance feedback. On the other hand, the proposed word co-occurrence expansion scheme, which incorporates XML structural features, yields expansion information that is relevant to the user's query intent, and it improves retrieval performance more effectively than both the initial query and an unstructured term expansion method.

Key words: XML query expansion, Expansion source, Word co-occurrence model, XML structural feature
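
The second stage described in the abstract (selecting terms from the expansion source by their local co-occurrence with the query terms, while taking XML structure into account) can be illustrated with a minimal Python sketch. It is not the authors' model: the function name `select_expansion_terms`, the element weights in `ELEMENT_WEIGHTS`, the window size, and the distance-decay scoring are all assumptions made for illustration, since the abstract does not give the actual formulas.

```python
from collections import defaultdict

# Hypothetical per-element weights (assumption; the abstract gives no values).
ELEMENT_WEIGHTS = {"title": 2.0, "abstract": 1.5, "p": 1.0}


def select_expansion_terms(docs, query_terms, window=5, top_k=3):
    """Rank candidate terms by weighted local co-occurrence with query terms.

    docs: the expansion source, a list of documents, each given as a list of
          (element_tag, tokens) pairs.
    query_terms: the terms of the initial query.
    window: how many tokens on each side of a query term count as "local".
    """
    query = set(query_terms)
    scores = defaultdict(float)
    for doc in docs:
        for tag, tokens in doc:
            # Unknown element tags fall back to a neutral weight of 1.0.
            weight = ELEMENT_WEIGHTS.get(tag, 1.0)
            for i, tok in enumerate(tokens):
                if tok not in query:
                    continue
                lo = max(0, i - window)
                hi = min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    cand = tokens[j]
                    if j != i and cand not in query:
                        # Terms closer to a query term contribute more.
                        scores[cand] += weight / abs(i - j)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

Calling `select_expansion_terms(docs, ["xml", "query"], top_k=2)` on a small feedback set returns the two candidate terms with the highest weighted co-occurrence scores. In this sketch, terms found near query terms inside a `title` element score higher than the same terms found in body paragraphs.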

