Computer Science (计算机科学) ›› 2015, Vol. 42 ›› Issue (10): 275-280.
刘彤,倪维健
LIU Tong and NI Wei-jian
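Abstract: Documents in many professional domains are markedly structured: a document typically consists of several relatively fixed text fields, each serving a different expressive function, and these fields carry related domain knowledge. To address the structured and domain-specific nature of such professional documents, an information retrieval model for structured domain documents is designed. In this model, the domain document collection is first mined to build a structured model that reflects domain knowledge; on that basis, a structured document retrieval algorithm is designed to return relevant domain documents for user queries. An application study was carried out on a typical class of domain documents, agricultural prescriptions: the proposed method was compared experimentally with traditional information retrieval methods on a real-world agricultural prescription data set, and a prototype agricultural prescription retrieval system was developed.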
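The idea of retrieving over documents made of fixed text fields can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give the scoring formula or the mined structure model, so the example below assumes a plain field-weighted term-count score. The field names (`symptom`, `cause`, `treatment`), the weights in `FIELD_WEIGHTS`, and the functions `score` and `search` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical field weights (assumption): in the paper, structure is
# learned by mining the domain corpus; here the weights are fixed by hand.
FIELD_WEIGHTS = {"symptom": 2.0, "cause": 1.0, "treatment": 1.5}


def score(query, doc, weights=FIELD_WEIGHTS):
    """Sum, over fields, of field weight times the number of query-term hits."""
    terms = query.lower().split()
    total = 0.0
    for field, text in doc.items():
        counts = Counter(text.lower().split())
        hits = sum(counts[t] for t in terms)
        # Fields without an explicit weight default to 1.0.
        total += weights.get(field, 1.0) * hits
    return total


def search(query, docs, weights=FIELD_WEIGHTS):
    """Return indices of matching documents, best score first."""
    scored = [(score(query, d, weights), i) for i, d in enumerate(docs)]
    ranked = sorted(scored, key=lambda p: (-p[0], p[1]))
    return [i for s, i in ranked if s > 0]


# Toy documents with hypothetical fields, whitespace-tokenized for simplicity.
docs = [
    {"symptom": "yellow leaves wilting", "treatment": "apply nitrogen fertilizer"},
    {"symptom": "brown spots", "cause": "yellow rust fungus", "treatment": "spray fungicide"},
]

print(search("yellow leaves", docs))
print(search("fungicide", docs))
```

A match in a heavily weighted field counts for more than the same match elsewhere, which is the basic reason to treat fields separately instead of flattening the document into one bag of words. Chinese text would need a word segmenter in place of the whitespace split used here.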