Computer Science ›› 2015, Vol. 42 ›› Issue (10): 275-280.

• Artificial Intelligence •

A Structured Retrieval Model for Domain Documents and Its Application to Agricultural Prescription Retrieval

刘彤,倪维健   

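  1. College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao 266590
  • Published: 2018-11-14 Online: 2018-11-14
  • Funding:
    This work was supported by the Shandong Province Research Award Fund for Outstanding Young and Middle-aged Scientists (BS2012DX030), the China Postdoctoral Science Foundation (2012M521363), the National Statistical Science Research Program (2012LY001), the Shandong Province Higher Education Science and Technology Program (J12LN45, J14LN33), and the Shandong Province Postdoctoral Innovation Special Fund (201303072).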

Information Retrieval Model for Domain-specific Structural Documents and its Application in Agricultural Disease Prescription Retrieval

LIU Tong and NI Wei-jian   

  • Online:2018-11-14 Published:2018-11-14

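Abstract: Documents in specialized domains often have pronounced structural characteristics: a document typically consists of several relatively fixed text fields, each serving a different expressive function, and these fields carry relevant domain knowledge. Targeting the structural and domain-specific characteristics of professional documents, an information retrieval model for structured domain documents is designed. In this model, the domain document collection is first mined to build a structured model that reflects domain knowledge; on this basis, a structured document retrieval algorithm is designed to return relevant domain documents for user queries. An application study is carried out on a typical class of domain documents, agricultural prescriptions: the proposed method is compared experimentally with traditional information retrieval methods on a real agricultural prescription dataset, and a prototype agricultural prescription retrieval system is developed.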

Keywords: Information retrieval, Agricultural prescription, Query expansion, Structured retrieval

Abstract: Unlike plain text, professional documents in various domains are mostly structural documents, each composed of several roughly fixed textual fields and embedding rich domain knowledge. To incorporate this inherent structure information and domain knowledge, we propose a novel retrieval model for professional documents based on structural retrieval. In particular, we first derive a domain model from a given professional document collection, and then use it as a basis to design a domain-specific structural retrieval function. We apply the proposed structural retrieval model to agricultural disease prescriptions, a representative type of professional document in agriculture, and develop a prototype search engine for agricultural disease prescriptions. Experimental results on a real prescription collection show the advantages of the proposed model over conventional information retrieval approaches.

Key words: Information retrieval, Agricultural disease prescription, Query expansion, Structural retrieval
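
The abstract does not spell out the retrieval function, so the following is only a minimal sketch of the general idea of structural (field-aware) retrieval, not the authors' model. It assumes hypothetical field names (`crop`, `symptom`, `treatment`), hand-picked field weights, whitespace tokenization, and a plain field-weighted TF-IDF score; the domain model mined from the collection and the query expansion step are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical fields and weights: the abstract does not list the actual
# fields of a prescription document or how their weights are obtained.
FIELD_WEIGHTS = {"crop": 2.0, "symptom": 1.5, "treatment": 1.0}

# Toy collection for illustration only (not data from the paper).
DOCS = [
    {"crop": "tomato", "symptom": "leaf spots yellow", "treatment": "spray fungicide"},
    {"crop": "wheat", "symptom": "rust on leaf", "treatment": "fungicide for tomato rust"},
]


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def weighted_tf(doc, weights):
    """Merge per-field term counts into one weighted count per term."""
    tf = Counter()
    for field, text in doc.items():
        w = weights.get(field, 0.0)
        for term in tokenize(text):
            tf[term] += w
    return tf


def rank(query, docs, weights=FIELD_WEIGHTS):
    """Return document indices ordered by field-weighted TF-IDF score."""
    tfs = [weighted_tf(d, weights) for d in docs]
    n = len(docs)
    scored = []
    for i, tf in enumerate(tfs):
        score = 0.0
        for term in tokenize(query):
            # Document frequency counts only documents with a positive weight.
            df = sum(1 for t in tfs if t[term] > 0)
            if df == 0:
                continue
            score += tf[term] * math.log(1 + n / df)
        scored.append((score, i))
    # Highest score first; ties broken by original document order.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


print(rank("tomato leaf", DOCS))
```

In this toy setting, a match in the `crop` field counts for more than the same word appearing in `treatment`, which is the kind of distinction a flat bag-of-words model cannot make. Changing the weights changes the ranking, so in practice they would have to be derived from the collection rather than fixed by hand.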

