Computer Science (计算机科学) ›› 2013, Vol. 40 ›› Issue (11): 242-247.

• Artificial Intelligence •

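Term Importance Identification Method Based on Classification

QIU Yun-fei, BAO Li and SHAO Liang-shan

  1. School of Software, Liaoning Technical University, Huludao 125100; School of Software, Liaoning Technical University, Huludao 125100; Institute of Systems Engineering, Liaoning Technical University, Fuxin 123000
  • Online: 2018-11-16 Published: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China (70971059) and the Liaoning Province Innovation Team Project (2009T045).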

Abstract: In traditional search engines and information retrieval, the term weights of a user query are typically derived in a context-independent fashion. Most existing information retrieval techniques use bag-of-words approaches, such as the Boolean model, the vector space model, and probabilistic models, none of which consider the relationships among the terms in a query. To make full use of the information in a query and improve the accuracy of term weights, this paper proposes a supervised machine learning method that learns a context-sensitive, query-dependent weight for each term in a user query. The method is classification-based and introduces the result of syntactic parsing as a major feature for training the classifier. Taking the relationships among query terms into account avoids the information loss incurred when a query is reduced to individual terms, and it enriches the feature set available for short texts. In addition, the classifier produces soft output, which gives a more accurate quantified value of each term's importance.

Key words: Classification, Dependency parsing, Term weight, Query analysis, Term importance, Search engine, Information retrieval
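The idea of a classifier with soft output can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier or list the feature set, so logistic regression is used here as a stand-in, and the three per-term features (an IDF-like score, whether the term is the root of the dependency tree, and its number of dependents), the toy training data, and the example query are all hypothetical.

```python
import math

# Hypothetical per-term features (not taken from the paper):
#   [idf, is_root_of_dependency_tree, number_of_dependents]
# Label 1 = important term, 0 = unimportant term. Toy data for illustration only.
TRAIN = [
    ([2.3, 1.0, 2.0], 1),
    ([1.9, 0.0, 1.0], 1),
    ([2.1, 1.0, 1.0], 1),
    ([0.2, 0.0, 0.0], 0),
    ([0.4, 0.0, 0.0], 0),
    ([0.3, 0.0, 1.0], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Soft output: estimated probability that the term is important."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(data, epochs=2000, lr=0.1):
    """Fit logistic regression with plain batch gradient descent."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in data:
            error = predict_proba(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def term_weights(query_terms, weights, bias):
    """Map each (term, features) pair to a weight in (0, 1)."""
    return {term: predict_proba(weights, bias, f) for term, f in query_terms}


WEIGHTS, BIAS = train(TRAIN)

# A hypothetical parsed query, "cheap flights to beijing", with made-up features.
QUERY = [
    ("cheap", [1.2, 0.0, 0.0]),
    ("flights", [2.0, 1.0, 2.0]),
    ("to", [0.1, 0.0, 0.0]),
    ("beijing", [2.2, 0.0, 1.0]),
]
SCORES = term_weights(QUERY, WEIGHTS, BIAS)

for term, score in SCORES.items():
    print(f"{term}: {score:.3f}")
```

Because the dependency-based features are computed per query, the same word can receive different weights in different queries, which is the contrast with context-independent weighting drawn in the abstract. In practice the features would come from a real dependency parser rather than hand-written values.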

