Computer Science ›› 2014, Vol. 41 ›› Issue (5): 219-222. doi: 10.11896/j.issn.1002-137X.2014.05.045

• Software and Database Technology •

Graph-based Web Entity Ranking Method

XU Yao, ZHAO Zheng-wen, CHEN Qun, LIU Hai-long, DU Jing, HU Jia-qi and LI Zhan-huai

  1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710129 (all authors)
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the National 973 Program (2012CB316203), the Key Program of the Natural Science Foundation (61033007), the National 863 Program (2012AA011004), and the Graduate Seed Fund of Northwestern Polytechnical University (Z2013125, Z2013126).

Abstract: Users increasingly expect a search engine to return the entity they are looking for. Traditional search engines, however, return only a list of documents that contain the query keywords rather than the answer itself. Moreover, most existing entity ranking techniques combine several entity-related factors through weighted summation, and training those weights requires a large amount of prior knowledge. This paper extracts candidate entities from the snippets returned by a search engine and proposes a graph-based algorithm, PERA (Probabilistic Entity Ranking Algorithm), which uses the idea of a random walk to rank the candidates without requiring such prior knowledge. Experiments show that, for every entity type tested, the correct entities receive high ranking scores.

Key words: Web, Entity ranking, Search engine, Graph
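
The abstract describes ranking candidate entities by a random walk over a graph but gives no details of PERA itself. The following is only a minimal sketch of the general idea, not the authors' algorithm: it assumes a hypothetical undirected graph linking search snippets to the candidate entities they mention, and scores nodes with a generic damped random walk (PageRank-style). The function name `rank_entities`, the `damping` and `iterations` parameters, and the toy snippet and entity names are all illustrative assumptions.

```python
from collections import defaultdict


def rank_entities(edges, damping=0.85, iterations=50):
    """Score nodes by a damped random walk on an undirected weighted graph.

    edges: iterable of (u, v, weight) tuples with positive weights.
    Returns a dict mapping each node to a score; scores sum to 1.
    """
    # Build a symmetric adjacency map, merging repeated edges.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = neighbors[u].get(v, 0.0) + w
        neighbors[v][u] = neighbors[v].get(u, 0.0) + w

    nodes = sorted(neighbors)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps to a uniform node.
        nxt = {node: (1.0 - damping) / n for node in nodes}
        for u in nodes:
            total = sum(neighbors[u].values())
            # Otherwise it follows an edge chosen in proportion to its weight.
            for v, w in neighbors[u].items():
                nxt[v] += damping * score[u] * w / total
        score = nxt
    return score


# Hypothetical toy input: three snippets and two candidate entities.
edges = [
    ("snippet1", "Paris", 1.0),
    ("snippet2", "Paris", 1.0),
    ("snippet3", "Paris", 1.0),
    ("snippet3", "Lyon", 1.0),
]
scores = rank_entities(edges)
candidates = sorted(["Paris", "Lyon"], key=scores.get, reverse=True)
print(candidates)
```

In this toy graph the candidate mentioned by more snippets accumulates more of the walker's probability mass, so no hand-tuned weights are needed to order the candidates. How PERA actually builds its graph and transition probabilities is specified in the full paper, not here.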

