Computer Science (计算机科学), 2017, Vol. 44, Issue (5): 226-231. doi: 10.11896/j.issn.1002-137X.2017.05.040

Artificial Intelligence

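Building Hierarchical Topic Based on Heterogeneous Chinese Online Encyclopedia

WANG Xu-zhong, LIU Yan, HU Lin-mei and CHEN Jing

  • Affiliations: State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002 (WANG Xu-zhong, LIU Yan, CHEN Jing); Department of Computer Science and Technology, Tsinghua University, Beijing 100084 (HU Lin-mei)
  • Online: 2018-11-13  Published: 2018-11-13
  • Funding: This work was supported by the National Natural Science Foundation of China (61309007) and the National High Technology Research and Development Program of China ("863" Program) (2006AA01Z409).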

Abstract: Chinese online encyclopedias contain a large amount of high-quality information, and many studies have successfully used them for various knowledge acquisition tasks. For instance, articles with similar topics can be grouped into a category. A topic hierarchy for a given category, built from an online encyclopedia, is highly beneficial to applications such as search and browsing, information organization, and information retrieval. However, no prior work has explored constructing the topic hierarchy of a given category in an online encyclopedia. To address the heterogeneity and coarseness of Chinese online encyclopedias, this paper proposes a topic hierarchy construction method based on a Bayesian network. The method combines the structured tables of contents and the unstructured text of the articles in a category, and applies a maximum arborescence (maximum directed spanning tree) algorithm to the category's Bayesian topic network to build the topic hierarchy automatically. Experimental results show that, compared with the existing encyclopedia topic structure, the proposed approach expands the content fourfold while maintaining an accuracy of 75%.

Key words: Chinese online encyclopedia, Topic hierarchy, Structured table of contents, Unstructured text description
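The abstract describes extracting the hierarchy from the category's Bayesian topic network with a maximum spanning tree (arborescence) algorithm. A standard way to compute a maximum arborescence over a directed, weighted graph is the Chu-Liu/Edmonds algorithm. The sketch below is a minimal, generic Python version of that algorithm, not the authors' implementation. It assumes topics are numbered 0..n-1, that node 0 is a virtual root, and that each directed edge `(parent, child, score)` carries a parent-child score; how such scores are derived from the Bayesian network is not shown, and the function names, topic names and scores are hypothetical.

```python
from typing import Dict, List, Tuple

# (parent, child, score): a hypothetical "child is a subtopic of parent" link.
Edge = Tuple[int, int, float]


def _find_cycle(parent: Dict[int, int], root: int) -> List[int]:
    """Return the nodes of one cycle in the child->parent map, or [] if none."""
    seen_in: Dict[int, int] = {}  # node -> start node of the walk that reached it
    for start in parent:
        walk: List[int] = []
        x = start
        while x != root and x not in seen_in:
            seen_in[x] = start
            walk.append(x)
            x = parent[x]
        if x != root and seen_in[x] == start:
            # The walk ran back into itself: the cycle begins at x.
            return walk[walk.index(x):]
    return []


def _edmonds(n: int, edges: List[Edge], root: int) -> List[int]:
    """Chu-Liu/Edmonds: indices into `edges` of a maximum arborescence."""
    # 1. Greedily give every non-root node its highest-scoring incoming edge.
    best: Dict[int, int] = {}
    for i, (u, v, w) in enumerate(edges):
        if v == root or u == v:
            continue
        if v not in best or w > edges[best[v]][2]:
            best[v] = i
    if len(best) != n - 1:
        raise ValueError("no spanning arborescence: a node has no incoming edge")
    parent = {v: edges[i][0] for v, i in best.items()}
    cycle = _find_cycle(parent, root)
    if not cycle:
        return list(best.values())  # the greedy choice is already a tree

    # 2. Contract the cycle into one new node c; re-weight each edge entering
    #    it by subtracting the score of the cycle edge it would replace.
    in_cycle = set(cycle)
    new_id: Dict[int, int] = {}
    for x in range(n):
        if x not in in_cycle:
            new_id[x] = len(new_id)
    c = len(new_id)
    for x in cycle:
        new_id[x] = c
    new_edges: List[Edge] = []
    origin: List[int] = []  # origin[j] = index in `edges` of new_edges[j]
    for i, (u, v, w) in enumerate(edges):
        if new_id[u] == new_id[v]:
            continue
        if v in in_cycle:
            w -= edges[best[v]][2]
        new_edges.append((new_id[u], new_id[v], w))
        origin.append(i)

    # 3. Solve the smaller problem, then expand c: the chosen edge entering
    #    the cycle replaces the cycle edge into the same node.
    chosen = [origin[j] for j in _edmonds(c + 1, new_edges, new_id[root])]
    entering = next(i for i in chosen if edges[i][1] in in_cycle)
    broken = edges[entering][1]
    chosen += [best[x] for x in cycle if x != broken]
    return chosen


def max_arborescence(n: int, edges: List[Edge], root: int = 0) -> Dict[int, int]:
    """Return {child: parent} for a maximum-score arborescence rooted at `root`."""
    return {edges[i][1]: edges[i][0] for i in _edmonds(n, edges, root)}


if __name__ == "__main__":
    # Hypothetical topics of one category; node 0 is a virtual root.
    names = ["<root>", "geography", "climate", "history"]
    # Hypothetical parent->child scores, not taken from the paper.
    scores = [(0, 1, 0.5), (0, 2, 0.4), (0, 3, 0.8),
              (1, 2, 0.9), (2, 1, 0.7), (3, 1, 0.1)]
    tree = max_arborescence(len(names), scores, root=0)
    for child, par in sorted(tree.items()):
        print(f"{names[par]} -> {names[child]}")
```

Running the script prints `<root> -> geography`, `geography -> climate` and `<root> -> history`. The greedy step first picks the geography/climate cycle (scores 0.9 and 0.7); contraction then breaks it at geography, because replacing `climate -> geography` (0.7) with `<root> -> geography` (0.5) loses less score than any other way into the cycle. NetworkX also offers a `maximum_spanning_arborescence` routine; the hand-written version only keeps each step visible.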

