Computer Science (计算机科学) ›› 2017, Vol. 44 ›› Issue (5): 226-231. doi: 10.11896/j.issn.1002-137X.2017.05.040
王煦中,刘琰,胡琳梅,陈静
WANG Xu-zhong, LIU Yan, HU Lin-mei and CHEN Jing
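Abstract: Chinese online encyclopedias contain a large amount of valuable information, and many studies have successfully used them for a variety of knowledge acquisition tasks. For example, documents with similar topics can be grouped under one concept. A topic hierarchy built for a given concept from these encyclopedias is very helpful for applications such as search and browsing, information organization, and retrieval. However, no research has yet addressed the construction of a topic hierarchy for a concept in online encyclopedias. To address the heterogeneity and coarseness of Chinese online encyclopedias, a topic hierarchy construction method based on Bayesian networks is proposed. The method combines the structured table-of-contents information of the documents with their unstructured text, and uses the maximum arborescence algorithm to automatically build a topic hierarchy within the Bayesian topic network of the concept to which the documents belong. Experiments show that, compared with the original encyclopedia topic structure, the proposed method expands the content fourfold while maintaining 75% accuracy.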