Computer Science ›› 2016, Vol. 43 ›› Issue (7): 224-229. doi: 10.11896/j.issn.1002-137X.2016.07.040

• Artificial Intelligence •

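Taxonomy Construction Based on User Self-describing Tags

LIU Su-qi, BAI Guang-wei and SHEN Hang

  1. College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816; Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing 210094
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (60673185,7), the Natural Science Foundation of Jiangsu Province (BK2010548), the Jiangsu Science and Technology Support Program (Industry) (BE2011186), and the Open Research Fund of the Key Laboratory of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education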

Abstract: Schema-level knowledge is vital to the development of the Semantic Web, yet the amount of schema-level knowledge in current Linking Open Data (LOD) is very limited. To overcome this limitation, this paper proposes an approach for constructing a taxonomy from user self-describing tags in social networks. The approach first uses a search-engine-based tag blocking algorithm to group tags that describe the same topic into the same block. It then applies a label propagation algorithm based on semi-supervised learning to detect hypernym-hyponym relations between tags within the same block. Finally, it uses a greedy algorithm based on heuristic rules to construct the taxonomy, so that a large-scale, high-quality taxonomy can be built from social sites. Experimental results show that, compared with existing related work, the proposed approach achieves clearly higher precision, recall, and F-score.

Key words: Schema-level knowledge, User self-describing tags, Taxonomy, Label propagation
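
The second step of the approach relies on label propagation over tags in the same block. The sketch below shows only the generic iterative scheme with clamped seeds, assuming each node is a candidate (hypernym, hyponym) tag pair and each edge weight is a similarity between two pairs. The function name `propagate_labels`, the toy pairs, and the weights are illustrative assumptions and are not taken from the paper; the abstract does not describe how the actual graph or its features are built.

```python
from typing import Dict, List, Sequence


def propagate_labels(
    weights: Sequence[Sequence[float]],
    seeds: Dict[int, int],
    num_classes: int = 2,
    max_iter: int = 1000,
    tol: float = 1e-9,
) -> List[List[float]]:
    """Generic label propagation with clamped seed nodes.

    weights: symmetric, non-negative similarity matrix between nodes.
    seeds:   node index -> known class index (labeled nodes).
    Returns one class distribution per node.
    """
    n = len(weights)

    # Row-normalise similarities into transition probabilities.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total > 0 else 0.0 for w in row])

    # Seeds start as one-hot rows, unlabeled nodes as uniform rows.
    labels = []
    for i in range(n):
        if i in seeds:
            labels.append([1.0 if c == seeds[i] else 0.0 for c in range(num_classes)])
        else:
            labels.append([1.0 / num_classes] * num_classes)

    for _ in range(max_iter):
        new_labels = []
        for i in range(n):
            if i in seeds:
                # Clamp: labeled nodes keep their known label.
                new_labels.append(labels[i][:])
                continue
            # Each unlabeled node absorbs its neighbours' distributions.
            row = [
                sum(trans[i][j] * labels[j][c] for j in range(n))
                for c in range(num_classes)
            ]
            s = sum(row)
            new_labels.append([v / s for v in row] if s > 0 else labels[i][:])
        delta = max(
            abs(a - b)
            for new_row, old_row in zip(new_labels, labels)
            for a, b in zip(new_row, old_row)
        )
        labels = new_labels
        if delta < tol:
            break
    return labels


# Hypothetical candidate pairs (hypernym, hyponym) inside one tag block.
pairs = [("fruit", "apple"), ("fruit", "banana"), ("banana", "fruit"), ("banana", "apple")]
# Made-up pair-to-pair similarities, for illustration only.
W = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
# Class 1 = "is a hypernym relation", class 0 = "is not"; pairs 0 and 3 are seeds.
seeds = {0: 1, 3: 0}
result = propagate_labels(W, seeds)
for pair, dist in zip(pairs, result):
    print(pair, "hypernym" if dist[1] > 0.5 else "not hypernym")
```

In this toy graph the unlabeled pair that is most similar to the positive seed ends up leaning toward the hypernym class, and the one most similar to the negative seed leans the other way. A production setting would more likely use a sparse graph and a library implementation rather than dense nested lists.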

