Computer Science ›› 2011, Vol. 38 ›› Issue (7): 175-180.

• Databases and Data Mining •

Tag Prediction Based on a Probabilistic Topic Model

YUAN Liu (袁柳), ZHANG Long-bo (张龙波)

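  1. (School of Computer Science, Shaanxi Normal University, Xi'an 710062); (School of Computer Science, Shandong University of Technology, Zibo 255049)
  • Publication date: 2018-11-16 Release date: 2018-11-16
  • Funding:
    This work was supported by the National Natural Science Foundation of China project "Research on Data Stream Mining for Intrusion Detection" (60873196).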

Social Tag Prediction Based on Probabilistic Topic Model

YUAN Liu, ZHANG Long-bo

  • Online: 2018-11-16 Published: 2018-11-16

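Abstract: Making full use of user-defined tag information is an important way to understand the semantics of Web resources and to make Web applications more intelligent. To address the incomplete and inconsistent information that is widespread in the assignment of tags to resources, a probabilistic topic model based on the characteristics of users' tagging behavior is built and used to predict tags for resources whose tagging information is incomplete. From the statistical characteristics of the tags attached to each resource, tag documents of different forms can be generated; by analyzing the quality of the topics produced from these tag documents, the tag-document form suited to a particular dataset is determined. The strong correlation among the words within a single topic is exploited to design a ranking method for predicted tags, which supports both tag prediction for incompletely tagged resources and the detection of semantically inconsistent tags. Tests on the DeliciousT140 and Wiki10+ datasets show that the proposed method predicts tags effectively and can improve information retrieval performance.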

Keywords: Tagging system, Tag prediction, Statistical topic model

Abstract: Tagging information created by users is important for understanding the semantics of Web resources and for improving the intelligence of Web applications. A probabilistic topic model is used to deal with the incompleteness and inconsistency of tagging systems, and a technique for generating the model from the statistical characteristics of tags is proposed. According to the tag statistics of each resource, tag documents in different formats can be created. By analyzing the performance of the topics generated from the different tag documents, the document format appropriate for a given dataset is determined. The high relatedness among the words within the same topic is exploited to predict tags for resources with incomplete or inconsistent tags. Experiments on DeliciousT140 and Wiki10+ show the effectiveness of the proposed technique.

Key words: Tagging system, Tag prediction, Statistical topic model
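
The pipeline outlined in the abstract (build a tag document per resource, fit a topic model, rank candidate tags for an incompletely tagged resource) can be illustrated with a minimal sketch. Everything in it is an assumption rather than the authors' method: the tag-document form (each tag repeated once per assigning user), the use of LDA with collapsed Gibbs sampling, the ranking score (the sum over topics of P(topic | resource) × P(tag | topic)), the hyperparameters, the function names, and the toy data are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def build_tag_document(tag_counts):
    """Assumed document form: repeat each tag once per user who assigned it."""
    doc = []
    for tag in sorted(tag_counts):
        doc.extend([tag] * tag_counts[tag])
    return doc


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return the count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Give every tag occurrence a random initial topic.
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            assign[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample the topic.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + len(vocab) * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word, topic_total


def predict_tags(d, docs, model, top_n=3, alpha=0.1, beta=0.01):
    """Rank tags not yet on resource d by sum_k P(k | d) * P(tag | k)."""
    vocab, doc_topic, topic_word, topic_total = model
    num_topics = len(topic_total)
    existing = set(docs[d])
    scores = {}
    for w in vocab:
        if w in existing:
            continue
        scores[w] = sum(
            (doc_topic[d][k] + alpha) / (len(docs[d]) + num_topics * alpha)
            * (topic_word[k][w] + beta) / (topic_total[k] + len(vocab) * beta)
            for k in range(num_topics)
        )
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Invented toy data: tag -> number of users who assigned it to the resource.
resources = [
    {"python": 3, "programming": 2, "code": 1},
    {"python": 2, "code": 2, "tutorial": 1},
    {"recipe": 3, "cooking": 2, "food": 1},
    {"cooking": 2, "food": 2, "recipe": 1},
    {"python": 1},  # incompletely tagged resource
]
docs = [build_tag_document(r) for r in resources]
model = fit_lda(docs, num_topics=2)
predicted = predict_tags(len(docs) - 1, docs, model, top_n=2)
print(predicted)
```

Tags already on the resource are excluded from the ranking, so only new candidates are returned. A real experiment would replace the toy dictionary with per-resource tag counts from a dataset such as DeliciousT140 or Wiki10+ and would compare several tag-document forms, as the abstract describes.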
