Computer Science (计算机科学) ›› 2012, Vol. 39 ›› Issue (5): 177-179.

• Databases and Data Mining •

Dynamic Splog Filtering Algorithm Based on Combined Features

Ren Yonggong (任永功), Yin Mingfei (尹明飞), Yang Rongjie (杨荣杰)

  1. (School of Computer and Information Technology, Liaoning Normal University, Dalian 116029)
  • Online: 2018-11-16 Published: 2018-11-16

Abstract: Splog (spam blog) filtering has become an active international research area in recent years. Most existing filtering algorithms classify on word frequency features alone, which are redundant and lack relevance to one another. To address this problem, a dynamic splog filtering algorithm based on combined features (CFDSD) is proposed. CFDSD uses author attributes and self-similarity features to address feature redundancy and low relevance, and applies the Bayesian classification algorithm to optimize word frequency feature classification. Experiments show that the algorithm adapts to the way blogs are dynamically updated over time and improves filtering efficiency while reducing the time needed to filter splogs.

Key words: Splog filtering, Term frequency features, Self-similarity features, Combined features, Bayesian classification
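
To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch. It is not the authors' CFDSD implementation, since the abstract gives no formulas. It assumes a multinomial naive Bayes classifier with Laplace smoothing over term frequencies, and, as a purely hypothetical stand-in for a self-similarity feature, the mean Jaccard overlap between consecutive posts of one blog. All function names and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def train_nb(docs):
    """Count documents and term frequencies per label.

    docs: iterable of (tokens, label) pairs.
    """
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def log_posteriors(tokens, model):
    """Unnormalized log posterior per label, with Laplace (add-one) smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, n_docs in doc_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)  # log likelihood
        scores[label] = score
    return scores


def classify(tokens, model):
    """Return the label with the highest log posterior."""
    scores = log_posteriors(tokens, model)
    return max(scores, key=scores.get)


def jaccard(a, b):
    """Jaccard overlap of two token lists; 0.0 when both are empty."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def self_similarity(posts):
    """Hypothetical feature: mean Jaccard overlap of consecutive posts."""
    pairs = list(zip(posts, posts[1:]))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy training data (invented for illustration only).
training = [
    ("buy cheap pills buy now".split(), "splog"),
    ("cheap pills online".split(), "splog"),
    ("my trip to the lake".split(), "blog"),
    ("notes on the lake trip".split(), "blog"),
]
model = train_nb(training)
```

Because the model is only a set of counters, newly labeled posts could be folded in by updating those counters without retraining from scratch. That is one simple way to follow blogs as they change over time, though whether CFDSD updates this way is not stated in the abstract. A self-similarity score near 1 would flag blogs whose posts are near copies of each other; how such a score is combined with the Bayesian output and the author attributes is left open here.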
