Computer Science, 2016, Vol. 43, Issue (4): 256-259, 269. doi: 10.11896/j.issn.1002-137X.2016.04.052

Bayesian Chinese Spam Filtering Method Based on Phrases

WANG Qing-song and WEI Ru-yu   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: Naive Bayes has been widely used in spam filtering, where feature extraction is one of the essential steps of the algorithm. Previous Chinese spam filtering methods extracted only words as text features. With large-scale email training samples, the time efficiency of this approach becomes a bottleneck for spam filtering. This paper proposes a Bayesian spam filtering algorithm based on phrases, incorporating a new phrase analysis method from the text classification field. Phrases are extracted as the feature unit according to the rules of basic noun phrases, verb phrases and semantic analysis. Comparative spam filtering experiments using words and using phrases as the feature unit confirm the effectiveness of the proposed method.

Key words: Spam filtering, Bayesian, Feature extraction, Phrase-based, Chinese word segmentation
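
Illustrative sketch: the abstract describes a naive Bayes filter whose features are phrases rather than single words. The following is a minimal, generic multinomial naive Bayes classifier with Laplace smoothing that accepts any pre-extracted feature list, so the same code can be run with word features or phrase features. It is not the authors' implementation: the class name `NaiveBayesFilter`, the `alpha` smoothing parameter, and the English strings standing in for segmented Chinese phrases are all assumptions, and the phrase extraction rules (basic noun phrases, verb phrases, semantic analysis) are not reproduced here.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes filter over arbitrary features.

    Each message is a list of features; a feature may be a word or a phrase.
    Assumes at least one training message per class before classifying.
    """

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.feature_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, features, label):
        # Record one labelled message and its features.
        self.doc_counts[label] += 1
        self.feature_counts[label].update(features)
        self.vocab.update(features)

    def log_score(self, features, label):
        # log P(label) + sum of log P(feature | label), with smoothing.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.feature_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for feature in features:
            score += math.log((counts[feature] + self.alpha) / denom)
        return score

    def classify(self, features):
        # Pick the label with the higher log posterior score.
        return max(self.LABELS, key=lambda label: self.log_score(features, label))


# Phrase-level features: each string is one unit, not split into words.
nb = NaiveBayesFilter()
nb.train(["free prize", "click the link"], "spam")
nb.train(["meeting schedule", "project progress"], "ham")
prediction = nb.classify(["free prize"])
```

Treating a phrase as a single feature shrinks the number of features per message compared with word-level features, which is one plausible reading of where the time savings mentioned in the abstract would come from.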

