Computer Science, 2014, Vol. 41, Issue (6): 214-216. doi: 10.11896/j.issn.1002-137X.2014.06.042


Improvement of Information Gain in Spam Filtering

ZHAI Jun-chang, QIN Yu-ping and CHE Wei-wei

Online: 2018-11-14  Published: 2018-11-14

Abstract: This paper proposes an improved information gain measure for selecting feature words in spam filtering. First, a gain ratio is defined according to the probabilities of the feature words. This ratio is then used to amplify or weaken the amount of information each feature word contributes to classification, which improves the method of calculating the class-conditional entropy. Finally, the improved measure is combined with a naive Bayes classifier that uses the maximum a posteriori (MAP) decision rule, and experiments on an English corpus evaluate the algorithm in terms of recall, correct rate, accuracy and error rate. The experimental results show that the improved algorithm increases classification precision and reduces user loss.
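For orientation, the baseline that the abstract modifies can be sketched in Python. This is an illustration under stated assumptions, not the authors' method: it computes the standard information gain, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], ranks words by it, and classifies with a naive Bayes model using the MAP rule. The paper's gain-ratio weighting of the class-conditional entropy is not specified in the abstract, so it is not reproduced here. The function names, the Bernoulli (word presence/absence) event model, Laplace smoothing and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a class distribution given raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)

def information_gain(docs, labels, word, classes):
    """Standard (baseline) IG, NOT the paper's improved variant:
    IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]."""
    with_t = Counter(y for d, y in zip(docs, labels) if word in d)
    without_t = Counter(y for d, y in zip(docs, labels) if word not in d)
    n = len(docs)
    n_t = sum(with_t.values())
    h_c = entropy([labels.count(c) for c in classes])
    h_cond = (n_t / n) * entropy([with_t[c] for c in classes]) \
        + ((n - n_t) / n) * entropy([without_t[c] for c in classes])
    return h_c - h_cond

def select_features(docs, labels, k):
    """Hypothetical helper: keep the k words with the highest IG (ties broken alphabetically)."""
    classes = sorted(set(labels))
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, labels, w, classes), w))
    return ranked[:k]

def train_bernoulli_nb(docs, labels, features):
    """Assumed Bernoulli event model with Laplace smoothing: (count + 1) / (N_c + 2)."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        cond[c] = {f: (sum(f in d for d in class_docs) + 1) / (len(class_docs) + 2)
                   for f in features}
    return prior, cond

def classify_map(doc, prior, cond, features):
    """MAP decision: argmax_c [log P(c) + sum_f log P(x_f | c)]."""
    best, best_score = None, -math.inf
    for c in prior:
        score = math.log(prior[c])
        for f in features:
            p = cond[c][f]
            score += math.log(p if f in doc else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data (invented for illustration): each message is a set of words.
docs = [
    {"free", "money", "offer"},
    {"win", "free", "prize"},
    {"cheap", "offer", "money"},
    {"meeting", "tomorrow", "project"},
    {"project", "report", "deadline"},
    {"lunch", "tomorrow", "meeting"},
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

features = select_features(docs, labels, k=4)
prior, cond = train_bernoulli_nb(docs, labels, features)
print(features)                                                  # ['free', 'meeting', 'money', 'offer']
print(classify_map({"free", "offer", "now"}, prior, cond, features))  # spam
```

Because feature scoring is isolated in `information_gain`, an improved measure such as the one the abstract describes could be substituted inside `select_features` without changing the MAP classifier.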

Key words: Information gain, Feature selection, Spam, Naive Bayes
