Computer Science (计算机科学), 2014, Vol. 41, Issue 6: 214-216. doi: 10.11896/j.issn.1002-137X.2014.06.042
ZHAI Jun-chang (翟军昌), QIN Yu-ping (秦玉平), and CHE Wei-wei (车伟伟)
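Abstract: To address feature selection in spam filtering, this paper proposes an improved information gain method. A gain ratio is first defined from the prior probability of each feature term. The gain ratio is then used to amplify or attenuate the amount of information the term contributes to the overall classification, which modifies how the term's class-conditional entropy is computed. Experiments were carried out on an English corpus using a naive Bayes decision rule under the maximum a posteriori (MAP) hypothesis, and the algorithm was evaluated by recall, accuracy, precision, and error rate. The results show that the improved algorithm raises the filter's classification accuracy and reduces the loss users incur when legitimate mail is misclassified.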
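For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is only the standard information gain of a binary term for a two-class (spam/legitimate) split; the abstract does not give the formula for the gain ratio or the modified conditional entropy, so neither is reproduced here. The function and parameter names are illustrative assumptions, not the authors' notation.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(n_spam_with, n_ham_with, n_spam_without, n_ham_without):
    """Standard (unmodified) information gain of a binary term.

    Arguments are document counts (hypothetical names): messages of each
    class that do or do not contain the term.
    """
    n_with = n_spam_with + n_ham_with
    n_without = n_spam_without + n_ham_without
    total = n_with + n_without
    n_spam = n_spam_with + n_spam_without
    n_ham = n_ham_with + n_ham_without

    # H(C): entropy of the class distribution before observing the term.
    h_class = entropy([n_spam / total, n_ham / total])

    # H(C | t): class entropy averaged over "term present" and "term absent".
    h_cond = 0.0
    for n_s, n_h, n in (
        (n_spam_with, n_ham_with, n_with),
        (n_spam_without, n_ham_without, n_without),
    ):
        if n > 0:
            h_cond += (n / total) * entropy([n_s / n, n_h / n])

    return h_class - h_cond
```

With balanced classes, a term that occurs in every spam message and in no legitimate message receives the maximum gain of one bit, while a term spread evenly across both classes receives zero. Terms would then be ranked by this score and the top-scoring ones kept as features; the paper's contribution, per the abstract, is to reweight the conditional-entropy part of this score.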