Computer Science (计算机科学), 2013, Vol. 40, Issue (Z11): 86-90.

• Intelligent Control and Optimization •

Research and Improvement of Filter Algorithm of Malicious Information Based on One-class SVM

DING Xiao-yun, LIU Gong-shen and MENG Kui

  1. School of Information Security Engineering, Shanghai Jiao Tong University, Shanghai 200240 (all three authors)
  • Online: 2018-11-16 Published: 2018-11-16
  • Funding:
    This work was supported by the 973 Program (2013CB329603) and the National Natural Science Foundation of China (61272441, 61171173)

Abstract: With the rapid growth of the Internet, monitoring and filtering files transmitted over the network has become an active research topic. Traditional algorithms based on string matching cannot keep up with the explosive growth of the information that must be monitored. SVM-based classifiers improve classification efficiency, but the high dimensionality of the feature vectors still wastes storage and computing resources and slows down inspection. To reduce the dimensionality effectively, this paper proposes an improved one-class SVM algorithm that applies feature reduction to constrain the dimension of the vectors before classification. Experiments show that, with the same number of feature words selected, the improved algorithm achieves noticeably higher accuracy in classifying and filtering malicious information.

Key words: Feature reduction, One-class SVM, Classification
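
The abstract does not say which feature-reduction method the authors use. As a rough illustration only, the sketch below assumes document-frequency ranking: it keeps the `k` terms that occur in the most documents and maps each document to a count vector over that reduced vocabulary. The function names `select_features_by_df` and `vectorize`, the toy corpus, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_features_by_df(docs, k):
    """Return the k terms with the highest document frequency.

    Ties are broken alphabetically so the result is deterministic
    (an assumption made for this sketch).
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(tokens, vocabulary):
    """Map a tokenized document to term counts over the reduced vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Made-up corpus of already tokenized documents (illustration only).
docs = [
    ["free", "casino", "bonus", "click"],
    ["casino", "bonus", "win", "now"],
    ["free", "bonus", "offer"],
]

vocabulary = select_features_by_df(docs, k=3)
vectors = [vectorize(d, vocabulary) for d in docs]
```

Here the seven distinct terms are reduced to a three-dimensional vocabulary. Vectors of this kind could then be passed to a one-class SVM implementation (for example, scikit-learn's `OneClassSVM`) trained only on samples of the target class, which is one plausible reading of the pipeline the abstract describes.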

