Computer Science, 2018, Vol. 45, Issue (4): 157-162. doi: 10.11896/j.issn.1002-137X.2018.04.026

• Information Security •

Intranet Defense Algorithm Based on Pseudo Boosting Decision Tree

LI Bai-shen, LI Ling-zhi, SUN Yong and ZHU Yan-qin

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006 (all four authors)
  • Online: 2018-04-15 Published: 2018-05-11
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61373164,1)

Abstract: Drawing on the idea behind the TF-IDF algorithm, this paper proposes feature frequency (Eigen Frequency), forest frequency (Forest Frequency), and the pseudo boosting decision tree (PBDT). Together they address a weakness of the gradient boosting decision tree (GBDT): as the number of iterations grows, misclassified data become marginalized. In PBDT, every decision tree is built separately on a bootstrapped copy of the original data set, so the data set does not need to be resampled at each iteration. Intranet defense experiments were conducted on a distributed cluster. The results show that, on training sets of a certain scale, PBDT achieves better prediction accuracy.

Key words: Pseudo boosting decision tree, Distributed cluster, Intranet defense
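
The abstract spells out only one concrete mechanism: each tree is trained on its own bootstrap resample of the original data set. The Python sketch below illustrates just that step and should not be read as the authors' PBDT. The abstract does not define how feature frequency or forest frequency weight the trees, so plain majority voting stands in for them here. The depth-1 "stump" learner, the function names, and the toy data (two made-up numeric features per host, label 1 meaning suspicious) are all assumptions for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Fit a depth-1 tree: one feature, one threshold, chosen by exhaustive search."""
    best = None
    n_features = len(sample[0][0])
    for f in range(n_features):
        for x, _ in sample:
            t = x[f]  # candidate threshold taken from the sample itself
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if xi[f] <= t else right) != yi
                             for xi, yi in sample)
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    _, f, t, left, right = best
    return lambda x: left if x[f] <= t else right

def train_ensemble(data, n_trees, seed=0):
    """Train every tree independently on its own bootstrap resample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_trees)]

def predict(trees, x):
    """Majority vote (an assumed stand-in for the paper's frequency weighting)."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature_a, feature_b) -> label, 1 = suspicious.
data = [((0, 1), 0), ((1, 2), 0), ((2, 1), 0), ((1, 3), 0),
        ((8, 9), 1), ((9, 7), 1), ((7, 8), 1), ((9, 9), 1)]

trees = train_ensemble(data, n_trees=25, seed=0)
print(predict(trees, (0, 1)), predict(trees, (9, 9)))
```

Because no tree depends on the errors of the trees before it, the calls inside `train_ensemble` are independent and could in principle be distributed across cluster nodes, which fits the distributed setting the abstract mentions.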

