Computer Science, 2018, Vol. 45, Issue (10): 155-159. doi: 10.11896/j.issn.1002-137X.2018.10.029

• Information Security •

Improved Data Anomaly Detection Method Based on Isolation Forest

XU Dong, WANG Yan-jun, MENG Yu-long, ZHANG Zi-ying

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  • Received: 2017-09-05 Online: 2018-11-05 Published: 2018-11-05
  • About the authors: XU Dong (born 1969), male, Ph.D., professor; research interests: trusted computing, and network and information security. WANG Yan-jun (born 1992), male, master's student; research interests: trusted computing, and network and information security. MENG Yu-long (born 1976), male, Ph.D., lecturer; research interests: trusted computing, and network and information security; E-mail: mengyulong@hrbeu.edu.cn (corresponding author). ZHANG Zi-ying (born 1973), male, Ph.D., associate professor; research interests: artificial intelligence and machine perception.
  • Supported by:
    National Natural Science Foundation of China (61502118)

Abstract: To address the low detection accuracy, poor execution efficiency, and weak generalization ability of existing data anomaly detection algorithms based on Isolation Forest, an improved data anomaly detection method named SA-iForest is proposed. The method uses a simulated annealing algorithm to select isolation trees that are both accurate and diverse, and removes redundant isolation trees at the same time, which improves the forest construction step of Isolation Forest. The method was validated on standard simulation data sets. The results show that, compared with the traditional Isolation Forest and LOF methods, it achieves significant improvements in accuracy, execution efficiency, and stability.

Key words: Isolation forest, Outlier detection, SA-iForest, Simulated annealing
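The tree-selection idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the fitness function, the neighbour move, or the cooling schedule, so everything below is an assumption made for illustration: the names `fitness` and `anneal_select`, the `weight` parameter, the swap-based move, the geometric cooling, and the toy numbers are all hypothetical and are not taken from the paper. The sketch also assumes that a per-tree accuracy and a pairwise disagreement measure have already been computed on some validation data.

```python
import math
import random


def fitness(subset, accuracy, disagreement, weight=0.5):
    """Assumed objective: mean tree accuracy plus weighted mean pairwise disagreement."""
    members = sorted(subset)
    acc = sum(accuracy[i] for i in members) / len(members)
    pairs = [(a, b) for pos, a in enumerate(members) for b in members[pos + 1:]]
    div = sum(disagreement[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return acc + weight * div


def anneal_select(accuracy, disagreement, k, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Choose k tree indices by simulated annealing over swap moves (illustrative only)."""
    n = len(accuracy)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of trees")
    rng = random.Random(seed)
    current = set(rng.sample(range(n), k))
    cur_fit = fitness(current, accuracy, disagreement)
    best, best_fit = set(current), cur_fit
    temp = t0
    for _ in range(steps):
        # Neighbour move: swap one selected tree for one unselected tree.
        out_tree = rng.choice(sorted(current))
        in_tree = rng.choice(sorted(set(range(n)) - current))
        candidate = (current - {out_tree}) | {in_tree}
        cand_fit = fitness(candidate, accuracy, disagreement)
        delta = cand_fit - cur_fit
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_fit = candidate, cand_fit
            if cur_fit > best_fit:
                best, best_fit = set(current), cur_fit
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a small floor
    return sorted(best), best_fit


# Toy data (hypothetical): trees 0-2 are accurate and mutually diverse,
# trees 3-5 are weak and redundant with one another.
accuracy = [0.9, 0.9, 0.9, 0.5, 0.5, 0.5]
disagreement = [[0.0] * 6 for _ in range(6)]
for a in range(6):
    for b in range(6):
        if a != b:
            good_a, good_b = a < 3, b < 3
            disagreement[a][b] = 0.4 if (good_a and good_b) else 0.2 if (good_a or good_b) else 0.0

selected, score = anneal_select(accuracy, disagreement, k=3)
print(selected, round(score, 3))
```

In this toy setting the retained subset is smaller than the original forest, which is where an efficiency gain would come from. How SA-iForest actually scores accuracy and diversity, and how many trees it keeps, is specified in the full paper rather than in the abstract.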

CLC Number: 

  • TP306