Computer Science, 2018, Vol. 45, Issue (10): 155-159. doi: 10.11896/j.issn.1002-137X.2018.10.029

• Information Security •

Improved Data Anomaly Detection Method Based on Isolation Forest

XU Dong, WANG Yan-jun, MENG Yu-long, ZHANG Zi-ying   

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  • Received: 2017-09-05 Online: 2018-11-05 Published: 2018-11-05

Abstract: To address the low accuracy, poor execution efficiency, and weak generalization ability of existing anomaly detection algorithms based on isolation forest, an improved data anomaly detection method, SA-iForest, is proposed. Using a simulated annealing algorithm, the method selects isolation trees with high precision and high diversity to optimize the forest. At the same time, redundant isolation trees are removed, which improves the construction of the forest. SA-iForest was compared with the traditional Isolation Forest algorithm and the LOF algorithm. On standard simulation data sets, the proposed algorithm shows significant improvements in accuracy, execution efficiency, and stability.
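
The tree-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the fitness function, the neighbourhood move, or the cooling schedule, so everything below is an assumption. The names `fitness` and `anneal_select`, the additive "accuracy plus diversity" score, the swap move, the geometric cooling parameters, and the toy numbers are all hypothetical. The sketch only shows how simulated annealing could choose a subset of `k` trees given a per-tree accuracy score and a pairwise disagreement matrix.

```python
import math
import random


def fitness(subset, accuracy, disagreement):
    """Assumed score: mean tree accuracy plus mean pairwise disagreement."""
    k = len(subset)
    mean_acc = sum(accuracy[i] for i in subset) / k
    if k < 2:
        return mean_acc
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    mean_div = sum(disagreement[i][j] for i, j in pairs) / len(pairs)
    return mean_acc + mean_div


def anneal_select(accuracy, disagreement, k, t0=1.0, cooling=0.95,
                  steps_per_temp=20, t_min=1e-3, seed=0):
    """Pick k tree indices by simulated annealing (hypothetical parameters)."""
    rng = random.Random(seed)
    n = len(accuracy)
    current = rng.sample(range(n), k)
    cur_fit = fitness(current, accuracy, disagreement)
    best, best_fit = list(current), cur_fit
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            outside = [i for i in range(n) if i not in current]
            if not outside:
                break  # every tree is already selected; nothing to swap
            # Neighbour move: swap one selected tree for one unselected tree.
            candidate = list(current)
            candidate[rng.randrange(k)] = rng.choice(outside)
            cand_fit = fitness(candidate, accuracy, disagreement)
            delta = cand_fit - cur_fit
            # Metropolis rule: always accept improvements, and accept a
            # worse subset with probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, cur_fit = candidate, cand_fit
                if cur_fit > best_fit:
                    best, best_fit = list(current), cur_fit
        t *= cooling  # geometric cooling
    return sorted(best), best_fit


# Toy inputs, made up purely for illustration (not data from the paper).
accuracy = [0.90, 0.88, 0.60, 0.85, 0.55]
disagreement = [
    [0.0, 0.1, 0.5, 0.4, 0.6],
    [0.1, 0.0, 0.5, 0.3, 0.6],
    [0.5, 0.5, 0.0, 0.4, 0.2],
    [0.4, 0.3, 0.4, 0.0, 0.5],
    [0.6, 0.6, 0.2, 0.5, 0.0],
]
chosen, score = anneal_select(accuracy, disagreement, k=3)
print(chosen, score)
```

Trees left out of the returned subset play the role of the "redundant" trees that the abstract says are removed. In practice the accuracy and disagreement inputs would have to be estimated from data, and how the paper does that is not stated here.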

Key words: Isolation forest, Outlier detection, SA-iForest, Simulated annealing

CLC Number: TP306