Computer Science (计算机科学), 2018, Vol. 45, Issue (10): 155-159. doi: 10.11896/j.issn.1002-137X.2018.10.029
徐东, 王岩俊, 孟宇龙, 张子迎
XU Dong, WANG Yan-jun, MENG Yu-long, ZHANG Zi-ying
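Abstract: Existing data anomaly detection algorithms based on Isolation Forest suffer from low detection accuracy, poor execution efficiency, and weak generalization ability. To address these problems, this paper proposes an improved data anomaly detection method, SA-iForest. The method uses simulated annealing to select isolation trees that are both accurate and diverse in order to optimize the forest, and removes redundant isolation trees, thereby improving the forest construction step of Isolation Forest. The proposed method is validated on standard simulation datasets. The results show that, compared with the traditional Isolation Forest and LOF methods, it achieves significant improvements in accuracy, execution efficiency, and stability.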
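The abstract does not state the objective function, neighbourhood move, or cooling schedule used by SA-iForest, so the following is only a minimal sketch of the general idea of choosing an accurate and diverse subset of trees by simulated annealing. Everything named here is an assumption made for illustration: the functions `fitness` and `anneal_select`, the `weight` that trades accuracy against diversity, the swap move, and the geometric cooling with `t0` and `cooling`. The per-tree `accuracy` scores and the pairwise `diversity` matrix are taken as given inputs; how the paper measures them is not described in the abstract.

```python
import math
import random


def fitness(subset, accuracy, diversity, weight=0.5):
    """Assumed objective: mean tree accuracy plus weighted mean pairwise diversity."""
    members = sorted(subset)
    acc = sum(accuracy[i] for i in members) / len(members)
    pairs = [(a, b) for idx, a in enumerate(members) for b in members[idx + 1:]]
    div = sum(diversity[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return acc + weight * div


def anneal_select(accuracy, diversity, k, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Pick k of the n candidate trees by simulated annealing over swap moves."""
    n = len(accuracy)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of trees")
    rng = random.Random(seed)
    current = set(rng.sample(range(n), k))
    current_fit = fitness(current, accuracy, diversity)
    best, best_fit = set(current), current_fit
    temp = t0
    for _ in range(steps):
        # Neighbour: swap one selected tree for one unselected tree.
        out_tree = rng.choice(sorted(current))
        in_tree = rng.choice(sorted(set(range(n)) - current))
        candidate = (current - {out_tree}) | {in_tree}
        cand_fit = fitness(candidate, accuracy, diversity)
        delta = cand_fit - current_fit
        # Metropolis rule: keep improvements, occasionally keep worse subsets.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_fit = candidate, cand_fit
            if current_fit > best_fit:
                best, best_fit = set(current), current_fit
        # Geometric cooling (an assumed schedule), floored to avoid division by zero.
        temp = max(temp * cooling, 1e-9)
    return sorted(best), best_fit
```

In this sketch the trees left out of the returned subset play the role of the redundant trees that the abstract says are removed; a fixed `seed` makes a run reproducible.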