Computer Science, 2022, Vol. 49, Issue (1): 146-152. doi: 10.11896/jsjkx.201000156
刘意, 毛莺池, 程杨堃, 高建, 王龙宝
LIU Yi, MAO Ying-chi, CHENG Yang-kun, GAO Jian, WANG Long-bao
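Abstract: Outlier detection is widely used in application domains such as network intrusion detection and credit card fraud detection. As data dimensionality grows, many irrelevant and redundant features appear; these mask the relevant features and lead to false positives. Because high-dimensional data is sparse and distances between points concentrate, traditional density-based and distance-based outlier detection algorithms are no longer applicable. Most machine-learning research on outlier detection focuses on a single model, and a single model has limited resistance to overfitting. Ensemble learning models generalize well and, in practice, achieve better predictive accuracy than single models. This paper proposes a locality and consistency based sequential ensemble method for outlier detection (LCSE). First, base outlier detection models are constructed with an emphasis on diversity. Second, outlier candidates are selected according to global ensemble consistency. Finally, the results of the base models are selected and combined by taking the correlation within each point's local neighborhood into account. Experiments show that LCSE improves outlier detection accuracy by 20.7% on average over traditional methods and improves performance (AUC) by 3.6% on average over the ensemble algorithms LSCP_AOM and iForest, so it outperforms other ensemble methods and neural network methods.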
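The three stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything concrete here is an assumption made for illustration and not the authors' implementation: the base models are k-nearest-neighbour distance detectors on random feature subsets, "global consistency" is taken to be a majority vote over each model's top-scoring points, and the local step keeps the models whose scores correlate best with the ensemble average in a candidate's neighbourhood. The function name `lcse_sketch` and all of its parameters are hypothetical.

```python
import math
import random
from statistics import mean, pstdev

def knn_scores(X, feats, k):
    """Outlier score: mean distance to the k nearest neighbours on a feature subset."""
    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in feats))
    scores = []
    for i, a in enumerate(X):
        d = sorted(dist(a, b) for j, b in enumerate(X) if j != i)
        scores.append(mean(d[:k]))
    return scores

def zscore(s):
    """Standardise scores so that different base models are comparable."""
    mu, sd = mean(s), pstdev(s)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in s]

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def lcse_sketch(X, n_models=6, top_frac=0.1, n_local=5, n_keep=3, seed=0):
    """Hypothetical three-stage sequential ensemble (illustrative assumptions only)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])

    # Stage 1 (assumed): diverse base models via random feature subsets and random k.
    models = []
    for _ in range(n_models):
        feats = rng.sample(range(d), rng.randint(max(1, d // 2), d))
        k = rng.randint(2, 5)
        models.append(zscore(knn_scores(X, feats, k)))

    # Stage 2 (assumed): a point is a candidate if most models rank it in their top fraction.
    n_top = max(1, int(top_frac * n))
    votes = [0] * n
    for s in models:
        for i in sorted(range(n), key=lambda i: -s[i])[:n_top]:
            votes[i] += 1
    candidates = {i for i in range(n) if votes[i] > n_models / 2}

    # Stage 3 (assumed): for each candidate, keep the models that agree best with
    # the ensemble average inside the candidate's local neighbourhood.
    consensus = [mean(s[i] for s in models) for i in range(n)]
    final = list(consensus)
    for i in candidates:
        nbrs = sorted(range(n), key=lambda j: math.dist(X[i], X[j]))[:n_local + 1]
        target = [consensus[j] for j in nbrs]
        ranked = sorted(models, key=lambda s: -pearson([s[j] for j in nbrs], target))
        final[i] = mean(s[i] for s in ranked[:n_keep])
    return final, candidates

# Toy data: a tight 5x4 grid plus one far-away point (index 20).
X = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)] + [(10.0, 10.0)]
scores, candidates = lcse_sketch(X)
print(max(range(len(X)), key=lambda i: scores[i]), sorted(candidates))
```

Restricting the local refinement to the candidate set is one plausible reading of "sequential": the second stage spends its effort only on points the first stage found suspicious. The paper itself may order or weight these steps differently.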