Computer Science, 2026, Vol. 53, Issue (1): 115-127. doi: 10.11896/jsjkx.241000163
SONG Yijing (宋亦静), ZHANG Jifu (张继福)
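Abstract: Attribute grouping is an effective approach to outlier detection in high-dimensional data. However, existing ensemble strategies for attribute-group outlier detection use only the local outlier information within each attribute group and ignore the global outlier information across attribute groups, which biases the ensemble of attribute-group outlier information. To address this, a categorical attribute grouping outlier detection method based on an isolation forest ensemble strategy is proposed, using both the local and the global outlier information of attribute groups. First, the method automatically partitions the attributes into several attribute groups according to the correlation between attributes and obtains the outlier information of each data object in each attribute group. Second, it shows theoretically that existing outlier information ensemble strategies suffer from ensemble bias and defines an attribute-group ensemble bias coefficient. Third, it uses isolation forest to design an outlier information ensemble strategy that effectively characterizes the local and global outlier information of attribute groups and reduces the ensemble bias of attribute-group outlier detection. On this basis, a categorical attribute grouping outlier detection algorithm is proposed. Experimental results show that, compared with the baseline methods, the algorithm improves the AUC and the efficiency by 7.83% and 48.43% on average, respectively.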
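The pipeline described in the abstract (per-group outlier information, then an isolation-forest ensemble over it) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The attribute groups are passed in by hand instead of being derived from attribute correlation, the per-group score is assumed to be simple rarity (1 minus relative frequency of the value combination), and the names `group_scores`, `isolation_ensemble` and `detect` are hypothetical. The isolation forest is a compact standard-library version in the style of Liu et al. [35].

```python
import math
import random
from collections import Counter

def group_scores(rows, groups):
    """One score vector per object: rarity of its value combination in each
    attribute group (assumed scoring rule, not taken from the paper)."""
    n = len(rows)
    columns = []
    for group in groups:
        keys = [tuple(row[a] for a in group) for row in rows]
        counts = Counter(keys)
        columns.append([1.0 - counts[k] / n for k in keys])
    return [list(vec) for vec in zip(*columns)]

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup,
    the usual isolation-forest normalizer."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random non-constant dimension."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:  # all remaining points are identical
        return ("leaf", len(points))
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the expected depth of the unsplit leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    child = left if x[dim] < split else right
    return path_length(child, x, depth + 1)

def isolation_ensemble(vectors, n_trees=100, sample_size=64, seed=0):
    """Combine per-group scores: objects that are easy to isolate in the
    joint score space get a final score close to 1."""
    if len(vectors) < 2:
        raise ValueError("need at least two objects")
    rng = random.Random(seed)
    psi = min(sample_size, len(vectors))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(vectors, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(t, v) for t in trees) / (n_trees * norm))
            for v in vectors]

def detect(rows, groups, **kwargs):
    """Hypothetical end-to-end helper: group scores, then isolation ensemble."""
    return isolation_ensemble(group_scores(rows, groups), **kwargs)
```

Because the trees split on the whole vector of group scores rather than summing or averaging them, an object's final score depends on how it sits relative to all groups jointly. This is one plausible reading of combining local and global outlier information; the paper's actual ensemble strategy and bias coefficient may differ.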
CLC Number: