Computer Science, 2026, Vol. 53, Issue (1): 115-127. doi: 10.11896/jsjkx.241000163

• Database & Big Data & Data Science •

Attribute Grouping-based Categorical Outlier Detection Using Isolation Forest Ensemble Strategy

SONG Yijing, ZHANG Jifu   

  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Received: 2024-10-29; Revised: 2025-02-14; Published: 2026-01-08
  • About the authors: SONG Yijing, born in 1992, Ph.D. candidate, is a member of CCF (No. I9481G). Her main research interest is data mining.
    ZHANG Jifu, born in 1963, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 05740D). His main research interests include big data analysis and parallel computing.
  • Supported by:
    National Natural Science Foundation of China (62172293).

Abstract: Attribute grouping is an effective step in high-dimensional outlier detection. However, current ensemble strategies in attribute grouping-based outlier detection consider only the local outlier information within each attribute group and ignore the global outlier information across all attribute groups, which can bias the ensemble of attribute-group outlier information. This paper proposes an attribute grouping-based outlier detection approach that uses an Isolation Forest ensemble strategy to exploit both the local and the global outlier information of attribute groups. Firstly, attributes are automatically divided into several attribute groups based on the local and global correlations among attributes, and the outlier information of data objects is obtained within each attribute group. Secondly, from the perspective of attribute grouping, the ensemble bias of current outlier information ensemble strategies is theoretically analyzed, and the ensemble deviation coefficient is defined as an evaluation index for outlier information ensemble strategies. Then, an attribute grouping-based Isolation Forest ensemble strategy for categorical outlier detection is proposed; it captures the local and global outlier information of attribute groups and reduces the ensemble bias of attribute-group outlier detection. Finally, experimental results on UCI datasets validate that the ensemble strategy effectively alleviates the ensemble bias and improves outlier detection performance. Compared with the competing methods, the algorithm improves the AUC index and the detection efficiency by averages of 7.83% and 48.43%, respectively.
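The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives neither the grouping criterion, the per-group scorer, nor the details of the ensemble step. The sketch assumes the attribute groups are already given, uses a simple frequency-based rarity score as the local outlier information of each group, and builds a standard isolation forest (in the style of Liu, Ting and Zhou, 2008, without subsampling) over the per-group score vectors as one possible way to combine them globally. All function names (`group_scores`, `iforest_scores`, and the helpers) are hypothetical.

```python
import math
import random
from collections import Counter

EULER_GAMMA = 0.5772156649


def group_scores(rows, groups):
    """Local outlier information: one rarity score per attribute group.

    Assumed scorer: mean of (1 - relative value frequency) over the
    attributes of the group. The paper's own scorer may differ.
    """
    n = len(rows)
    freqs = [Counter(row[j] for row in rows) for j in range(len(rows[0]))]
    return [
        [sum(1 - freqs[j][row[j]] / n for j in g) / len(g) for g in groups]
        for row in rows
    ]


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, limit, rng):
    """Grow one isolation tree over the per-group score vectors."""
    if depth >= limit or len(points) <= 1:
        return ("leaf", len(points))
    # Only dimensions that still vary can be split.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return ("leaf", len(points))
    d = rng.choice(dims)
    split = rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
    left = [p for p in points if p[d] < split]
    right = [p for p in points if p[d] >= split]
    return ("node", d, split,
            build_tree(left, depth + 1, limit, rng),
            build_tree(right, depth + 1, limit, rng))


def path_length(x, node, depth=0):
    """Depth at which x ends up, plus the usual correction at the leaf."""
    if node[0] == "leaf":
        return depth + c_factor(node[1])
    _, d, split, left, right = node
    return path_length(x, left if x[d] < split else right, depth + 1)


def iforest_scores(points, n_trees=100, seed=0):
    """Global combination: isolation forest score in (0, 1], higher = more outlying."""
    rng = random.Random(seed)
    n = len(points)
    limit = math.ceil(math.log2(max(n, 2)))
    trees = [build_tree(points, 0, limit, rng) for _ in range(n_trees)]
    norm = c_factor(n) or 1.0  # avoid division by zero for a single point
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
            for p in points]


# Toy categorical data; the groups are assumed, not learned.
rows = [
    ("red", "small", "yes", "low"),
    ("red", "small", "yes", "low"),
    ("red", "large", "yes", "low"),
    ("blue", "small", "no", "low"),
    ("red", "small", "yes", "high"),
    ("blue", "small", "yes", "low"),
    ("green", "huge", "maybe", "odd"),
]
groups = [[0, 1], [2, 3]]
local = group_scores(rows, groups)   # local information per attribute group
final = iforest_scores(local)        # combined score across all groups
for row, score in zip(rows, final):
    print(row, round(score, 3))
```

Scoring the per-group vectors jointly, rather than averaging them, lets an object's position relative to all groups at once influence its final score, which is the intuition behind combining local and global information. Whether this matches the paper's ensemble deviation analysis is not something the abstract allows one to confirm.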

Key words: Outlier detection, Attribute grouping, Ensemble deviation coefficient, Isolation Forest ensemble strategy, Outlier information of global paths

CLC Number: 

  • TP311