Computer Science, 2025, Vol. 52, Issue (6): 129-138. doi: 10.11896/jsjkx.240500092

• Database & Big Data & Data Science •

Outlier Detection Method Based on Adaptive Graph Autoencoder

TAN Qiyin1, YU Jiong1,2, CHEN Zixin1

  1. 1 School of Software Engineering, Xinjiang University, Urumqi 830000, China
    2 College of Information Science and Engineering, Xinjiang University, Urumqi 830000, China
  • Received: 2024-05-22 Revised: 2024-09-20 Online: 2025-06-15 Published: 2025-06-11
  • Corresponding author: YU Jiong (yujiong@xju.edu.cn)
  • About author: TAN Qiyin (tanqiyin@stu.xju.edu.cn), born in 2000, postgraduate. Her main research interests include machine learning and anomaly detection.
    YU Jiong, born in 1965, Ph.D., professor. His main research interests include distributed computing, machine learning and data mining.
  • Supported by:
    National Natural Science Foundation of China (62262064).

Abstract: Outlier detection identifies the small number of individuals in a dataset that differ from the majority of samples, and thereby provides insight into the overall health of the data and its abnormal information. Currently, when handling Euclidean-structured datasets, most detection algorithms treat data instances as independent individuals and overlook the correlations between them. This informational bias makes it difficult to detect potential outliers that lie within normal data regions. To address this issue, this paper proposes ANGAE (Adaptive Neighbor Graph Autoencoder), a deep joint representation learning algorithm based on a graph autoencoder with adaptive neighbors. The algorithm constructs a graph from the perspective of graph generation to capture the relationships between data points, and uses structure and attribute autoencoders to learn latent representations of the data. ANGAE introduces an adaptive neighbor graph construction mechanism that dynamically updates the graph structure, so that an inaccurate graph structure is adjusted and improved during model training. By fusing structural embeddings and attribute embeddings, ANGAE achieves effective interaction between network structure and node attributes. Experimental results show that the proposed method achieves excellent performance on 11 datasets, maintaining high precision while exhibiting good robustness, which demonstrates its effectiveness.

Key words: Outlier detection, Deep learning, Graph convolutional networks, Graph representation learning, Attribute networks
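
The abstract does not spell out how the adaptive neighbor graph is built, so the following is only a minimal sketch under stated assumptions, not the ANGAE implementation. It assumes the commonly used closed-form adaptive-neighbor weighting, in which each point assigns non-negative weights summing to 1 to its k nearest neighbors, and it replaces the structure and attribute autoencoders with a plain neighbor-weighted reconstruction error as an illustrative outlier score. The function names `adaptive_neighbor_graph` and `neighbor_reconstruction_score` and the parameter `k` are hypothetical.

```python
import numpy as np


def adaptive_neighbor_graph(X, k=5):
    """Build a row-stochastic affinity matrix with k weighted neighbors per point.

    Assumed weighting (closed-form adaptive neighbors): for point i with sorted
    squared distances d_1 <= ... <= d_{k+1} to its nearest neighbors,
    s_ij = (d_{k+1} - d_j) / (k * d_{k+1} - sum_{h<=k} d_h).
    Requires more than k + 1 samples.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped against rounding error.
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbor
    idx = np.argsort(D, axis=1)[:, :k + 1]  # indices of the k+1 nearest points
    S = np.zeros((n, n))
    for i in range(n):
        d = D[i, idx[i]]  # ascending distances, length k+1
        denom = k * d[k] - d[:k].sum()
        if denom <= 1e-12:
            w = np.full(k, 1.0 / k)  # all neighbors equidistant: uniform weights
        else:
            w = (d[k] - d[:k]) / denom
        S[i, idx[i, :k]] = w
    return S


def neighbor_reconstruction_score(X, S):
    """Illustrative score: distance between a point and its neighbor-weighted mean."""
    return np.linalg.norm(X - S @ X, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(size=(50, 2))
    X = np.vstack([inliers, [[10.0, 10.0]]])  # one synthetic outlier, index 50
    S = adaptive_neighbor_graph(X, k=5)
    scores = neighbor_reconstruction_score(X, S)
    print("highest-scoring index:", int(np.argmax(scores)))
```

In a learned model the graph would presumably be rebuilt from updated embeddings during training rather than once from the raw attributes, which is what "dynamically updates the graph structure" suggests; this sketch builds it a single time for clarity.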

CLC Number: 

  • TP391.4