Computer Science ›› 2015, Vol. 42 ›› Issue (4): 172-176. doi: 10.11896/j.issn.1002-137X.2015.04.034

• Artificial Intelligence •

A Density-Based Outlier Detection Algorithm for Uncertain Data

姜元凯,郑洪源,丁秋林   

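  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016
  • Publication date: 2018-11-14 Release date: 2018-11-14
  • Funding:
    This work was supported by the Jiangsu Province Industry-University-Research Joint Innovation Fund Project (SBY201320423).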

On Density Based Outlier Detection for Uncertain Data

JIANG Yuan-kai, ZHENG Hong-yuan and DING Qiu-lin   

  • Online: 2018-11-14 Published: 2018-11-14

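Abstract: Uncertain data are pervasive in a wide range of applications, such as mobile computing, RFID technology, and sensor networks. Because outlier detection on uncertain data can improve quality of service, this paper proposes RLOF, a density-based outlier detection algorithm for uncertain data. The algorithm introduces an R2-tree structure, which lowers the time complexity of computing the local outlier factor and also lowers both the cost of data updates in an uncertain data set and the cost of maintaining massive data. Theoretical analysis and experimental results show that the algorithm is effective and feasible.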

Key words: Uncertain data, Outlier detection, R2-tree index, Minimal sufficient neighborhood

Abstract: Uncertain data arise in many applications, such as mobile computing, sensor networks, and RFID technology. Outlier detection can improve the quality of these services. This paper proposes RLOF, a density-based outlier detection algorithm for uncertain data. The algorithm introduces an R2-tree structure that reduces the time complexity of computing the local outlier factor. It also reduces the cost of updating data in an uncertain data set and the cost of maintaining massive data. Theoretical analysis and experimental results show that the algorithm is effective and feasible.

Key words: Uncertain data, Outlier detection, R2-tree index, Minimal sufficient neighborhood
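
For readers unfamiliar with the local outlier factor that the abstract refers to, the following is a minimal sketch of the standard LOF definition of Breunig et al. (2000), not of RLOF itself. It uses no R2-tree and does not model data uncertainty; the function names `k_nearest` and `lof_scores` are hypothetical, and the points are assumed to be distinct. The brute-force neighbor search sorts all pairwise distances for every point, which is the kind of cost an index structure is meant to reduce.

```python
import math


def k_nearest(points, i, k):
    """Return (k-distance, neighbor indices) of point i.

    Neighbors tied at the k-distance are all included, as in the
    original LOF definition.
    """
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, k):
    """Brute-force LOF score for every point (assumes distinct points)."""
    n = len(points)
    info = [k_nearest(points, i, k) for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability
        # distance from point i to its neighbors.
        _, nbrs = info[i]
        reach = [
            max(info[j][0], math.dist(points[i], points[j])) for j in nbrs
        ]
        return len(nbrs) / sum(reach)

    density = [lrd(i) for i in range(n)]
    # LOF: mean neighbor density divided by the point's own density.
    return [
        sum(density[j] for j in info[i][1]) / (len(info[i][1]) * density[i])
        for i in range(n)
    ]
```

Calling `lof_scores([(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)], 2)` gives scores close to 1 for the four clustered points and a much larger score for the isolated point `(5, 5)`; scores well above 1 indicate that a point lies in a sparser region than its neighbors.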

