Computer Science, 2015, Vol. 42, Issue (5): 230-233. doi: 10.11896/j.issn.1002-137X.2015.05.046


Density-based Outlier Detection on Uncertain Data

HONG Sha, LIN Jia-li and ZHANG Yue-liang   

  • Online: 2018-11-14   Published: 2018-11-14

Abstract: Using local information, we designed a new outlier detection algorithm that computes a density-based uncertain local outlier factor (ULOF) for each point in an uncertain dataset. First, we established the possible world model and calculated the probability of each possible world of the uncertain data. We then combined this model with the traditional LOF algorithm to derive the ULOF formula, and judged the outlier degree of each data point according to its ULOF value. We also analyzed the efficiency and accuracy of the ULOF algorithm in detail. In addition, we proposed a grid-based pruning strategy and a k-nearest-neighbor query optimization to reduce the candidate dataset. Finally, experiments on synthetic data demonstrate the feasibility and effectiveness of the proposed approach. The optimized ULOF algorithm improves outlier detection accuracy, reduces time complexity, and improves the performance of outlier detection on uncertain data.
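The abstract does not give the ULOF formula, so the following Python sketch only illustrates the general idea of combining a possible world model with the classic LOF score. It assumes each uncertain object is a small discrete set of (instance, probability) pairs, that objects are independent, and that the score of an object is its LOF averaged over all possible worlds weighted by world probability. The function names (lof_scores, possible_worlds, expected_lof) and the toy data are hypothetical and are not taken from the paper; the paper's own ULOF definition and its pruning strategies may differ.

```python
from itertools import product
from math import dist, prod


def lof_scores(points, k):
    """Classic LOF for a list of certain points (assumes no duplicate points)."""
    n = len(points)
    k_dist, neighbours = [], []
    for i in range(n):
        # Rank all other points by distance to point i.
        ranked = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        kd = ranked[k - 1][0]
        k_dist.append(kd)
        # The k-neighbourhood includes ties at the k-distance.
        neighbours.append([j for d, j in ranked if d <= kd])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist(points[i], points[j])) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


def possible_worlds(objects):
    """Yield (world, probability), where a world picks one instance per object."""
    for choice in product(*objects):
        world = [point for point, _ in choice]
        # Independence assumption: world probability is the product of
        # the probabilities of the chosen instances.
        yield world, prod(p for _, p in choice)


def expected_lof(objects, k):
    """Probability-weighted average LOF of each object over all possible worlds."""
    totals = [0.0] * len(objects)
    for world, prob in possible_worlds(objects):
        for i, score in enumerate(lof_scores(world, k)):
            totals[i] += prob * score
    return totals


# Toy data (hypothetical): four objects near the origin, one far away.
objects = [
    [((0.0, 0.0), 1.0)],
    [((1.0, 0.0), 1.0)],
    [((0.0, 1.0), 1.0)],
    [((1.0, 1.0), 0.5), ((1.2, 1.1), 0.5)],
    [((8.0, 8.0), 0.7), ((9.0, 7.0), 0.3)],
]

if __name__ == "__main__":
    for i, score in enumerate(expected_lof(objects, k=2)):
        print(f"object {i}: expected LOF = {score:.3f}")
```

Enumerating every possible world is exponential in the number of uncertain objects, so this brute-force form is only workable on tiny inputs. That cost is what makes candidate reduction, such as the pruning and neighbor-query optimization mentioned in the abstract, worthwhile.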

Key words: Uncertain data, Local outlier detection, Possible world model, k-nearest neighborhood

