Computer Science, 2022, Vol. 49, Issue (7): 25-30. doi: 10.11896/jsjkx.210600155
陈圆圆, 王志海
CHEN Yuan-yuan, WANG Zhi-hai
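Abstract: Analyzing and exploiting the latent information in data streams is a central task in data stream mining. However, the data distribution changes over time, which in turn changes the learning hypothesis. This phenomenon, known as concept drift, poses a major challenge to data stream mining. Detecting changes in the data distribution is a direct and effective way to detect concept drift. Existing methods describe the data distribution with histograms built on tree or grid structures, but they are prone to blind spots during distribution testing, are hard to interpret, and consume considerable memory on multidimensional data. This paper proposes PUDC (Partition Based on Uniform Density Clusters), a concept drift detection method based on equal-density partitioning. PUDC uses an improved k-Means algorithm to divide the data into equal-density partitions and applies the chi-square test to the counts in each partition to detect changes in the data distribution, and thereby concept drift. To validate the method, experiments were run on 4 synthetic datasets and 3 real-world datasets, comparing Type I and Type II error rates on data of different dimensionality. The results show that PUDC has certain advantages over several recent algorithms for concept drift detection in multidimensional data streams.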
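The general idea described in the abstract (partition the space with k-means, then compare per-partition counts with a chi-square test) can be illustrated with a minimal sketch. This is not the PUDC algorithm itself: it uses plain Lloyd's k-means, which does not guarantee equal-density partitions, and the improved variant used by the paper is not specified here. The function names, the choice of `k=4`, and the critical value 7.815 (the 0.05-level chi-square threshold for 3 degrees of freedom) are assumptions made for illustration only.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Assumption: stands in for the paper's improved variant."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def partition_counts(points, centroids):
    """Count how many points fall into each nearest-centroid partition."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return counts


def chi_square_statistic(ref_counts, new_counts):
    """Pearson chi-square statistic for a 2 x K table of partition counts."""
    n_ref, n_new = sum(ref_counts), sum(new_counts)
    total = n_ref + n_new
    stat = 0.0
    for r, t in zip(ref_counts, new_counts):
        col = r + t
        if col == 0:  # a partition empty in both windows contributes nothing
            continue
        for observed, row_total in ((r, n_ref), (t, n_new)):
            expected = row_total * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def drift_detected(reference, window, k=4, critical_value=7.815):
    """Flag drift when the statistic exceeds the (assumed) critical value.

    7.815 is the 0.05-level threshold for k - 1 = 3 degrees of freedom;
    it must be changed whenever k changes.
    """
    centroids = kmeans(reference, k)
    stat = chi_square_statistic(partition_counts(reference, centroids),
                                partition_counts(window, centroids))
    return stat > critical_value, stat
```

In this sketch the partitions are learned once from a reference window and then reused for each incoming window, so only K counts per window need to be stored rather than a full multidimensional histogram.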