Computer Science (计算机科学) ›› 2021, Vol. 48 ›› Issue (11): 208-218. doi: 10.11896/jsjkx.201000097

• Database & Big Data & Data Science •

MLCPM-UC: A Multi-level Co-location Pattern Mining Algorithm Based on Uniform Coefficient of Pattern Instance Distribution

LIU Xin-bin, WANG Li-zhen, ZHOU Li-hua

  1. School of Information Science and Engineering, Yunnan University, Kunming 650500, China
  • Received: 2020-10-15 Revised: 2021-01-25 Online: 2021-11-15 Published: 2021-11-10
  • Corresponding author: WANG Li-zhen (lzhwang@ynu.edu.cn)
  • Author e-mail: xbliu126@163.com
  • About author: LIU Xin-bin, born in 1996, postgraduate. His main research interests include spatial data mining and parallel computing.
    WANG Li-zhen, born in 1962, Ph.D, professor, Ph.D supervisor, is a senior member of China Computer Federation. Her main research interests include spatial data mining, interactive data mining, big data analytics and their applications.
  • Supported by:
    National Natural Science Foundation of China (61966036, 61662086, 61762090), Project of Innovative Research Team of Yunnan Province of China (2018HC019) and Yunnan University Graduate Research and Innovation Fund Project (2020315).

Abstract: A spatial co-location pattern is a subset of spatial features whose instances frequently appear close to one another in space. Because spatial data exhibit both correlation and heterogeneity, the instances of a co-location pattern may occur globally across the whole study area (a global co-location pattern) or only in local parts of it (a regional co-location pattern), which motivates multi-level co-location pattern mining. Current multi-level co-location pattern mining methods have two problems: 1) they ignore the spatial distribution characteristics of patterns and therefore fail to distinguish global from regional co-location patterns accurately; 2) they treat globally non-prevalent co-location patterns as candidate regional co-location patterns, which yields too many candidates. To address these problems, we first define the uniform coefficient of the instance distribution of a co-location pattern, which takes the spatial distribution of a pattern into account alongside its prevalence, so that global and regional co-location patterns can be identified correctly and efficiently. Second, based on this uniform coefficient, we design an effective multi-level co-location pattern mining algorithm and propose an effective pruning strategy to improve its efficiency. Finally, extensive experiments on real and synthetic data sets verify the correctness and efficiency of the proposed method.

Key words: Multi-level co-location pattern, Spatial data mining, Spatial heterogeneity, Uniform coefficient

CLC Number: 

  • TP311