Computer Science (计算机科学) ›› 2018, Vol. 45 ›› Issue (6A): 482-486.

• Big Data & Data Mining •

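Co-location Pattern Mining Algorithm Based on Data Normalization

ZENG Xin (曾新), LI Xiao-wei (李晓伟), YANG Jian (杨健)

  1. College of Mathematics and Computer, Dali University, Dali, Yunnan 671003, China
  • Online: 2018-06-20 Published: 2018-08-03
  • About the authors: ZENG Xin (born 1986), male, master's degree, lecturer; main research interest: spatial data mining; E-mail: hbzengxin@163.com. LI Xiao-wei (born 1985), male, Ph.D., lecturer; main research interests: information security and computer networks. YANG Jian (born 1976), male, Ph.D., associate professor, CCF member; main research interests: cloud computing, data security and privacy protection.
  • Supported by:
    the National Natural Science Foundation of China (71462001), the Applied Basic Research Youth Project of the Yunnan Provincial Department of Science and Technology (2016FD071), and a project of the Yunnan Provincial Department of Education (2016ZZX192)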

Abstract: In practical applications, spatial features carry not only spatial information; their instances also carry attribute information, which matters greatly for knowledge discovery and scientific decision-making. Existing co-location pattern mining algorithms compute the neighbor distance between two instances of different features without considering how much weight each attribute's values contribute to that distance. As a result, some attributes are weighted too heavily, which distorts the mining results. This paper normalizes the attribute values so that all attributes carry equal weight, and proposes DNRA, a data normalization algorithm built on the join-based approach. It also studies the problem that the distance threshold is hard to determine, and derives the range of the distance threshold in DNRA to help users choose an appropriate value. Finally, extensive experiments analyze and compare the performance of DNRA.

Key words: Attribute weight, Co-location pattern, Data normalization, Distance threshold

CLC Number:

  • TP311.13
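
The abstract's central idea, normalizing attribute values so that no single attribute dominates the neighbor distance, can be illustrated with a minimal sketch. The abstract does not say which normalization DNRA uses, so min-max scaling to [0, 1] is an assumption here, as are the Euclidean distance, the function names, and the sample data. This is not the authors' implementation.

```python
import math

def min_max_normalize(rows):
    """Rescale every column of `rows` to [0, 1] (assumed normalization)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighbor_pairs(instances, threshold):
    """Index pairs of instances of different features within `threshold`."""
    pairs = []
    for i in range(len(instances)):
        for j in range(i + 1, len(instances)):
            (fi, vi), (fj, vj) = instances[i], instances[j]
            if fi != fj and distance(vi, vj) <= threshold:
                pairs.append((i, j))
    return pairs

# Hypothetical instances: (x, y, attribute). The attribute's raw scale is
# far larger than the coordinates, so it dominates the raw distance.
raw = [[0.0, 0.0, 1000.0], [1.0, 1.0, 1200.0], [10.0, 10.0, 5000.0]]
features = ["A", "B", "C"]

norm = min_max_normalize(raw)
instances = list(zip(features, norm))

# Under min-max scaling, each of the k dimensions contributes at most 1,
# so any distance (and any useful threshold) lies in [0, sqrt(k)].
max_distance = math.sqrt(len(raw[0]))

print(distance(raw[0], raw[1]))        # raw distance, driven by the attribute
print(distance(norm[0], norm[1]))      # normalized distance
print(neighbor_pairs(instances, 0.2))  # neighbor pairs at threshold 0.2
```

With min-max scaling, the bound of sqrt(k) for k dimensions gives a natural range in which to pick a threshold. Whether this matches the threshold range derived in the paper cannot be confirmed from the abstract alone.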