Computer Science (计算机科学), 2019, Vol. 46, Issue 4: 57-65. doi: 10.11896/j.issn.1002-137X.2019.04.009
ZHANG Fang (张昉), ZHAO Shu-liang (赵书良), WU Yong-liang (武永亮)
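Abstract: Multi-scale mining has been applied in graphics and imaging, geographic information, signal analysis, and data mining, and multi-scale data mining has been studied and applied in association rule, clustering, and classification mining. However, how to partition a dataset into scales in a generally applicable way, and how to construct a multi-scale dataset, have not yet been investigated, and the existing related work lacks depth. Starting from the multi-scale data mining task, this paper defines the concept of scale and presents a model for multi-scaling a dataset together with a benchmark-scale scoring model. Building on a discretization method based on probability density estimation, it proposes a multi-scale partitioning algorithm that extends the data types for which scales can be partitioned, produces partitions that follow the multi-scale characteristics of the data more closely, and has low time complexity. It further proposes a method for multi-scaling a dataset, an algorithm for constructing multi-scale datasets, and a benchmark-scale selection algorithm, using multi-scale entropy and information entropy as evaluation measures. While extending the available dataset multi-scaling methods, these effectively weaken the scale effect that scale derivation causes in multi-scale data mining, and the time complexity of the algorithms remains manageable. The proposed algorithms and models are validated and analyzed experimentally on a real population dataset from H Province, UCI public datasets, and the T10I4D100K dataset. The results show that the multi-scale partitioning algorithm and the dataset multi-scaling method are feasible, and that the dataset multi-scaling method and the benchmark-scale scoring model are effective. Applying the multi-scale partitioning, multi-scale dataset construction, and benchmark-scale selection methods improved, on average, coverage by 1.6%, F1-measure by 2.1%, and accuracy by 3.7% during scale derivation, with a low average support error.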
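The abstract names two building blocks: discretization driven by a probability density estimate, and information entropy as an evaluation measure. The sketch below illustrates those two generic ideas only; it is not the paper's algorithm. The Gaussian kernel, the fixed `bandwidth`, the grid search for local minima, and all function names are assumptions introduced here for illustration.

```python
import math
from collections import Counter


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


def density_cut_points(values, bandwidth, grid_size=200):
    """Place cut points at interior local minima of the estimated density.

    Assumes the values are not all identical (the grid needs a positive width).
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = gaussian_kde(values, bandwidth)
    dens = [density(x) for x in grid]
    return [
        grid[i]
        for i in range(1, grid_size - 1)
        if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]
    ]


def assign_bins(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [sum(v > c for c in cuts) for v in values]


def shannon_entropy(labels):
    """Information entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


if __name__ == "__main__":
    # Hypothetical data: two well-separated groups of values.
    data = [0.9, 1.0, 1.1, 1.2, 4.9, 5.0, 5.1, 5.2]
    cuts = density_cut_points(data, bandwidth=0.5)
    bins = assign_bins(data, cuts)
    print(cuts, bins, shannon_entropy(bins))
```

In this sketch the bandwidth plays the role of a scale knob: a wider kernel generally smooths away density minima and so yields fewer, coarser intervals. How the paper actually defines and scores scales is not stated in the abstract.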