Computer Science, 2019, Vol. 46, Issue (4): 57-65. doi: 10.11896/j.issn.1002-137X.2019.04.009

• Big Data & Data Science •

Data Scaling Method for Multi-scale Data Mining

ZHANG Fang, ZHAO Shu-liang, WU Yong-liang   

  1. College of Mathematics & Information Science, Hebei Normal University, Shijiazhuang 050024, China
    Hebei Key Laboratory of Computational Mathematics & Applications, Hebei Normal University, Shijiazhuang 050024, China
  • Received: 2018-09-04 Online: 2019-04-15 Published: 2019-04-23

Abstract: Multi-scale mining has been applied to graphics and images, geographic information, signal analysis, and data mining, and has also been studied and applied in association rule, clustering, and classification mining. Nevertheless, how to divide datasets into common scales and how to construct multi-scale datasets have not been studied in depth. Starting from the task of multi-scale data mining, this paper defines the concept of scale and gives a multi-scale dataset model and a benchmark scale scoring model. It proposes a multi-scale partition algorithm based on discretization by probability density estimation, which extends the data types that can be divided into scales; its partition results are closer to the multi-scale characteristics of the data, and its time complexity is lower. The paper also proposes a method and an algorithm for constructing multi-scale datasets, together with a benchmark scale selection algorithm. Multi-scale entropy and information entropy are used as evaluation measures. By extending the multi-scale dataset construction method, the scale effect produced by scale derivation in multi-scale data mining can be effectively reduced while the time complexity is kept under control. The proposed algorithms and models were validated and analyzed on a real population dataset of H province, common UCI datasets, and an IBM dataset. The experimental results show that the proposed method is feasible and the proposed model is effective. Applying the proposed methods improves coverage by 1.6%, F1-measure by 2.1%, and accuracy by 3.7% in the scale deduction process, with a low average support error.
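The abstract names discretization by probability density estimation and information entropy but does not give the algorithm itself. The following is only a minimal sketch of the general idea, not the authors' method: it assumes a Gaussian kernel, a hand-picked bandwidth, and a hypothetical rule that places cut points at local minima of the estimated density. All function names and the toy data are illustrative.

```python
import math
from collections import Counter


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` (Gaussian kernel assumed)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def density_cut_points(samples, bandwidth, grid_size=200):
    """Hypothetical rule: cut where the estimated density has a local minimum on a grid."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = gaussian_kde(samples, bandwidth)
    values = [density(x) for x in grid]
    # Strict decrease on the left, non-increase on the right, so a flat
    # valley floor yields exactly one cut point.
    return [
        grid[i]
        for i in range(1, grid_size - 1)
        if values[i] < values[i - 1] and values[i] <= values[i + 1]
    ]


def discretize(samples, cuts):
    """Map each sample to the index of the interval it falls in."""
    return [sum(1 for c in cuts if x > c) for x in samples]


def information_entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    total = len(labels)
    return -sum((k / total) * math.log2(k / total) for k in Counter(labels).values())


# Toy data (illustrative only): two well-separated groups of values.
values = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
cuts = density_cut_points(values, bandwidth=0.5)
bins = discretize(values, cuts)
entropy = information_entropy(bins)
```

On this toy data the density has a single valley between the two groups, so the groups fall into separate intervals. The bandwidth controls how many valleys appear, and therefore how coarse the resulting partition is.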

Key words: Construction of multi-scale datasets, Discretization, Information entropy, Multi-scale data mining, Multi-scale entropy, Multi-scale scaling, Reference scale selection

CLC Number: TP391