Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 482-487.

• Big Data and Data Mining •

Boundary Distance Algorithm for Determining Sliding Window Size

PENG Cheng1,2, HE Jing1, CHI Hao1

  1. School of Computer, Hunan University of Technology, Zhuzhou, Hunan 412007, China;
  2. School of Automation, Central South University, Changsha 410083, China
  • Online: 2019-06-14 Published: 2019-07-02
  • Corresponding author: HE Jing (born 1989), female, master's student. Her main research area is industrial equipment data processing. E-mail: 492508527@qq.com.
  • About the author: PENG Cheng (born 1982), male, Ph.D., master's supervisor. His main research area is industrial big data processing.
  • Supported by:
    This work was supported by the General Program of the National Natural Science Foundation of China (61871432) and the Youth Program of the Natural Science Foundation of Hunan Province (20173065).

Abstract: The raw measurement data collected from most equipment is large in volume and high in density. Existing sliding-window dimensionality reduction methods for time series choose the window size from empirical values, so they cannot retain the important information points of the data to the fullest extent, and their computational complexity is high. To address this, the paper studies how the sliding window affects time series similarity techniques in practical applications and proposes an algorithm for determining the initial size of the sliding window. The algorithm constructs upper and lower boundary curves with a higher degree of fit and introduces trend weighting into the LB_Hust distance calculation, which reduces the difficulty of mathematical modeling and improves the efficiency of similarity clustering and state evaluation for equipment data.

Key words: Data mining, LB_Hust distance, Sliding window
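
The abstract does not give the LB_Hust formula or the trend-weighting scheme, so the following is only a minimal sketch of the general idea of upper and lower boundary curves built with a sliding window. It uses an envelope-based lower bound in the style of the widely used LB_Keogh distance, not the authors' method. The function names `envelope` and `envelope_lower_bound` and the sample series are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple


def envelope(series: Sequence[float], window: int) -> Tuple[List[float], List[float]]:
    """Build upper and lower boundary curves for a series.

    For each position i, the upper (lower) curve is the max (min) of the
    series over the sliding window [i - window, i + window].
    """
    n = len(series)
    upper: List[float] = []
    lower: List[float] = []
    for i in range(n):
        start, stop = max(0, i - window), min(n, i + window + 1)
        segment = series[start:stop]
        upper.append(max(segment))
        lower.append(min(segment))
    return upper, lower


def envelope_lower_bound(query: Sequence[float],
                         candidate: Sequence[float],
                         window: int) -> float:
    """LB_Keogh-style distance (an assumption, not LB_Hust itself).

    Only the parts of the candidate that fall outside the query's
    envelope contribute to the distance.
    """
    if len(query) != len(candidate):
        raise ValueError("series must have the same length")
    upper, lower = envelope(query, window)
    total = 0.0
    for value, up, low in zip(candidate, upper, lower):
        if value > up:
            total += (value - up) ** 2
        elif value < low:
            total += (low - value) ** 2
    return total ** 0.5


if __name__ == "__main__":
    query = [0, 1, 2, 1, 0]
    candidate = [0, 3, 2, 1, 0]
    # A wider window gives a looser envelope and a smaller bound.
    for w in (0, 1):
        print(w, envelope_lower_bound(query, candidate, w))
```

With `window=0` the envelope collapses to the series itself and the bound equals the Euclidean distance. Widening the window loosens the boundary curves, which is why the choice of window size affects how much of the series' detail the bound retains.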

CLC Number: 

  • TP301