Symbolic Aggregate Approximation Method of Time Series Based on Beginning and End Distance

doi:10.11896/j.issn.1002-137X.2018.06.039

Abstract

Feature representation is a key technology in time series data mining, and symbolic aggregate approximation (SAX) is among the most commonly used representation methods. However, SAX cannot distinguish between time series whose corresponding sequence segments map to the same symbols. To address this, a symbolic aggregate approximation method based on beginning and end distance (SAX_SM) is proposed. Because time series data exhibit strong morphological trends, the method uses the beginning point and the end point of each sequence segment to represent that segment's morphological feature, and then approximates the time series by the morphological feature and the representation symbol of each segment, mapping the data from a high-dimensional space to a low-dimensional space. To calculate the morphological distance between two sequences, a beginning and end distance is constructed from the beginning and end points. Finally, to measure the similarity between time series more objectively, a new distance measure is defined by combining the beginning and end distance with the symbol distance. Theoretical analysis shows that the new distance measure satisfies the lower-bound theorem. Experiments on 20 UCR time series data sets show that SAX_SM achieves the highest classification accuracy (including ties) on 13 data sets, whereas SAX achieves the highest accuracy (including ties) on only 6. SAX_SM therefore yields better classification results than SAX.
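The limitation of SAX that motivates the method can be illustrated with a minimal sketch. The code below implements standard SAX (z-normalization, piecewise aggregate means, Gaussian breakpoints, and the MINDIST symbol distance) and extracts the beginning and end point of each segment. All function names are hypothetical, the series length is assumed to be divisible by the number of segments, and the combined SAX_SM distance is deliberately not reproduced, since the abstract does not state its formula.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalize a series (zero mean, unit population standard deviation)."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def segments(series, w):
    """Split a series into w equal-length segments (assumes len(series) % w == 0)."""
    size = len(series) // w
    return [series[i * size:(i + 1) * size] for i in range(w)]


def sax_word(series, w, alphabet_size):
    """Standard SAX: map each segment mean to a symbol index in 0..alphabet_size-1."""
    cuts = breakpoints(alphabet_size)
    return [sum(mean(seg) > c for c in cuts) for seg in segments(series, w)]


def mindist(word_a, word_b, n, alphabet_size):
    """Standard SAX MINDIST between two words built from series of length n."""
    cuts = breakpoints(alphabet_size)

    def cell(r, c):
        # Identical or adjacent symbols contribute zero distance.
        if abs(r - c) <= 1:
            return 0.0
        return cuts[max(r, c) - 1] - cuts[min(r, c)]

    w = len(word_a)
    total = sum(cell(a, b) ** 2 for a, b in zip(word_a, word_b))
    return sqrt(n / w) * sqrt(total)


def start_end_features(series, w):
    """Hypothetical morphological feature: (beginning point, end point) per segment."""
    return [(seg[0], seg[-1]) for seg in segments(series, w)]


# Two series with the same segment means but opposite trends inside each segment.
rising = znorm([-2, -1, -1, 0, 0, 1, 1, 2])
falling = znorm([0, -1, -1, -2, 2, 1, 1, 0])

word_rising = sax_word(rising, w=2, alphabet_size=4)
word_falling = sax_word(falling, w=2, alphabet_size=4)

print(word_rising, word_falling)  # the two SAX words are identical
print(mindist(word_rising, word_falling, n=8, alphabet_size=4))  # symbol distance is zero
print(start_end_features(rising, 2))   # beginning < end in each segment
print(start_end_features(falling, 2))  # beginning > end in each segment
```

In this sketch the two series receive the same SAX word, so the symbol distance alone cannot separate them, while their beginning and end points differ in every segment. This is the kind of information a beginning and end distance could draw on.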

Key words: Beginning and end distance, Sequence segment, Symbol distance, Time series data

CLC Number: TP391

JI Hai-juan, ZHOU Cong-hua, LIU Zhi-feng. Symbolic Aggregate Approximation Method of Time Series Based on Beginning and End Distance [J]. Computer Science, 2018, 45(6): 216-221.

