Computer Science, 2018, Vol. 45, Issue (11A): 431-435.

• Big Data & Data Mining •

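Prediction of Geosensor Data Based on knnVAR Model

LIAO Ren-jian, ZHOU Li-hua, XIAO Qing, DU Guo-wang

  1. School of Information Science & Engineering, Yunnan University, Kunming 650000, China
  • Online: 2019-02-26 Published: 2019-02-26
  • About the authors: LIAO Ren-jian (born 1991), male, master's student; main research interest: data mining. ZHOU Li-hua (born 1968), female, Ph.D., professor; main research interests: data mining and social network analysis; E-mail: lhzhou@ynu.edu.cn (corresponding author). XIAO Qing (born 1975), female, master's degree, lecturer; main research interest: data mining. DU Guo-wang (born 1994), male, master's student; main research interest: data mining.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61262069, 61472346, 61662086, 61762090), the Natural Science Foundation of Yunnan Province (2016FA026, 2015FB114), the Yunnan Provincial Innovation Team, the Science and Technology Innovation Team of Yunnan Provincial Universities (IRTSTYN), the Innovation Team Development Program of Yunnan University (XT412011), and the Yunnan University Key Laboratory of Spectrum Sensing and Borderland Security (C6165903).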

Abstract: Geosensor data prediction is widely used in economics, engineering, the natural sciences, and the social sciences. The spatial correlation between different sites and the temporal correlation within the same site pose great challenges to traditional forecasting methods. This paper proposes the knnVAR model for predicting geosensor data; it fuses the temporal and spatial information in the data while accounting for the uniqueness of each sensor series. The model quantifies the temporal and spatial information by computing a space-time distance, searches for the K nearest neighbors based on that distance, and then applies the resulting neighbor series to a vector autoregressive (VAR) model to produce the prediction. By searching for space-time nearest neighbors, knnVAR fuses the correlations along the time and space dimensions; by predicting each sensor series from neighbors that are highly correlated with it in both space and time, it takes the uniqueness of each geographic series into account. Experimental results show that the knnVAR model improves the prediction accuracy of geosensor data.

Key words: Geosensor data, K nearest neighbor, Space-time distance, Vector autoregressive model
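
The three steps named in the abstract (space-time distance, K-nearest-neighbor search, VAR on the neighbor series) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the distance formula, so the convex mix of scaled spatial and temporal distances, the weight `alpha`, and all function names here are hypothetical, and the VAR is fitted by plain least squares.

```python
import numpy as np

def spacetime_distance(coords, series, alpha=0.5):
    """Assumed space-time distance: convex mix of scaled spatial and temporal distances.

    coords: (n, 2) sensor positions; series: (n, T) readings.
    alpha is a hypothetical weight on the spatial part.
    """
    spatial = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    temporal = np.linalg.norm(series[:, None, :] - series[None, :, :], axis=-1)
    # Scale both parts to [0, 1] so neither dominates because of its units.
    spatial = spatial / max(spatial.max(), 1e-12)
    temporal = temporal / max(temporal.max(), 1e-12)
    return alpha * spatial + (1.0 - alpha) * temporal

def k_nearest(dist, target, k):
    """Indices of the k sensors closest to `target`, excluding the target itself."""
    order = np.argsort(dist[target], kind="stable")
    return [int(i) for i in order if i != target][:k]

def fit_var(data, p=1):
    """Least-squares VAR(p) fit. data: (T, m). Returns coefficients of shape (1 + m*p, m)."""
    T, _ = data.shape
    # Each regressor row is [1, y_{t-1}, ..., y_{t-p}] flattened.
    rows = [np.concatenate([[1.0]] + [data[t - lag] for lag in range(1, p + 1)])
            for t in range(p, T)]
    coef, *_ = np.linalg.lstsq(np.asarray(rows), data[p:], rcond=None)
    return coef

def forecast_next(data, coef, p=1):
    """One-step-ahead forecast for all m variables from the last p observations."""
    x = np.concatenate([[1.0]] + [data[-lag] for lag in range(1, p + 1)])
    return x @ coef

def knn_var_forecast(coords, series, target, k=2, p=1, alpha=0.5):
    """Forecast the next value of one sensor from a VAR over itself and its k neighbors."""
    dist = spacetime_distance(coords, series, alpha)
    neighbors = k_nearest(dist, target, k)
    block = series[[target] + neighbors].T  # (T, k + 1), target in column 0
    coef = fit_var(block, p)
    return float(forecast_next(block, coef, p)[0]), neighbors
```

Fitting one small VAR per target sensor, rather than one VAR over all sensors, is what lets each series use its own set of neighbors in this sketch; calling `knn_var_forecast(coords, series, target=0)` returns the one-step forecast for sensor 0 together with the neighbor indices that were used.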

CLC Number: 

  • TP301