Computer Science ›› 2020, Vol. 47 ›› Issue (11A): 459-463. doi: 10.11896/jsjkx.200500128

• Big Data & Data Science •

Mining Trend Similarity of Multivariate Hydrological Time Series Based on XGBoost Algorithm

DING Wu1,3, MA Yuan2, DU Shi-lei2, LI Hai-chen3, DING Gong-bo3, WANG Chao3

  1 School of Hydropower and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
  2 Bureau of Hydrology (Information Center), Taihu Basin Authority of Ministry of Water Resources, Shanghai 200434, China
  3 China Institute of Water Resources and Hydropower Research, Beijing 100038, China
  • Online: 2020-11-15 Published: 2020-11-17
  • Corresponding author: WANG Chao (wangchao@iwhr.com)
  • About author: DING Wu, born in 1996, postgraduate. His main research interests include hydrological big data analysis and optimized operation of hydropower.
    WANG Chao, born in 1989, Ph.D., senior engineer. His main research interests include basin water resources scheduling and intelligent water conservancy.
  • Author e-mail: M201873779@hust.edu.cn
  • Supported by:
    This work was supported by the Young Elite Scientists Sponsorship Program by the CAST (2019QNRC001) and the Fundamental Research Funds of China Institute of Water Resources and Hydropower Research (WR0145B012020).

Abstract: Traditional hydrological trend prediction with neural networks and similar tools yields results that lack interpretability. To address this shortcoming, this paper proposes a hydrological trend prediction method based on machine learning. The method uses the XGBoost algorithm to build a similarity mapping model for each hydrological feature between the reference period and the forecast period, and then searches the historical hydrological time series for the sequence whose trend is most similar to that of the forecast period, thereby predicting the hydrological trend. To demonstrate the efficiency and feasibility of the proposed method, it is validated on Taihu hydrological time series data. The results show that trend similarity analysis of multivariate hydrological time series based on machine learning can meet dispatchers' requirements for predicting future hydrological trends.

Key words: Hydrological trend prediction, Machine learning, Multivariate time series, Similarity measure, Time series data mining
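The matching step described in the abstract (finding the historical sequence most similar to a reference period and reading off what followed it) can be illustrated with a minimal sketch. This is not the paper's implementation and it does not use XGBoost: the function names, the weighted Euclidean distance, the per-feature `weights` (a hypothetical stand-in for whatever the learned similarity model would supply), and the sample values are all assumptions made for illustration.

```python
from math import sqrt

def window_distance(ref, cand, weights):
    """Weighted Euclidean distance between two equally long multivariate windows."""
    total = 0.0
    for ref_row, cand_row in zip(ref, cand):
        for w, a, b in zip(weights, ref_row, cand_row):
            total += w * (a - b) ** 2
    return sqrt(total)

def most_similar_window(history, reference, horizon, weights):
    """Return (start, distance) of the historical window closest to `reference`.

    Only windows followed by at least `horizon` further steps are considered,
    so the steps after the match can serve as a trend forecast.
    """
    n = len(reference)
    best_start, best_dist = None, float("inf")
    for start in range(len(history) - n - horizon + 1):
        d = window_distance(reference, history[start:start + n], weights)
        if d < best_dist:
            best_start, best_dist = start, d
    return best_start, best_dist

def forecast_from_match(history, reference, horizon, weights):
    """Use the steps that followed the best historical match as the forecast."""
    start, _ = most_similar_window(history, reference, horizon, weights)
    if start is None:
        raise ValueError("history is too short for the requested window and horizon")
    n = len(reference)
    return history[start + n:start + n + horizon]

# Hypothetical data: each row is [water level, rainfall] for one time step.
history = [
    [3.0, 0.0], [3.1, 5.0], [3.3, 12.0], [3.6, 20.0],
    [3.5, 4.0], [3.4, 0.0], [3.2, 0.0], [3.1, 1.0],
]
reference = [[3.1, 5.0], [3.3, 11.0]]  # most recent observations (reference period)
weights = [1.0, 0.01]                  # assumed per-feature weights, not learned here

start, dist = most_similar_window(history, reference, horizon=2, weights=weights)
forecast = forecast_from_match(history, reference, horizon=2, weights=weights)
```

In this toy setup the weights only rescale features with different units; in a learned setting they would be replaced by the output of the similarity model.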

CLC number:

  • TV121