Computer Science ›› 2021, Vol. 48 ›› Issue (7): 178-183. doi: 10.11896/jsjkx.200500145

• Database & Big Data & Data Science •

Interval Prediction Method for Imbalanced Fuel Consumption Data

CHEN Jing-jie1,2,3, WANG Kun2,4

  1 College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
  2 Research Center for Environment and Sustainable Development of CAAC, Tianjin 300300, China
  3 National Engineering Laboratory for Integrated Traffic Data Application Technology, Tianjin 300300, China
  4 College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2020-05-28  Revised: 2020-10-27  Online: 2021-07-15  Published: 2021-07-02
  • Corresponding author: CHEN Jing-jie (jjchen@cauc.edu.cn)
  • About author: CHEN Jing-jie, born in 1967, Ph.D., professor. His main research interests include energy efficiency management and carbon emission control in civil aviation transportation.
  • Supported by:
    Sino-US Green Route Pilot Program (GH201661279).

Abstract: Fuel consumption data are imbalanced, which lowers the quality of the prediction intervals produced by general interval prediction methods. To address this problem, an interval prediction model based on the SMOTE-XGBoost algorithm is proposed. First, the SMOTE algorithm oversamples the training set, increasing the number of minority samples so that the imbalance in the training data is eliminated. Second, the quantile loss function is used as the loss function of the XGBoost algorithm, and a small region around the origin of its first derivative is smoothed, which resolves the problem that the unmodified quantile loss prevents the trees in XGBoost from splitting. Finally, SMOTE and XGBoost are combined to train the interval prediction model, which yields the upper and lower bounds of the prediction interval. Comparative experiments on a QAR data set indicate that, compared with other methods, the proposed method produces prediction intervals with higher coverage and narrower width, improving the quality of the prediction interval.

Key words: Fuel consumption, Imbalanced data, Interval prediction, Quick Access Recorder (QAR) data, SMOTE, XGBoost
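
The oversampling step named in the abstract can be illustrated with a minimal SMOTE-style sketch in plain Python. It is not the authors' implementation: the function name `smote_oversample`, the parameters `k` and `seed`, and the choice of treating each sample as a plain numeric vector are assumptions made for illustration. The abstract also does not state how minority samples are identified in the fuel consumption data, so the sketch takes the minority set as given.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples in the style of SMOTE.

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    `minority` is a list of equal-length numeric lists (hypothetical layout)
    and must contain at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    for point in smote_oversample(minority, n_new=3, k=2, seed=1):
        print(point)
```

Because every synthetic point is an interpolation between two existing minority samples, the new samples stay inside the region already occupied by the minority data rather than duplicating individual points.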

CLC Number: TP391
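
The abstract's second idea, smoothing the first derivative of the quantile loss near the origin, can be sketched as follows. The plain quantile (pinball) loss has a piecewise-constant gradient and a zero second derivative away from the origin, while XGBoost's split gain and leaf weights rely on second-order information. The abstract does not give the authors' smoothing formula, so the linear interpolation below, the band half-width `delta`, and both function names are assumptions for illustration only. The second helper computes interval coverage and mean width, the two quality measures the abstract refers to.

```python
def smoothed_quantile_grad_hess(y, pred, q, delta=0.5):
    """Gradient and hessian (w.r.t. pred) of a smoothed quantile loss.

    Outside the band |y - pred| <= delta this matches the plain quantile
    loss: gradient -q when under-predicting, 1 - q when over-predicting,
    hessian 0. Inside the band the gradient is interpolated linearly
    (an assumed smoothing), which gives a positive hessian there.
    """
    r = y - pred
    if r > delta:
        return -q, 0.0
    if r < -delta:
        return 1.0 - q, 0.0
    grad = (1.0 - q) - (r + delta) / (2.0 * delta)
    hess = 1.0 / (2.0 * delta)
    return grad, hess


def coverage_and_width(y_true, lower, upper):
    """Fraction of targets inside [lower, upper] and the mean interval width."""
    n = len(y_true)
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return covered / n, mean_width


if __name__ == "__main__":
    print(smoothed_quantile_grad_hess(10.0, 8.0, q=0.9))   # under-prediction
    print(smoothed_quantile_grad_hess(10.0, 10.0, q=0.9))  # inside the band
    print(coverage_and_width([1, 2, 3, 4], [0, 2.5, 2, 3], [2, 3, 4, 3.5]))
```

In a setup like the one the abstract describes, one model would be trained with a high quantile `q` for the upper bound and another with a low `q` for the lower bound. A wider `delta` gives more samples a non-zero hessian at the cost of departing further from the exact quantile loss.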