Computer Science (计算机科学), 2021, Vol. 48, Issue (7): 178-183. doi: 10.11896/jsjkx.200500145
CHEN Jing-jie (陈静杰)¹,²,³, WANG Kun (王琨)²,⁴
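Abstract: Fuel consumption data are imbalanced. Because of this, general-purpose interval prediction methods produce low-quality prediction intervals on such data. To address this problem, this paper proposes an interval prediction model based on the SMOTE-XGBoost algorithm. First, the SMOTE algorithm increases the number of minority-class samples in the training set, which removes the imbalance of the training data. Second, the quantile loss function of XGBoost is improved by smoothing a small region around the origin of its first derivative. This resolves the adverse effect that the quantile loss function has on tree splitting. Third, the interval prediction model is trained to obtain the upper and lower bounds of the prediction interval. Finally, comparative experiments are conducted on a QAR (Quick Access Recorder) dataset. The results show that the method yields prediction intervals with higher interval coverage and narrower interval width, which improves the quality of the prediction intervals.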
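The SMOTE step can be illustrated with a minimal pure-Python sketch of the classic interpolation rule of Chawla et al. Each synthetic sample lies on the segment between a minority sample and one of its k nearest minority-class neighbours. The abstract does not say how minority samples are defined for continuous fuel-consumption data, so this function simply takes a list of minority-class feature vectors. The following are illustrative assumptions, not the authors' implementation:

- the names `smote` and `_k_nearest`;
- the random choice of base samples;
- the defaults `k=5` and `seed=0`.

```python
import math
import random


def _k_nearest(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    base = points[idx]
    dists = sorted((math.dist(base, p), j) for j, p in enumerate(points) if j != idx)
    return [j for _, j in dists[:k]]


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic samples by SMOTE-style interpolation (illustrative sketch).

    Each new sample is x + gap * (x_nn - x), where x is a randomly chosen
    minority sample, x_nn is one of its k nearest minority neighbours and
    gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each minority sample's neighbour list (brute force, O(n^2)).
    neighbours = [_k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

Every synthetic point is a convex combination of two minority samples, so it stays inside the region spanned by the minority class. This lets SMOTE rebalance the training set without simply duplicating rows. The brute-force neighbour search is only meant for small examples.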
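The abstract states only that the first derivative of the quantile loss is smoothed in a small region around the origin. One way to do this is sketched below. With u = y − ŷ, the step-shaped derivative of the pinball loss is replaced by a linear ramp on |u| ≤ delta. The ramp also gives a positive second derivative in that region, which matters because XGBoost's second-order split gain and leaf weights depend on the Hessian. The following are assumptions, since the paper's exact smoothing is not given in this abstract:

- the ramp shape;
- the half-width `delta`;
- the Hessian floor `min_hess` used outside the ramp;
- the function name.

```python
def smoothed_quantile_grad_hess(y_true, y_pred, tau, delta=1.0, min_hess=1e-6):
    """Gradient and Hessian of the pinball loss w.r.t. the prediction (illustrative sketch).

    Outside |u| <= delta (u = y - y_hat) the exact pinball derivative is used;
    inside, it is replaced by a linear ramp so that the derivative is continuous
    and the Hessian is positive. min_hess is an assumed floor for the otherwise
    zero Hessian outside the ramp.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    if delta <= 0:
        raise ValueError("delta must be positive")
    grad, hess = [], []
    for y, p in zip(y_true, y_pred):
        u = y - p
        if u > delta:
            g, h = -tau, min_hess          # under-prediction: gradient pushes y_hat up
        elif u < -delta:
            g, h = 1.0 - tau, min_hess     # over-prediction: gradient pushes y_hat down
        else:
            # Linear ramp from (1 - tau) at u = -delta to -tau at u = +delta.
            g = (1.0 - tau) - (u + delta) / (2.0 * delta)
            h = 1.0 / (2.0 * delta)
        grad.append(g)
        hess.append(h)
    return grad, hess
```

XGBoost's scikit-learn wrapper accepts a callable objective of the form `objective(y_true, y_pred) -> (grad, hess)`. The function could therefore be bound to a fixed `tau` (for example with `functools.partial`), with its lists converted to NumPy arrays, and passed there. We have not tested this against a particular XGBoost version.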
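The abstract judges interval quality by coverage and width. Two common measures are:

- the prediction interval coverage probability (PICP), the fraction of observations that fall inside [lower, upper];
- the mean prediction interval width (MPIW), optionally normalised by the target range (PINAW).

The abstract does not name its exact metrics. These standard definitions, the symmetric choice of quantile levels for the two bound models, and the function names are therefore assumptions made for illustration.

```python
def quantile_levels(confidence):
    """Quantile levels (tau_low, tau_high) for a central interval of nominal confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie in (0, 1)")
    alpha = 1.0 - confidence
    return alpha / 2.0, 1.0 - alpha / 2.0


def interval_metrics(y_true, lower, upper):
    """Return (PICP, MPIW, PINAW) for prediction intervals [lower, upper]."""
    n = len(y_true)
    if n == 0 or not (len(lower) == len(upper) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    # Coverage: share of observations lying inside their interval.
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    picp = covered / n
    # Width: average interval width, then normalised by the observed target range.
    mpiw = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    y_range = max(y_true) - min(y_true)
    pinaw = mpiw / y_range if y_range > 0 else float("inf")
    return picp, mpiw, pinaw
```

For a nominal 90% interval, `quantile_levels(0.9)` gives approximately (0.05, 0.95), and a model trained at each level supplies the lower and upper bound. A good interval has a PICP close to or above the nominal level while keeping MPIW/PINAW small. The two pull in opposite directions, which is why both coverage and width are reported.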