Computer Science ›› 2021, Vol. 48 ›› Issue (7): 178-183. doi: 10.11896/jsjkx.200500145

• Database & Big Data & Data Science •

Interval Prediction Method for Imbalanced Fuel Consumption Data

CHEN Jing-jie1,2,3, WANG Kun2,4   

  1 College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
  2 Research Center for Environment and Sustainable Development of CAAC, Tianjin 300300, China
  3 National Engineering Laboratory for Integrated Traffic Data Application Technology, Tianjin 300300, China
  4 College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2020-05-28  Revised: 2020-10-27  Online: 2021-07-15  Published: 2021-07-02
  • About author: CHEN Jing-jie, born in 1967, Ph.D., professor. His main research interests include energy efficiency management and carbon emission control in civil aviation transportation.
  • Supported by:
    Sino-US Green Route Pilot Program (GH201661279).

Abstract: Fuel consumption data are imbalanced, which lowers the quality of prediction intervals. To address this problem, an interval prediction model based on the SMOTE-XGBoost algorithm is proposed. First, the SMOTE oversampling algorithm is used to increase the number of minority samples in the training set, thereby eliminating the imbalance in the training data. Second, for the interval prediction task, the quantile loss function is used as the loss function of the XGBoost algorithm. Because the original quantile loss prevents the trees in XGBoost from splitting, it is improved by smoothing its first derivative in a small region around the origin. Finally, the SMOTE and XGBoost algorithms are combined to train the interval prediction model, which yields the upper and lower bounds of the prediction interval separately. Experiments on a QAR data set indicate that, compared with other methods, the proposed method produces prediction intervals with higher coverage and narrower width, which improves the quality of the prediction interval.
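
The smoothing idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact formula, so everything here is an assumption: the linear ramp used for the first derivative, the band half-width `delta`, the floor value `min_hess`, and all function names are hypothetical choices for illustration only. The sketch shows the plain quantile (pinball) loss, one possible smoothed gradient and Hessian that stay non-degenerate near the origin, and the two interval quality measures the abstract mentions (coverage and width).

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for a quantile level q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


def smoothed_quantile_grad_hess(y_true, y_pred, q, delta=1.0, min_hess=1e-6):
    """Gradient and Hessian of a smoothed quantile loss w.r.t. the prediction.

    Assumption (not from the paper): inside the band |y_pred - y_true| <= delta
    the gradient is a linear ramp from -q to 1 - q, so the Hessian there is
    1 / (2 * delta). Outside the band the gradient is the usual constant and
    the Hessian is floored at min_hess so that it is never exactly zero.
    """
    u = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    ramp = -q + (u + delta) / (2.0 * delta)
    grad = np.where(u < -delta, -q, np.where(u > delta, 1.0 - q, ramp))
    hess = np.where(np.abs(u) <= delta, 1.0 / (2.0 * delta), min_hess)
    return grad, hess


def coverage_and_width(y_true, lower, upper):
    """Fraction of targets inside [lower, upper] and the mean interval width."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    covered = (y_true >= lower) & (y_true <= upper)
    return float(np.mean(covered)), float(np.mean(upper - lower))


if __name__ == "__main__":
    y = np.array([0.0, 0.0, 0.0])
    pred = np.array([-2.0, 0.0, 2.0])
    grad, hess = smoothed_quantile_grad_hess(y, pred, q=0.9, delta=1.0)
    print("gradient:", grad)
    print("hessian:", hess)
    print(coverage_and_width([1.0, 2.0, 3.0], [0.0, 2.5, 2.0], [2.0, 3.0, 4.0]))
```

Under these assumptions, a function of this shape could be wrapped as a custom objective for XGBoost and trained twice, once with a low quantile level and once with a high one, to obtain the lower and upper bounds of the interval. The SMOTE oversampling step would be applied to the training set beforehand.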

Key words: Fuel consumption, Imbalanced data, Interval prediction, Quick Access Recorder (QAR) data, SMOTE, XGBoost

CLC Number: TP391