Computer Science (计算机科学) ›› 2019, Vol. 46 ›› Issue (11A): 599-603.
王利君, 支志英, 贾鹿, 李伟
WANG Li-jun, ZHI Zhi-ying, JIA Lu, LI Wei
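Abstract: During oilfield production, oil wells are prone to wax deposition under the influence of various factors. Wax deposition typically reduces well output and blocks the well, and can even cause well shutdowns and motor burnout, greatly increasing production costs. Predicting the wax-deposition state of pumping wells in advance, and thereby enabling predictive maintenance of pumping-well equipment, is of great significance for cost reduction, efficiency gains, and intelligent management in oilfields. To address the poor predictive performance of wax-deposition models built on imbalanced datasets, this paper proposes SCRF (SMOTE CLUSTER RANDOM FOREST), an ensemble learning method for imbalanced data. The method first applies SMOTE to oversample the minority class in the original dataset, increasing the number of minority samples and reducing the imbalance ratio; it then applies CLUSTER, a clustering-based stratified undersampling step, to the new dataset to generate the training set; finally, it applies the bagging-based random forest algorithm to the training set to produce the prediction model. Experimental results show that the model predicts better once the samples are balanced, with some improvement in both prediction accuracy and efficiency.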
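The two resampling stages described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the neighbour count `k`, the use of k-means as the clustering algorithm, the size-proportional quota per cluster, and the toy data are all hypothetical choices, since the abstract does not specify them. The final random forest stage is omitted.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def kmeans(points, n_clusters, n_iter=20, rng=None):
    """Plain k-means; returns a partition of the points into clusters."""
    rng = rng or random.Random(0)
    centres = rng.sample(points, n_clusters)
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centres]
        for p in points:
            idx = min(range(len(centres)),
                      key=lambda i: math.dist(p, centres[i]))
            clusters[idx].append(p)
        # Move each centre to its cluster mean; keep it if the cluster is empty.
        centres = [tuple(sum(c) / len(ps) for c in zip(*ps)) if ps else centres[i]
                   for i, ps in enumerate(clusters)]
    return clusters


def cluster_undersample(majority, n_keep, n_clusters=3, rng=None):
    """Stratified undersampling (assumed scheme): cluster the majority class,
    then draw from each cluster in proportion to its size."""
    rng = rng or random.Random(0)
    kept = []
    for cluster in kmeans(majority, n_clusters, rng=rng):
        if not cluster:
            continue
        quota = max(1, round(n_keep * len(cluster) / len(majority)))
        kept.extend(rng.sample(cluster, min(quota, len(cluster))))
    return kept


# Toy data (hypothetical): 10 minority samples, 100 majority samples.
demo_rng = random.Random(42)
minority = [(demo_rng.gauss(3, 0.5), demo_rng.gauss(3, 0.5)) for _ in range(10)]
majority = [(demo_rng.gauss(0, 1.0), demo_rng.gauss(0, 1.0)) for _ in range(100)]

# Stage 1: oversample the minority class to shrink the imbalance ratio.
minority_aug = minority + smote(minority, n_new=20, rng=demo_rng)
# Stage 2: undersample the majority class down to roughly the same size.
majority_sub = cluster_undersample(majority, n_keep=len(minority_aug), rng=demo_rng)

print("minority:", len(minority), "->", len(minority_aug))
print("majority:", len(majority), "->", len(majority_sub))
```

The balanced set `minority_aug + majority_sub` would then be passed to a random forest learner, for example scikit-learn's `RandomForestClassifier`. Because each cluster's quota is rounded, the undersampled size can differ from `n_keep` by a few samples.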