Computer Science, 2019, Vol. 46, Issue (11A): 599-603.

• General, Interdisciplinary and Applied Topics •

Study on Optimized Method for Predicting Paraffin Deposition of Pumping Wells Based on SCRF

WANG Li-jun, ZHI Zhi-ying, JIA Lu, LI Wei

  1. (Data Company of PetroChina Xinjiang Oilfield Company, Karamay, Xinjiang 834000, China)
  • Online: 2019-11-10 Published: 2019-11-20
  • Corresponding author: WANG Li-jun (born 1987), female, master's degree, engineer. Her main research interests include oilfield informatization planning, big data analysis, and system planning and design. E-mail: wanglj_xj@petrochina.com.
  • About author: ZHI Zhi-ying (born 1966), female, senior engineer. Her main research interests include oilfield informatization planning and system planning and design. E-mail: zzy@petrochina.com.cn.
  • Supported by:
    This work was supported by the 2018 Information Scientific Research Project of Xinjiang Oilfield Company (1.2).

Abstract: During oilfield production, oil wells are prone to paraffin deposition under the influence of various factors. Paraffin deposition usually reduces well output and blocks the well, and can even cause well shutdowns and motor burnout, which greatly increases the cost of oil production. Predicting the paraffin deposition state of pumping wells in advance, and thereby enabling predictive maintenance of pumping-well equipment, is therefore of great significance for reducing costs, increasing efficiency, and managing oilfields intelligently. To address the poor performance of paraffin deposition prediction models built on imbalanced data sets, this paper proposes SCRF (SMOTE CLUSTER RANDOM FOREST), an ensemble learning method for imbalanced data. The method first uses SMOTE to oversample the minority class in the original data set, increasing the number of minority samples and reducing the imbalance ratio. It then applies the CLUSTER clustering method to the new data set for stratified undersampling, producing the training data set. Finally, a bagging-based random forest algorithm performs ensemble learning on the training data set to generate the prediction model. Experimental results show that the model predicts better after sample balancing, with some improvement in both prediction accuracy and efficiency.

Key words: Ensemble learning, Imbalanced data classification, Paraffin deposition prediction model, Sample balancing

CLC Number: TP3
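
The two resampling steps named in the abstract (SMOTE oversampling of the minority class, then cluster-based stratified undersampling) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the neighbour count, or the sampling ratios, so plain k-means, `k=3`, proportional per-cluster quotas, undersampling of the majority class only, and the toy two-dimensional data are all assumptions made here for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, rng=random):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def kmeans_labels(points, n_clusters, rng, n_iter=20):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's CLUSTER
    step); returns one cluster index per point."""
    centers = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(n_clusters), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(majority, n_keep, n_clusters=3, rng=random):
    """Keep about n_keep majority points, drawn from each cluster in
    proportion to its size (stratified undersampling)."""
    labels = kmeans_labels(majority, n_clusters, rng)
    kept = []
    for c in range(n_clusters):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        quota = round(n_keep * len(members) / len(majority))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


# Hypothetical toy data: 200 majority ("no wax") and 10 minority ("wax") wells.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

minority_bal = minority + smote(minority, n_new=40, rng=rng)
majority_bal = cluster_undersample(majority, n_keep=50, rng=rng)
train_X = majority_bal + minority_bal
train_y = [0] * len(majority_bal) + [1] * len(minority_bal)
```

The resulting `train_X` and `train_y` are roughly balanced and could then be passed to a bagging-based random forest, for example scikit-learn's `RandomForestClassifier`, to complete the third step. The choice of library is likewise an assumption rather than something the abstract states.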