Computer Science ›› 2019, Vol. 46 ›› Issue (6A): 392-396.

• Big Data & Data Mining •

Fault Prediction of Power Metering Equipment Based on GBDT

LIU Jin-shuo¹, LIU Bi-wei², ZHANG Mi³, LIU Qing⁴

  1. School of Cyber Science and Engineering, Wuhan University, Wuhan 430070, China;
  2. School of Computer Science, Wuhan University, Wuhan 430070, China;
  3. China Electric Power Research Institute, Beijing 100089, China;
  4. Electric Power Science & Research Institute of Tianjin Electric Power Company, Tianjin 300041, China
  • Online: 2019-06-14  Published: 2019-07-02
  • Corresponding author: LIU Jin-shuo, Ph.D., associate professor; main research interests: data mining and big data. E-mail: liujinshuo@whu.edu.cn
  • Funding:
    This work was supported by a science and technology project of the State Grid Corporation headquarters and the National Natural Science Foundation of China (61672393).

Abstract: Predicting the fault risk of power metering equipment can reduce the losses that such faults cause the State Grid. First, the data are preprocessed and features are selected. Second, GBDT-based models are designed to predict the major fault category, the fault subcategory, and the equipment life cycle. Finally, the validity and advancement of the designed models are verified. The experiments use data provided by the China Electric Power Research Institute. The results show that, for six fault types, the proposed algorithm achieves a prediction accuracy of 90.56%, a recall of 92.95%, and an F1 value of 91.71%. Compared with regression, BP neural network, AdaBoost, and decision tree algorithms, the gradient boosting decision tree (GBDT) algorithm performs best under parameter tuning.

Key words: Data cleaning, GBDT, Metering risk prediction

CLC Number: TP206+.3
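
The abstract names gradient boosting decision trees (GBDT) as the predictor, but this page does not reproduce the model details. As a rough illustration of the boosting idea only, the following Python sketch fits depth-1 regression trees (stumps) to the residuals of a squared-error loss, which loosely corresponds to a life-cycle style regression target. The synthetic data, the feature meanings, the function names (`fit_stump`, `fit_gbdt`, `predict`), and the hyperparameters are all assumptions made for illustration. It is not the authors' implementation, and the fault-category tasks would need a classification loss instead.

```python
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) for the single
    split that minimizes squared error on the residuals.
    Assumes at least one feature takes two or more distinct values."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between adjacent distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]


def fit_gbdt(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump is fitted to the
    current residuals, which are the negative gradient of 0.5 * (y - f)^2."""
    base = sum(y) / len(y)          # initial constant prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lval, rval = fit_stump(X, residuals)
        stumps.append((j, thr, lval, rval))
        pred = [p + learning_rate * (lval if row[j] <= thr else rval)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        lval if row[j] <= thr else rval for j, thr, lval, rval in stumps)


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: two made-up features in [0, 1] and a made-up
# "remaining life" target that falls as the first feature grows.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(100)]
y = [10.0 - 6.0 * a + 1.5 * b + random.gauss(0, 0.3) for a, b in X]

model = fit_gbdt(X, y)
train_pred = [predict(model, row) for row in X]
mse_baseline = mse(y, [sum(y) / len(y)] * len(y))   # constant-mean predictor
mse_model = mse(y, train_pred)
print(f"baseline MSE: {mse_baseline:.3f}, boosted MSE: {mse_model:.3f}")
```

The printed values compare the training error of the boosted stumps with that of a constant-mean predictor. A practical study would evaluate on held-out data and tune the number of rounds, the learning rate, and the tree depth, in line with the parameter tuning mentioned in the abstract.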