Computer Science, 2022, Vol. 49, Issue (6A): 790-794. doi: 10.11896/jsjkx.210800032

• Interdiscipline & Application •

Application of Grassberger Entropy Random Forest to Power-stealing Behavior Detection

QUE Hua-kun1, FENG Xiao-feng1, LIU Pan-long2, GUO Wen-chong1, LI Jian1, ZENG Wei-liang2, FAN Jing-min2

  1 Metrology Center of Guangdong Power Grid Corporation, Guangzhou 518049, China
  2 School of Automation, Guangdong University of Technology, Guangzhou 510006, China
  • Online: 2022-06-10 Published: 2022-06-08
  • Corresponding author: ZENG Wei-liang (weiliangzeng@gdut.edu.cn)
  • About author: QUE Hua-kun (quehuakun@126.com), born in 1986, senior engineer. His main research interests include metering automation and charging strategy.
    ZENG Wei-liang, born in 1986, Ph.D, associate professor. His main research interests include routing problems in complex networks, traffic simulation and big data visualization for smart cities.
  • Supported by:
    Science and Technology Project of China Southern Power Grid Co. Ltd (GDKJXM20185800) and National Natural Science Foundation of China (61803100).

Abstract: Electricity theft seriously endangers grid security. To improve the efficiency of electricity theft detection, this paper proposes a novel detection method for power grid users based on a Grassberger entropy random forest. First, kernel principal component analysis (KPCA) is applied to reduce the dimensionality of each user's original electricity consumption time series and to extract the user's consumption features. Then, because a classifier trains poorly when theft samples and normal samples differ greatly in number, data undersampling is used to build multiple class-balanced sample subsets. A random forest (RF) that computes information gain with an improved Grassberger entropy is trained on each subset, and the trained models are then ensembled to improve the accuracy of theft detection. Finally, the method is verified on a case study of electricity theft detection among dedicated-transformer users of China Southern Power Grid, with each user's metered electricity consumption data as the model input.

Key words: Grassberger entropy, Kernel principal component analysis, Power stealing detection, Random forest

CLC Number:

  • F407.6
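
The abstract names two building blocks that are easy to illustrate in isolation: undersampling the normal class into several balanced subsets, and scoring a tree split by information gain computed from a Grassberger entropy estimate. The following is a minimal sketch, not the authors' implementation. It assumes the standard Grassberger (2003) estimator, H = ln N - (1/N) Σ n_i G(n_i), because the "improved" variant used in the paper is not specified on this page. All function names are hypothetical, and the KPCA step and the forest training itself are omitted.

```python
import math
import random
from collections import Counter

EULER_GAMMA = 0.5772156649015329


def grassberger_g(n):
    """G(n) of the Grassberger (2003) estimator for n >= 1, via its recurrence."""
    g = -EULER_GAMMA - math.log(2.0)  # G(1)
    if n == 1:
        return g
    g += 2.0  # G(2) = G(1) + 2
    # G(2k+1) = G(2k) and G(2k+2) = G(2k) + 2 / (2k + 1)
    for k in range(1, n // 2):
        g += 2.0 / (2 * k + 1)
    return g


def grassberger_entropy(labels):
    """Entropy estimate in nats: ln N - (1/N) * sum_i n_i * G(n_i)."""
    counts = list(Counter(labels).values())
    total = sum(counts)
    if total == 0:
        return 0.0
    return math.log(total) - sum(c * grassberger_g(c) for c in counts) / total


def information_gain(parent, left, right):
    """Gain of splitting `parent` labels into `left` and `right` labels."""
    weighted = (len(left) * grassberger_entropy(left)
                + len(right) * grassberger_entropy(right)) / len(parent)
    return grassberger_entropy(parent) - weighted


def balanced_subsets(normal, theft, n_subsets, seed=0):
    """Undersample the majority class into `n_subsets` balanced subsets.

    Assumes len(theft) <= len(normal); each subset keeps every theft sample
    and draws an equally sized random sample of normal users.
    """
    rng = random.Random(seed)
    return [rng.sample(normal, len(theft)) + list(theft)
            for _ in range(n_subsets)]
```

In a full pipeline one tree ensemble would be trained per balanced subset and the predictions combined, for example by majority vote. Note that this estimator can return slightly negative values for very small counts, so gains are best compared with each other and not read as absolute quantities.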