Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230700070-5. doi: 10.11896/jsjkx.230700070

• Big Data & Data Science •

Improved K-means Photovoltaic Energy Data Cleaning Method Based on Autoencoder

PENG Bo, LI Yaodong, GONG Xianfu

  1. The Grid Planning and Research Center of Guangdong Power Grid Corporation Limited, Guangzhou 510080, China
  • Published: 2024-06-06
  • Corresponding author: PENG Bo (1339512151@qq.com)
  • About author: PENG Bo, born in 1991, master, engineer. His main research interest is power grid planning.
  • Supported by:
    Science and Technology Project of China Southern Power Grid Co. Ltd 037700KK52220042 (GDKJXM20220906).

Abstract: The development of smart grids has produced massive amounts of energy data, and data quality underpins tasks such as data value mining. However, abnormal data inevitably arise during the collection and transmission of large-scale, multi-source photovoltaic energy data, so data cleaning is required. Existing data cleaning models based on traditional statistical machine learning have certain limitations. This paper proposes an improved K-means clustering model built on a Transformer autoencoder structure for cleaning energy big data. The model adaptively determines the number of clusters using the elbow method and uses an autoencoder network to compress and reconstruct the data within each cluster, thereby detecting and recovering abnormal data. In addition, it uses the Transformer's multi-head attention mechanism to learn correlations among the data, which improves its ability to screen abnormal data. Experiments on a public photovoltaic power generation dataset show that, compared with other methods, the proposed model detects abnormal data more effectively, with a screening accuracy above 96%. The model can also recover abnormal data to a certain extent, providing effective support for energy big data applications.

Key words: Autoencoder, Data cleaning, Anomaly detection, Transformer, K-means
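The pipeline outlined in the abstract (choose the cluster count with the elbow method, then screen and repair samples cluster by cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmeans`, `elbow_k`, and `clean`, the toy data, and the threshold are all hypothetical, and distance to the nearest centroid is used as a crude stand-in for the reconstruction error of the Transformer autoencoder, which is not reproduced here.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, sse

def elbow_k(points, k_max=6):
    """Pick k where the SSE curve bends most sharply (largest second difference)."""
    sse = {k: kmeans(points, k)[1] for k in range(1, k_max + 1)}
    bend = {k: sse[k - 1] - 2 * sse[k] + sse[k + 1] for k in range(2, k_max)}
    return max(bend, key=bend.get)

def clean(samples, centroids, threshold):
    """Flag samples far from every centroid; replace them with the nearest one.

    Assumption: centroid distance stands in for autoencoder reconstruction error.
    """
    cleaned, flags = [], []
    for s in samples:
        nearest = min(centroids, key=lambda c: math.dist(s, c))
        is_anomaly = math.dist(s, nearest) > threshold
        flags.append(is_anomaly)
        cleaned.append(nearest if is_anomaly else s)
    return cleaned, flags

# Toy 2-D data: three tight groups of five points each (illustrative only).
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
centres = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

best_k = elbow_k(points)
centroids, _ = kmeans(points, best_k)
cleaned, flags = clean([(0.2, 0.1), (10.0, 14.0)], centroids, threshold=2.0)
print(best_k, flags, cleaned)
```

On this toy data the elbow falls at three clusters, and only the second sample, which lies well outside every group, is flagged and replaced. In the model described in the abstract, the flagging and recovery steps would instead rely on an autoencoder trained on the data within each cluster.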

CLC Number: 

  • TP391