Computer Science, 2024, Vol. 51, Issue (6A): 230700070-5. doi: 10.11896/jsjkx.230700070

• Big Data & Data Science •

Improved K-means Photovoltaic Energy Data Cleaning Method Based on Autoencoder

PENG Bo, LI Yaodong, GONG Xianfu   

  1. The Grid Planning and Research Center of Guangdong Power Grid Corporation Limited, Guangzhou 510080, China
  • Published: 2024-06-06
  • About author: PENG Bo, born in 1991, master's degree, engineer. His main research interest is power grid planning.
  • Supported by:
    Science and Technology Project of China Southern Power Grid Co. Ltd 037700KK52220042 (GDKJXM20220906).

Abstract: The development of smart grids has produced massive amounts of energy data, and data quality is the foundation for tasks such as mining the value of that data. However, when large-scale, multi-source photovoltaic energy data are collected and transmitted, abnormal data inevitably arise, so the data must be cleaned. Existing data cleaning models based on traditional statistics and machine learning have limitations. This paper proposes an improved K-means clustering model built on a Transformer autoencoder structure for cleaning energy big data. The model determines the number of clusters adaptively with the elbow method and uses autoencoder networks to compress and reconstruct the data within each cluster, thereby detecting and recovering abnormal data. In addition, it uses the multi-head attention mechanism of the Transformer to learn correlated features among the data, which improves its ability to screen abnormal data. Experimental results on a publicly available photovoltaic power generation dataset show that the proposed model detects abnormal data better than the compared methods, with a screening accuracy of over 96%. It can also recover abnormal data to a certain extent, providing effective support for applications of energy big data.
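The cleaning pipeline summarized in the abstract (choose K with the elbow method, cluster with K-means, compress and reconstruct each cluster with an autoencoder, and treat rows with a large reconstruction error as abnormal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Transformer autoencoder is swapped for a linear (PCA) autoencoder so the example needs only NumPy, the elbow is located with a distance-to-chord heuristic, abnormal rows are flagged with a median + k·MAD rule on the reconstruction error, and flagged rows are "recovered" by replacing them with their reconstruction. All function names, the parameters `k_max`, `r` and `k_mad`, and the synthetic data are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding; returns (centroids, labels, SSE)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # k-means++: draw the next centre with probability proportional to D^2.
        d2 = ((X[:, None, :] - np.array(centroids)[None]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d2 = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1), float(d2.min(axis=1).sum())


def elbow_k(X, k_max=8, seed=0):
    """Elbow method (assumed automation): pick the k whose (k, SSE) point lies
    farthest below the chord joining k=1 and k=k_max on normalised axes."""
    ks = np.arange(1, k_max + 1)
    sse = np.array([kmeans(X, k, seed=seed)[2] for k in ks])
    x = (ks - 1) / (k_max - 1)                  # normalised k in [0, 1]
    y = (sse - sse[-1]) / (sse[0] - sse[-1])    # normalised SSE: 1 at k=1, 0 at k_max
    return int(ks[np.argmax(1.0 - x - y)])      # signed distance below the line x + y = 1


def fit_linear_ae(Xc, r):
    """Stand-in for the paper's Transformer autoencoder: a linear autoencoder whose
    encoder/decoder are the top-r principal axes of the cluster (fitted via SVD)."""
    mu = Xc.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc - mu, full_matrices=False)
    return mu, vt[:r]                           # W has shape (r, d)


def reconstruct(X, mu, W):
    """Encode (project to r dims), then decode back to the original d dims."""
    return (X - mu) @ W.T @ W + mu


def mad_threshold(err, k_mad):
    """Robust cut-off: median + k_mad * MAD of the reconstruction errors."""
    med = np.median(err)
    return med + k_mad * np.median(np.abs(err - med))


def clean(X, k_max=8, r=1, k_mad=10.0, seed=0):
    """Cluster, then flag and repair rows with unusually large reconstruction error."""
    k = elbow_k(X, k_max, seed)
    _, labels, _ = kmeans(X, k, seed=seed)
    X_clean, flagged = X.copy(), np.zeros(len(X), dtype=bool)
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx) <= r:                       # too few rows to fit an autoencoder
            continue
        Xc = X[idx]
        keep = np.ones(len(idx), dtype=bool)
        for _ in range(2):  # pass 1 fits on every row; pass 2 refits on rows that looked normal
            mu, W = fit_linear_ae(Xc[keep], r)
            err = ((Xc - reconstruct(Xc, mu, W)) ** 2).sum(axis=1)
            keep = err <= mad_threshold(err, k_mad)
        flagged[idx[~keep]] = True
        X_clean[idx[~keep]] = reconstruct(Xc[~keep], mu, W)   # "recover" flagged rows
    return X_clean, flagged, k


# Hypothetical synthetic data: two operating regimes, each a noisy line in 3 features.
rng = np.random.default_rng(42)
A = np.outer(rng.uniform(0, 1, 200), [1, 2, 3])
B = np.array([10, 10, 5]) + np.outer(rng.uniform(0, 1, 200), [1, -1, 2])
truth_bad = np.outer([0.5, 0.6, 0.7, 0.8, 0.9], [1, 2, 3])
bad = truth_bad * [1, 1, 0]                     # third feature dropped to zero
X = np.vstack([A, B, bad]) + rng.normal(0, 0.01, (405, 3))
X_clean, flagged, k = clean(X)
print(k, flagged[400:].all(), int(flagged.sum()))
```

The two-pass fit in `clean` is a choice made for this sketch: abnormal rows pull the first fit toward themselves, so the autoencoder is refitted on the rows that passed the first check before the final decision. The linear projection only captures the dominant linear correlation inside each cluster, whereas the paper uses a Transformer autoencoder whose multi-head attention learns relationships among the data.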

Key words: Autoencoder, Data cleaning, Anomaly detection, Transformer, K-means

CLC Number: TP391