Computer Science, 2024, Vol. 51, Issue (6A): 230700070-5. doi: 10.11896/jsjkx.230700070
PENG Bo, LI Yaodong, GONG Xianfu
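Abstract: The development of smart grids has produced massive volumes of energy data, and data quality is the foundation for tasks such as mining the value of that data. However, abnormal data inevitably arise during the collection and transmission of massive multi-source photovoltaic energy data, so data cleaning is required. Data cleaning models based on traditional statistical machine learning have certain limitations. This paper proposes an improved K-means clustering model built on a Transformer autoencoder structure for cleaning big energy data. The model adaptively determines the number of clusters using the elbow method and uses an autoencoder network to compress and reconstruct the data within each cluster, thereby detecting and recovering abnormal data. The model also uses the Transformer's multi-head attention mechanism to learn correlations among the data, which improves its ability to screen for abnormal data. Experiments on a public photovoltaic power generation dataset show that the model detects abnormal data better than the other methods compared, with a screening accuracy above 96%. In addition, the proposed model can recover abnormal data to some extent, providing effective support for big energy data applications.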
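The elbow-method step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the toy data, the helper names (`kmeans_sse`, `choose_k`), the farthest-point initialisation, and the 10% gain threshold used to locate the elbow are all assumptions chosen for illustration. The autoencoder and Transformer stages are not shown.

```python
# Illustrative elbow-method sketch (assumed details, not the authors' code).

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))

def kmeans_sse(points, k, iters=100):
    """Run Lloyd's k-means and return the within-cluster sum of squared errors."""
    # Assumption: deterministic farthest-point initialisation instead of random seeds.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def choose_k(points, k_max=6, min_gain=0.10):
    """Pick the smallest k after which adding a cluster reduces the SSE
    by less than min_gain times the SSE at k = 1 (an assumed elbow rule)."""
    sse = [kmeans_sse(points, k) for k in range(1, k_max + 1)]
    for k in range(1, k_max):
        if sse[k - 1] - sse[k] < min_gain * sse[0]:
            return k, sse
    return k_max, sse

# Toy data: three tight groups of five points each (hypothetical values).
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

best_k, sse_curve = choose_k(points)
print(best_k)
```

On this toy data the rule selects three clusters, because the SSE drops sharply up to k = 3 and barely changes afterwards. In a pipeline like the one the abstract describes, the selected k would then fix the clusters whose members are compressed and reconstructed by the autoencoder.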