Computer Science, 2024, Vol. 51, Issue (8): 63-74. doi: 10.11896/jsjkx.230600103

• Database & Big Data & Data Science •

Out-of-Distribution Hard Disk Failure Prediction with Affinity Propagation Clustering and Broad Learning Systems

WANG Yiyang1, LIU Fagui1,2, PENG Lingxia1, ZHONG Guoxiang1

  1 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
  2 Peng Cheng Laboratory, Shenzhen, Guangdong 518066, China
  • Received: 2023-06-12 Revised: 2023-11-28 Online: 2024-08-15 Published: 2024-08-13
  • Corresponding author: ZHONG Guoxiang (cszhongguoxiang111@mail.scut.edu.cn)
  • About author (202220143286@mail.scut.edu.cn): WANG Yiyang, born in 1999, postgraduate. His main research interest is cloud computing.
    ZHONG Guoxiang, born in 1994, Ph.D. His main research interests include cloud computing, AIOps and machine learning.
  • Supported by:
    Major Key Project of PCL, China (PCL2023A09), Guangdong Major Project of Basic and Applied Basic Research (2019B030302002), Science and Technology Major Project of Guangzhou (202007030006), and Science and Technology Project of Guangdong Province (2021B1111600001).

Abstract: Hard disks are the primary storage devices in cloud data centers, and hard disk failure prediction is an important means of ensuring data security. However, failure samples are vastly outnumbered by healthy SMART samples, and this extreme imbalance biases prediction models. Moreover, data distributions differ across hard disk models, so a prediction model trained on data from specific disks is often unsuitable for other disks. To address these two problems, this paper proposes an out-of-distribution hard disk failure prediction method that combines the affinity propagation (AP) clustering algorithm with the broad learning system. To tackle the sample imbalance, the AP clustering algorithm is applied to the samples collected in the period just before a failure occurs, and every sample that falls in the same cluster as a confirmed failure sample is treated as an additional failure sample. To handle the distribution differences between disk models, the manifold regularization framework is combined with the broad learning system to learn the low-dimensional structure of hard disk data, which improves the model's generalization to data from unseen distributions. Experimental results show that, on the sample set resampled with AP clustering, the F1_Score of multiple failure prediction methods is on average 0.2 higher than on the sample sets produced by the resampling methods used for comparison. In addition, on the out-of-distribution hard disk failure prediction task, the F1_Score of the proposed model is 0.1 to 0.2 higher than that of the comparison methods.

Key words: Hard disk failure prediction, Class imbalance, Out-of-distribution generalization, Affinity propagation clustering, Broad learning system, Manifold learning
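
The second component described in the abstract, a broad learning system trained with a manifold regularization term, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simplified broad learning system (random linear feature nodes and tanh enhancement nodes, with no sparse fine-tuning), a k-nearest-neighbor graph Laplacian, and the standard closed-form solution of a manifold-regularized ridge regression. All function names, hyperparameters, and the toy data are hypothetical.

```python
import numpy as np

def make_mapping(n_in, n_feature=20, n_enhance=40, seed=0):
    """Draw fixed random weights for feature and enhancement nodes (assumed sizes)."""
    rng = np.random.default_rng(seed)
    Wf = rng.standard_normal((n_in, n_feature)) / np.sqrt(n_in)
    We = rng.standard_normal((n_feature, n_enhance)) / np.sqrt(n_feature)
    return Wf, We

def transform(X, Wf, We):
    """Map inputs to the broad layer A = [feature nodes | enhancement nodes]."""
    Z = X @ Wf               # linear feature nodes
    H = np.tanh(Z @ We)      # nonlinear enhancement nodes
    return np.hstack([Z, H])

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - W of a symmetrized kNN graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]   # k nearest neighbors per sample
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def fit_output_weights(A, Y, L, lam=1e-2, gamma=1e-2):
    """Minimize ||A W - Y||^2 + lam ||W||^2 + gamma tr(W^T A^T L A W) in closed form."""
    d = A.shape[1]
    lhs = A.T @ A + lam * np.eye(d) + gamma * (A.T @ L @ A)
    return np.linalg.solve(lhs, A.T @ Y)

# Toy data standing in for healthy (-1) and failure (+1) samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, (40, 2)), rng.normal(3.0, 1.0, (40, 2))])
y = np.concatenate([-np.ones(40), np.ones(40)])[:, None]

Wf, We = make_mapping(X.shape[1])
A = transform(X, Wf, We)
L = knn_laplacian(X, k=5)
W = fit_output_weights(A, y, L)

pred = np.sign(A @ W)
acc = float((pred == y).mean())
print("training accuracy:", acc)
```

The Laplacian term penalizes output differences between neighboring samples, which is how the manifold regularization framework encourages the learned mapping to respect the low-dimensional structure of the data. The weights `lam` and `gamma` here are arbitrary and would need tuning on real SMART data.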

CLC number: TP302