Computer Science, 2024, Vol. 51, Issue (8): 63-74. doi: 10.11896/jsjkx.230600103
WANG Yiyang1, LIU Fagui1,2, PENG Lingxia1, ZHONG Guoxiang1
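Abstract: Hard disks are the primary storage devices in cloud data centers, and hard disk failure prediction is an important means of safeguarding data. However, failure samples are vastly outnumbered by healthy samples, and this extreme imbalance biases the model. In addition, data distributions differ across disk models, so a model trained on data from one disk model often does not transfer to others. To address these two problems, this paper proposes an out-of-distribution hard disk failure prediction method that combines the affinity propagation (AP) clustering algorithm with the broad learning system. For the sample imbalance problem, the AP clustering algorithm is used to cluster the samples collected in the stage preceding a disk failure, and samples that fall in the same cluster as the failure samples are relabeled as failure samples. For the distribution differences across disk models, the manifold regularization framework is combined with the broad learning system to learn the low-dimensional structure of the disk data and to improve generalization to data from unseen distributions. Experimental results show that, on sample sets resampled with the AP clustering algorithm, the F1_Score of several failure prediction methods improves by 0.2 on average compared with sample sets produced by the baseline resampling methods. Furthermore, on the out-of-distribution hard disk failure prediction task, the F1_Score of the proposed model is 0.1~0.2 higher than that of the compared methods.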
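The label-expansion step of the resampling idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the cluster assignments have already been produced by a clustering step such as AP clustering, and the function name `expand_failure_labels`, the 0/1 label encoding, and the toy data are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def expand_failure_labels(labels: Sequence[int], cluster_ids: Sequence[int]) -> List[int]:
    """Relabel every sample that shares a cluster with a known failure sample.

    labels      -- 1 for a known failure sample, 0 otherwise (assumed encoding)
    cluster_ids -- cluster assignment of each sample (assumed to come from a
                   clustering step such as AP clustering, not computed here)
    """
    if len(labels) != len(cluster_ids):
        raise ValueError("labels and cluster_ids must have the same length")
    # Clusters that contain at least one known failure sample.
    failure_clusters = {c for y, c in zip(labels, cluster_ids) if y == 1}
    # Samples in those clusters become failure samples; all others keep their label.
    return [1 if c in failure_clusters else y for y, c in zip(labels, cluster_ids)]


# Toy pre-failure window of one disk: only the last sample is labeled as a failure.
labels = [0, 0, 0, 0, 0, 1]
cluster_ids = [0, 0, 1, 1, 2, 2]  # hypothetical clustering output
expanded = expand_failure_labels(labels, cluster_ids)
print(expanded)  # [0, 0, 0, 0, 1, 1]
```

In this toy case the fifth sample shares cluster 2 with the only known failure sample, so it is added to the failure class, while the samples in clusters 0 and 1 stay healthy.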