Computer Science, 2025, Vol. 52, Issue (9): 128-136. doi: 10.11896/jsjkx.240700171
• High Performance Computing •
ZHAO Yining, WANG Xiaoning, NIU Tie, ZHAO Yi, XIAO Haili