Computer Science (计算机科学) ›› 2021, Vol. 48 ›› Issue (1): 295-300. doi: 10.11896/jsjkx.191200186
雷阳, 姜瑛
LEI Yang, JIANG Ying
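Abstract: A growing number of users now deploy their services in cloud computing environments. However, the diversity of cloud services and the dynamic nature of the deployment environment can cause cloud computing nodes to behave abnormally. Traditional node anomaly detection methods target only the single abnormal node and ignore its influence on associated nodes, which leads to problems such as anomaly propagation and failure of associated nodes. This paper proposes an anomaly judgment method for associated nodes in cloud computing environments. First, an Agent is deployed on each node and collects the node's running data at a specific time interval, and a node relationship graph is built from the association relationships among nodes. Second, the running data are used to train an anomaly detection model, the weights and a comprehensive score of the running data are computed, and a sliding-time-window method judges whether a single node is abnormal. Finally, when a single node is abnormal, a method based on normalized mutual information identifies the other associated nodes affected by it. On a purpose-built cloud computing platform, various anomalies were simulated and the states of nodes under injected anomalies were observed, which verified the effectiveness of both the single-node anomaly judgment method and the associated-node judgment method. The experimental results show that the method outperforms other methods in both accuracy and specificity when judging single-node anomalies, and that it accurately finds the associated abnormal nodes in multi-node structures, with high accuracy and stability.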
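The last two steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formula for the comprehensive score, the window length, the thresholds, or the discretization, so `window_is_anomalous`, `discretize`, `min_hits`, `bins`, and the normalization `2·I(X;Y) / (H(X) + H(Y))` (one common definition of normalized mutual information) are all assumptions made for illustration.

```python
import math
from collections import Counter, deque


def window_is_anomalous(scores, threshold, window=5, min_hits=3):
    """Assumed rule: flag a node once at least `min_hits` of the last
    `window` comprehensive scores exceed `threshold`."""
    recent = deque(maxlen=window)
    for score in scores:
        recent.append(score > threshold)
        if len(recent) == window and sum(recent) >= min_hits:
            return True
    return False


def discretize(values, bins=4):
    """Map a numeric series to equal-width bin indices (assumed preprocessing)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(x, y):
    """NMI = 2 * I(X;Y) / (H(X) + H(Y)); returns 0.0 if either series is constant."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2 * mutual_info / (hx + hy)
```

Under these assumptions, once `window_is_anomalous` flags a node, each neighbour in the node relationship graph could be scored by `normalized_mutual_information` between the two nodes' discretized metric series, with neighbours above a chosen cutoff treated as affected.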