Computer Science, 2020, Vol. 47, Issue (12): 50-55. doi: 10.11896/jsjkx.200700145

Special topic: Software Engineering and Requirements Engineering for Complex Systems

System Usage Analysis and Failure Analysis for Cloud Computing

TIAN Yu-li, LI Ning

  1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710029, China
    MIIT Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Xi'an 710029, China
  • Received: 2020-07-22  Revised: 2020-08-27  Online: 2020-12-15  Published: 2020-12-17
  • Corresponding author: LI Ning (lining@nwpu.edu.cn)
  • About author: TIAN Yu-li, born in 1990, Ph.D student, is a student member of China Computer Federation (tianyuli002@mail.nwpu.edu.cn). His main research interests include software quality engineering, software reliability engineering and mining software repositories.
    LI Ning, born in 1978, Ph.D, associate professor, is a member of China Computer Federation. Her main research interests include software testing, software defect analysis and mining software repositories.
  • Supported by:
    National Natural Science Foundation of China (61972317, 61402370).

Abstract: From the perspective of software system usage, analyzing system usage patterns and failures can help software providers grasp user demand more accurately, evaluate system quality, guide system operation and improve system maintenance plans. Cloud computing systems (CCS) integrate massive computing resources and provide end users with configurable computing solutions through network access; they have received great attention from both academia and industry. An in-depth understanding of CCS usage workload and software failure characteristics is important for improving both resource utilization efficiency and service reliability. This paper studies system usage patterns and system failures in the cloud computing environment. It performs an in-depth analysis of the real execution logs of the Google cluster cloud computing system and characterizes system operation in terms of both usage patterns and failure characteristics. The results reveal quality problems and potential vulnerabilities of the system and provide a basis for follow-up quality assurance activities aimed at improving cloud computing systems.

Key words: Cloud computing, Failure analysis, Software failure, Usage analysis, Usage pattern
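The abstract describes characterizing failures from the Google cluster execution logs. As a minimal sketch of one such measurement (not the authors' actual analysis pipeline), the Python below tallies how task executions end in a task_events file. The event-type codes and the column position of the event type are assumptions based on the 2011 Google cluster-usage trace's published schema and should be checked against that format document; the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Event-type codes of the 2011 Google cluster-usage trace task_events table
# (assumed from the trace's published schema; verify before real use).
EVENT_NAMES = {
    0: "SUBMIT", 1: "SCHEDULE", 2: "EVICT", 3: "FAIL", 4: "FINISH",
    5: "KILL", 6: "LOST", 7: "UPDATE_PENDING", 8: "UPDATE_RUNNING",
}
# Event types that end a task's current execution.
TERMINAL = {"EVICT", "FAIL", "FINISH", "KILL", "LOST"}
EVENT_TYPE_COL = 5  # 0-based position of "event type" (assumed column layout)


def terminal_event_shares(rows):
    """Fraction of terminal task events per type, from split task_events rows."""
    counts = Counter()
    for row in rows:
        name = EVENT_NAMES.get(int(row[EVENT_TYPE_COL]))
        if name in TERMINAL:
            counts[name] += 1
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


# Hypothetical rows for illustration only (not taken from the trace):
# timestamp, missing info, job ID, task index, machine ID, event type, ...
SAMPLE = """\
0,,1,0,,0,u1,0,2,0.1,0.1,0,0
10,,1,0,7,1,u1,0,2,0.1,0.1,0,0
20,,1,0,7,4,u1,0,2,0.1,0.1,0,0
30,,2,0,8,3,u2,1,9,0.2,0.1,0,0
40,,3,0,9,5,u2,1,9,0.2,0.1,0,0
50,,4,0,9,4,u3,2,0,0.1,0.2,0,0
"""

if __name__ == "__main__":
    shares = terminal_event_shares(csv.reader(io.StringIO(SAMPLE)))
    print(shares)  # {'FINISH': 0.5, 'FAIL': 0.25, 'KILL': 0.25}
```

Counting only terminal events keeps SUBMIT and SCHEDULE records from diluting the failure share; on real trace files the same function would be applied to each decompressed task_events shard and the counts summed.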

CLC number: TP311