Computer Science ›› 2019, Vol. 46 ›› Issue (3): 321-326. doi: 10.11896/j.issn.1002-137X.2019.03.047

• Interdiscipline & Frontier •

Hierarchical Performance Diagnosis Method for Cloud Operating System

YUAN Yue

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Received: 2018-09-30 Revised: 2018-12-28 Online: 2019-03-15 Published: 2019-03-22
  • Corresponding author: YUAN Yue (1991-), female, Ph.D. candidate. Her main research interests include cloud computing and information security. E-mail: yyuanyuee@163.com
  • Supported by:
    General Program of the National Natural Science Foundation of China (61472429)

Abstract: In recent years, many researchers have worked on automatic performance diagnosis tools for large-scale, heavily loaded distributed environments. A cloud operating system is the middle layer between cloud users and cloud resources, so diagnosing and resolving its slow responses helps optimize the performance of the whole cloud computing system. Analyzing the task execution performance of a cloud operating system in a large-scale, complex distributed cloud computing environment is challenging. Against this background, this paper proposes a log-based performance diagnosis method for cloud operating systems. Its goal is to find out why tasks of a specified category are processed slowly and to provide clues for performance optimization. Drawing on the implementation principles of the cloud operating system, the method separates and extracts the logs related to each executed task from the massive logs the system generates, extracts the key information, builds a hierarchical performance description model, and refines the analysis layer by layer down to the granularity of function execution. In this way, it identifies the main factors behind slow task execution and helps locate the root cause of performance anomalies, without modifying or analyzing the source code. With OpenStack as the prototype system, a cloud computing environment was built and large-scale concurrent simulation experiments were conducted. The experimental results show that the proposed method provides effective clues for optimizing system performance and leads to significant performance improvement. For example, the time consumed by cloud resource scheduling can be reduced from the minute level to the second level.

Key words: Cloud computing, IaaS, Log analysis, System performance diagnosis

CLC Number:

  • TP311
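
The pipeline outlined in the abstract (separate the logs of each task, extract key fields, then refine the timing analysis layer by layer down to functions) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The four-field log format, the sample lines, the rule of charging each time gap to the earlier record, and the helper names `parse_line`, `group_by_task`, `build_profile`, and `slowest_function` are all assumptions made for illustration. Real OpenStack logs have a different layout and would need a real parser.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified log format (an assumption, not OpenStack's layout):
#   <ISO timestamp> <request_id> <service> <function>
# The lines below are invented sample data, not measurements from the paper.
SAMPLE_LOGS = [
    "2019-01-01T10:00:00 req-1 api create_server",
    "2019-01-01T10:00:01 req-2 api create_server",
    "2019-01-01T10:00:02 req-1 scheduler select_hosts",
    "2019-01-01T10:00:03 req-2 scheduler select_hosts",
    "2019-01-01T10:00:50 req-1 compute spawn",
    "2019-01-01T10:00:55 req-1 compute finish",
    "2019-01-01T10:01:03 req-2 compute spawn",
    "2019-01-01T10:01:05 req-2 compute finish",
]


def parse_line(line):
    """Extract the key fields from one log line in the assumed format."""
    ts, request_id, service, function = line.split()
    return datetime.fromisoformat(ts), request_id, service, function


def group_by_task(lines):
    """Separate interleaved log lines into one time-ordered list per task."""
    tasks = defaultdict(list)
    for line in lines:
        ts, request_id, service, function = parse_line(line)
        tasks[request_id].append((ts, service, function))
    for records in tasks.values():
        records.sort(key=lambda record: record[0])
    return dict(tasks)


def build_profile(records):
    """Build a three-level timing model: task -> service -> function.

    Assumption: the gap between two consecutive records of a task is
    charged to the service and function of the earlier record.
    """
    profile = {"total": 0.0, "services": {}}
    for (start, service, function), (end, _, _) in zip(records, records[1:]):
        elapsed = (end - start).total_seconds()
        profile["total"] += elapsed
        layer = profile["services"].setdefault(
            service, {"total": 0.0, "functions": {}}
        )
        layer["total"] += elapsed
        layer["functions"][function] = layer["functions"].get(function, 0.0) + elapsed
    return profile


def slowest_function(profile):
    """Return (service, function, seconds) for the largest share, or None."""
    candidates = [
        (elapsed, service, function)
        for service, layer in profile["services"].items()
        for function, elapsed in layer["functions"].items()
    ]
    if not candidates:
        return None
    elapsed, service, function = max(candidates)
    return service, function, elapsed


if __name__ == "__main__":
    for request_id, records in group_by_task(SAMPLE_LOGS).items():
        profile = build_profile(records)
        print(request_id, profile["total"], slowest_function(profile))
```

Keeping the model as nested dictionaries lets the same structure answer questions at each layer: the task total, the per-service totals, and the per-function shares. Like the approach the abstract describes, the sketch reads only log output and needs no change to the system's source code.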