Computer Science (计算机科学), 2019, Vol. 46, Issue (3): 321-326. doi: 10.11896/j.issn.1002-137X.2019.03.047
YUAN Yue (袁月)
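Abstract: In recent years, many researchers have worked on automated performance diagnosis tools for large-scale, heavily loaded distributed environments. A cloud operating system is the middle layer between cloud users and cloud resources, so diagnosing and resolving its slow responses helps optimize the performance of the cloud computing system as a whole. In a large and complex distributed cloud environment, however, analyzing the task execution performance of a cloud operating system is challenging. Against this background, this paper proposes a log-based performance diagnosis method for cloud operating systems. Its goal is to find why a specified category of cloud operating system tasks is processed too slowly and to provide clues for performance optimization. Drawing on how the cloud operating system is implemented, the method separates and extracts the logs related to each executed task from the massive logs the system produces, extracts key information from them, builds a hierarchical performance description model, and refines the analysis layer by layer down to the granularity of individual function executions. In this way it identifies the main factors behind slow task execution and helps locate the root cause of performance anomalies, without modifying the source code or relying on source code analysis. With the cloud operating system OpenStack as the prototype system, a cloud computing environment was built and large-scale concurrent simulation experiments were carried out. The experimental results show that the proposed method provides effective clues for performance optimization and significantly improves system performance; for example, the time spent in cloud resource scheduling can be reduced from minutes to seconds.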
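The pipeline described in the abstract (separate logs per task, extract key fields, build a hierarchical time breakdown down to function level) can be illustrated with a minimal sketch. It is not the paper's implementation: the log line format, the regular expression, the helper names `parse`, `profile`, and `slowest`, and the sample lines are all assumptions made up for illustration, and real OpenStack logs are formatted differently. The sketch attributes the gap between two consecutive events of a task to the earlier event, which is only one possible heuristic.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified log format (an assumption, not real OpenStack output):
#   "<timestamp> <component> [req-<id>] <function> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<component>\S+) \[(?P<req>req-[\w-]+)\] "
    r"(?P<func>\S+) (?P<msg>.*)$"
)


def parse(lines):
    """Separate log lines per task (request id) and sort each task by time."""
    tasks = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not follow the assumed format
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        tasks[m["req"]].append((ts, m["component"], m["func"]))
    for events in tasks.values():
        events.sort(key=lambda e: e[0])
    return dict(tasks)


def profile(events):
    """Build a two-level breakdown: component -> function -> seconds.

    The time between consecutive events is attributed to the earlier event.
    """
    tree = defaultdict(lambda: defaultdict(float))
    for (t0, comp, func), (t1, _, _) in zip(events, events[1:]):
        tree[comp][func] += (t1 - t0).total_seconds()
    return {comp: dict(funcs) for comp, funcs in tree.items()}


def slowest(tree):
    """Return (component, function, seconds) of the largest contributor, or None."""
    return max(
        ((c, f, s) for c, funcs in tree.items() for f, s in funcs.items()),
        key=lambda item: item[2],
        default=None,
    )


# Made-up sample lines in the assumed format.
SAMPLE = [
    "2019-01-01 10:00:00.000 nova-api [req-1] create_server received",
    "2019-01-01 10:00:00.500 nova-scheduler [req-1] select_destinations start",
    "2019-01-01 10:01:10.500 nova-compute [req-1] build_instance start",
    "2019-01-01 10:01:12.500 nova-compute [req-1] build_instance done",
    "2019-01-01 10:00:01.000 nova-api [req-2] create_server received",
]

if __name__ == "__main__":
    tasks = parse(SAMPLE)
    print(slowest(profile(tasks["req-1"])))
```

In this fabricated sample, most of the elapsed time for task `req-1` falls between the scheduler event and the next event, so the breakdown points at the scheduling step, which loosely mirrors the kind of clue the abstract reports for cloud resource scheduling.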