Computer Science, 2019, Vol. 46, Issue (11): 145-155. doi: 10.11896/jsjkx.181102210

• Software and Database Technology •

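Method of Microservice System Debugging Based on Log Visualization Analysis

LI Wen-hai, PENG Xin, DING Dan, XIANG Qi-lin, GUO Xiao-feng, ZHOU Xiang, ZHAO Wen-yun

  1. (School of Computer Science, Fudan University, Shanghai 201203, China)
    (Shanghai Key Laboratory of Data Science, Fudan University, Shanghai 201203, China)
  • Received: 2018-11-29 Online: 2019-11-15 Published: 2019-11-14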
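  • Corresponding author: PENG Xin (1979-), male, professor, Ph.D. supervisor, senior member of CCF; main research interests: big code, intelligent software development, and mobile cloud computing. E-mail: pengxin@fudan.edu.cn
  • About the authors: LI Wen-hai (1994-), male, master's student; main research interests: mobile crowdsourcing and microservices. E-mail: 16212010016@fudan.edu.cn. DING Dan (1994-), female, master's student; main research interest: software engineering. XIANG Qi-lin (1994-), male, master's student; main research interest: software engineering. GUO Xiao-feng (1996-), male, master's student; main research interest: software engineering. ZHOU Xiang (1983-), male, Ph.D. candidate; main research interest: software engineering. ZHAO Wen-yun (1967-), male, M.S., professor, Ph.D. supervisor; main research interests: software engineering, software development tools and environments, and enterprise application integration (EAI).
  • Funding:
    This work was supported by the National Key Research and Development Program of China (2018YFB1004803).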

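Abstract: In the era of cloud computing, more and more enterprises are adopting microservice architecture to develop new software or to transform traditional monolithic applications. However, microservice systems are highly complex and dynamic, and when such a system fails, no existing method or tool effectively supports locating the root cause of the failure. To address this, this paper is the first to propose using call-chain (trace) information to associate the business logs that a single business request produces across all services, and on this basis it studies a microservice debugging method based on log visualization analysis. First, a microservice log model is defined to standardize the data required for visual log analysis. Then, for four typical kinds of microservice faults (ordinary faults that throw exceptions, logic faults that throw no exceptions, faults caused by uncontrolled asynchronous service-invocation sequences, and faults caused by inconsistent versions or states across multiple instances of a service), five visual debugging strategies are summarized to support root-cause localization: viewing the logs of a single trace, comparing different traces, asynchronous service-invocation analysis, service multi-instance analysis, and trace segmentation. Two algorithms are designed to implement asynchronous service-invocation analysis and service multi-instance analysis. In addition, a prototype tool, LogVisualization, is designed and implemented. LogVisualization collects the log information, trace data, and cluster node and service-instance information produced by a running microservice system; it associates all business logs with trace information with little code intrusion and lets users debug visually with the five strategies. Finally, the prototype is applied to a real microservice system, and an experimental comparison with existing tools (Zipkin+ELK) verifies its usefulness and efficiency in locating the root causes of the four kinds of microservice faults.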
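
The central mechanism of the abstract, tying every business log line to the request (trace) that produced it, can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each log record already carries the trace ID propagated by the tracer, together with the emitting service, instance, and version. The record layout and the names `LogRecord`, `group_by_trace`, and `version_conflicts` are illustrative only; they are not LogVisualization's actual log model, and the version check is a crude stand-in for, not a reproduction of, the paper's multi-instance analysis algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    # Hypothetical log model; the fields are chosen for illustration only.
    trace_id: str   # ID shared by all logs of one business request
    span_id: str    # ID of the service call that emitted the line
    service: str    # logical service name
    instance: str   # concrete instance (e.g. container) that handled the call
    version: str    # deployed version of that instance
    timestamp: int  # epoch milliseconds
    message: str


def group_by_trace(records):
    """Associate business logs with the trace that produced them.

    Returns {trace_id: [records sorted by timestamp]}, i.e. one
    cross-service log timeline per request.
    """
    traces = defaultdict(list)
    for rec in records:
        traces[rec.trace_id].append(rec)
    return {tid: sorted(recs, key=lambda r: r.timestamp)
            for tid, recs in traces.items()}


def version_conflicts(trace_logs):
    """Report services whose instances ran different versions within one trace.

    If a single request reached two instances of the same service that run
    different versions, the inconsistency is a candidate root cause.
    """
    versions = defaultdict(set)
    for rec in trace_logs:
        versions[rec.service].add(rec.version)
    return {svc: sorted(vs) for svc, vs in versions.items() if len(vs) > 1}


if __name__ == "__main__":
    logs = [
        LogRecord("t1", "s2", "order", "order-b", "v2", 120, "create order"),
        LogRecord("t1", "s1", "gateway", "gw-a", "v1", 100, "POST /order"),
        LogRecord("t1", "s3", "order", "order-a", "v1", 140, "reserve seat"),
        LogRecord("t2", "s9", "gateway", "gw-a", "v1", 105, "GET /status"),
    ]
    timelines = group_by_trace(logs)
    for rec in timelines["t1"]:
        print(rec.timestamp, rec.service, rec.instance, rec.message)
    print(version_conflicts(timelines["t1"]))  # {'order': ['v1', 'v2']}
```

Grouping by trace ID is what lets log lines scattered across services be read as one per-request timeline. Sorting by timestamp is a simplification: clock skew between nodes can misorder lines, so a real tool may prefer to order by the parent/child structure of the spans instead.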

Key words: Debugging, Fault, Log, Microservice, Trace, Visualization

CLC number: TP311