Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study
IEEE Transactions on Software Engineering (IF 7.4), Pub Date: 2018-01-01, DOI: 10.1109/tse.2018.2887384
Xiang Zhou , Xin Peng , Tao Xie , Jun Sun , Chao Ji , Wenhai Li , Dan Ding

The complexity and dynamism of microservice systems pose unique challenges to a variety of software engineering tasks such as fault analysis and debugging. Despite the prevalence and importance of microservices in industry, there is limited research on the fault analysis and debugging of microservice systems. To fill this gap, we conduct an industrial survey to learn about typical faults of microservice systems, current debugging practices, and the challenges developers face in practice. We then develop a medium-size benchmark microservice system (the largest and most complex open-source microservice system to the best of our knowledge) and replicate 22 industrial fault cases on it. Based on the benchmark system and the replicated fault cases, we conduct an empirical study to investigate the effectiveness of existing industrial debugging practices and whether they can be further improved by introducing state-of-the-art tracing and visualization techniques for distributed systems. The results show that the current industrial practices of microservice debugging can be improved by employing proper tracing and visualization techniques and strategies. Our findings also suggest a strong need for more intelligent trace analysis and visualization, e.g., combining trace visualization with improved fault localization, and employing data-driven and learning-based recommendation to guide visual exploration and comparison of traces.
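To make the idea of "combining trace comparison with fault localization" concrete, here is a minimal sketch; it is not the authors' method. It assumes each trace has already been reduced to a list of spans tagged with the service that produced them, and it swaps in a standard spectrum-based technique (the Ochiai coefficient) to rank services by how strongly their presence correlates with failing requests. The `Span` type, the `suspicious_services` function, and the service names are hypothetical, chosen for illustration only.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    # Hypothetical minimal span: real tracers also record timing, parent ids, tags.
    service: str
    operation: str


Trace = List[Span]


def suspicious_services(passing: List[Trace], failing: List[Trace]) -> List[Tuple[str, float]]:
    """Rank services with the Ochiai score ef / sqrt(total_failed * (ef + ep)).

    A service "covers" a trace if at least one span in that trace belongs to it.
    This sketch works at service granularity; the operation field is not used.
    """
    fail_cov: Counter = Counter()
    pass_cov: Counter = Counter()
    for trace in failing:
        for service in {span.service for span in trace}:
            fail_cov[service] += 1
    for trace in passing:
        for service in {span.service for span in trace}:
            pass_cov[service] += 1

    total_failed = len(failing)
    scores = {}
    for service in set(fail_cov) | set(pass_cov):
        ef, ep = fail_cov[service], pass_cov[service]
        denom = math.sqrt(total_failed * (ef + ep))
        scores[service] = ef / denom if denom else 0.0
    # Highest suspiciousness first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical example: failing requests go through "payment", passing ones do not.
ok = [Span("gateway", "route"), Span("order", "create"), Span("inventory", "reserve")]
bad = [Span("gateway", "route"), Span("order", "create"), Span("payment", "charge")]
ranking = suspicious_services(passing=[ok, ok], failing=[bad, bad])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
# payment: 1.000
# gateway: 0.707
# order: 0.707
# inventory: 0.000
```

Service-level coverage is the coarsest possible signal; comparing span latencies, error tags, or call-graph shape between passing and failing traces would be natural refinements of the same comparison idea.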

Updated: 2018-01-01