Experience report on applying software analytics in incident management of online service
Automated Software Engineering (IF 2.0), Pub Date: 2017-07-01, DOI: 10.1007/s10515-017-0218-1
Jian-Guang Lou , Qingwei Lin , Rui Ding , Qiang Fu , Dongmei Zhang , Tao Xie

As online services become more and more popular, incident management has become a critical task that aims to minimize service downtime and ensure the high quality of the provided services. In practice, incident management is conducted by analyzing a huge amount of monitoring data collected at the runtime of a service. Such data-driven incident management faces several significant challenges, such as large data scale, a complex problem space, and incomplete knowledge. To address these challenges, we carried out two years of software-analytics research in which we designed a set of novel data-driven techniques and developed an industrial system called the Service Analysis Studio (SAS), targeting real scenarios in a large-scale online service of Microsoft. SAS has been deployed to worldwide product datacenters and is widely used by on-call engineers for incident management. This paper shares our experience of using software analytics to solve engineers' pain points in incident management, the data-analysis techniques we developed, and the lessons learned from the process of research development and technology transfer.

Updated: 2017-07-01