On Misbehaviour and Fault Tolerance in Machine Learning Systems
arXiv - CS - Software Engineering. Pub Date: 2021-09-16. DOI: arxiv-2109.07989
Lalli Myllyaho, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen, Tommi Mikkonen

Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability, such as reliability and security, of these systems. Systems can be tested and monitored, but this does not provide protection against faults and failures in adapted ML systems themselves. We studied software designs that aim at introducing fault tolerance in ML systems so that possible problems in ML components of the systems can be avoided. The research was conducted as a case study, and its data was collected through five semi-structured interviews with experienced software architects. We present a conceptualisation of the misbehaviour of ML systems, the perceived role of fault tolerance, and the designs used. Common patterns for incorporating ML components into designs in a fault-tolerant fashion have started to emerge. ML models are, for example, guarded by monitoring the inputs and their distribution, and by enforcing business rules on acceptable outputs. Multiple, specialised ML models are used to adapt to the variations and changes in the surrounding world, and simpler fall-over techniques like default outputs are put in place to keep systems up and running in the face of problems. However, the general role of these patterns is not widely acknowledged. This is mainly due to the relative immaturity of using ML as part of a complete software system: beyond training, the field still lacks established frameworks and practices for implementing, operating, and maintaining the software that utilises ML. ML software engineering needs further analysis and development on all fronts.
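The abstract names three recurring safeguards: monitoring inputs and their distribution, enforcing business rules on acceptable outputs, and falling back to simple defaults when the ML component misbehaves. The Python sketch below is a minimal illustration of how such a guarded wrapper around an ML model could be structured, assuming nothing about the interviewed architects' actual systems; all names here (GuardedModel, input_ok, output_ok, default_output) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: a guarded wrapper combining the patterns the
# abstract names (input monitoring, business rules on outputs, and a
# default-output fall-over). Names are hypothetical, not from the paper.

from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardedModel:
    model: Callable[[list[float]], float]     # the wrapped ML component
    input_ok: Callable[[list[float]], bool]   # e.g. range / distribution check
    output_ok: Callable[[float], bool]        # business rule on acceptable outputs
    default_output: float                     # simple fall-over value

    def predict(self, features: list[float]) -> float:
        # Guard the input: reject malformed or out-of-range data.
        if not self.input_ok(features):
            return self.default_output
        try:
            prediction = self.model(features)
        except Exception:
            # Fault inside the ML component itself: fall back to the default.
            return self.default_output
        # Enforce the business rule on the output before releasing it.
        return prediction if self.output_ok(prediction) else self.default_output


if __name__ == "__main__":
    # Toy "model" guarded by simple rules.
    guarded = GuardedModel(
        model=lambda x: sum(x) / len(x),
        input_ok=lambda x: len(x) > 0 and all(0.0 <= v <= 1.0 for v in x),
        output_ok=lambda y: 0.0 <= y <= 1.0,
        default_output=0.5,
    )
    print(guarded.predict([0.2, 0.4, 0.9]))  # normal path
    print(guarded.predict([5.0, -3.0]))      # rejected input -> default output
```

The point of this shape is that every path out of predict() returns something the rest of the system can use, so downstream components never see an unhandled fault from the ML component.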

Updated: 2021-09-17