Remote explainability faces the bouncer problem
Nature Machine Intelligence (IF 18.8). Pub Date: 2020-08-24. DOI: 10.1038/s42256-020-0216-z
Erwan Le Merrer, Gilles Trédan

The concept of explainability is envisioned to satisfy society’s demands for transparency about machine learning decisions. The concept is simple: like humans, algorithms should explain the rationale behind their decisions so that their fairness can be assessed. Although this approach is promising in a local context (for example, when the model creator explains the model while debugging it at training time), we argue that this reasoning cannot simply be transposed to a remote context, where a model trained by a service provider is accessible to a user only through a network and its application programming interface. This is problematic, as the remote setting is precisely the target use case requiring transparency from a societal perspective. Through an analogy with a club bouncer (who may provide untruthful explanations when rejecting a customer), we show that providing explanations cannot prevent a remote service from lying about the true reasons behind its decisions. More precisely, we observe the impossibility of remote explainability for single explanations by constructing an attack on explanations that hides discriminatory features from the querying user. We provide an example implementation of this attack. We then show that the probability that an observer spots the attack by requesting several explanations and looking for incoherences is low in practical settings. This undermines the very concept of remote explainability in general.
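
To make the bouncer analogy concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of such an attack: a remote decision API whose real rule uses a protected attribute, while the single explanation it returns is built only from innocuous features. The names (Applicant, remote_api, the loan-style setting) are illustrative assumptions.

```python
# Illustrative sketch only: a remote service whose true decision uses a
# protected attribute, but whose returned "explanation" mentions only
# non-protected features, hiding the discrimination from a querying user.
from dataclasses import dataclass


@dataclass
class Applicant:
    income: float      # non-protected feature
    debt: float        # non-protected feature
    protected: int     # hypothetical protected attribute (1 = discriminated group)


def true_decision(a: Applicant) -> bool:
    # The real (discriminatory) rule: reject everyone in the protected group,
    # otherwise accept when income sufficiently exceeds debt.
    if a.protected == 1:
        return False
    return a.income - a.debt > 0


def lying_explanation(a: Applicant, accepted: bool) -> str:
    # The "bouncer" answer: a plausible explanation built only from
    # non-protected features, regardless of the real reason.
    if accepted:
        return f"accepted: income ({a.income}) exceeds debt ({a.debt})"
    if a.debt >= a.income:
        return f"rejected: debt ({a.debt}) is too high relative to income ({a.income})"
    return f"rejected: income ({a.income}) is below our current threshold"


def remote_api(a: Applicant) -> tuple[bool, str]:
    # What the querying user sees: a decision plus a single explanation.
    decision = true_decision(a)
    return decision, lying_explanation(a, decision)


if __name__ == "__main__":
    # Two applicants with identical non-protected features: the explanation
    # returned for the rejected one never mentions the protected attribute.
    for person in (Applicant(income=50.0, debt=10.0, protected=0),
                   Applicant(income=50.0, debt=10.0, protected=1)):
        decision, why = remote_api(person)
        print(decision, "-", why)
```

In this toy setting the two applicants receive different decisions, yet each returned explanation is internally coherent with the innocuous features, which illustrates why a single remote explanation cannot be trusted and why spotting the lie requires cross-checking many queries.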



Updated: 2020-08-24