Absolute Shapley Value
arXiv - CS - Databases, Pub Date: 2020-03-23, DOI: arxiv-2003.10076
Jinfei Liu

The Shapley value, named in honor of Lloyd Shapley, is a concept in cooperative game theory for measuring the contribution of each participant. It has recently been applied in data marketplaces to allocate compensation to contributors based on their contribution to the models. The Shapley value is the only value division scheme used for compensation allocation that meets three desirable criteria: group rationality, fairness, and additivity. In cooperative game theory, the marginal contribution of each contributor to each coalition is a nonnegative value. However, in machine learning model training, the marginal contribution of each contributor (data tuple) to each coalition (a set of data tuples) can be negative, i.e., the accuracy of a model trained on a dataset with an additional data tuple can be lower than the accuracy of a model trained on the dataset alone. In this paper, we investigate how to handle negative marginal contributions when computing the Shapley value. We explore three philosophies: 1) taking the original value (Original Shapley Value); 2) taking the larger of the original value and zero (Zero Shapley Value); and 3) taking the absolute value of the original value (Absolute Shapley Value). Experiments on the Iris dataset demonstrate that the definition of Absolute Shapley Value significantly outperforms the other two definitions in terms of evaluating data importance (the contribution of each data tuple to the trained model).
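The three definitions can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that each transform (identity, max with zero, absolute value) is applied to every marginal contribution before the usual Shapley weighting, which is one reading of the abstract. The accuracy table, the tuple names "a", "b", "c", and the helper names shapley, utility, and TRANSFORMS are hypothetical and chosen only for illustration; no model is trained.

```python
from itertools import combinations
from math import factorial

# Hypothetical accuracy of a model trained on each subset of three data tuples.
# The numbers are invented for illustration; tuple "c" behaves like a noisy
# point that lowers accuracy whenever it joins a non-empty coalition.
ACCURACY = {
    frozenset(): 0.0,
    frozenset("a"): 0.6,
    frozenset("b"): 0.5,
    frozenset("c"): 0.2,
    frozenset("ab"): 0.8,
    frozenset("ac"): 0.5,
    frozenset("bc"): 0.4,
    frozenset("abc"): 0.7,
}
PLAYERS = ["a", "b", "c"]


def utility(coalition):
    """Look up the (hypothetical) accuracy for a coalition of data tuples."""
    return ACCURACY[frozenset(coalition)]


def shapley(players, utility_fn, transform):
    """Exact Shapley-style value with a transform on each marginal contribution.

    Assumption: the transform is applied per marginal contribution, before
    the standard |S|! (n - |S| - 1)! / n! weighting.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                marginal = utility_fn(s | {p}) - utility_fn(s)
                total += weight * transform(marginal)
        values[p] = total
    return values


TRANSFORMS = {
    "original": lambda m: m,            # Original Shapley Value
    "zero": lambda m: max(m, 0.0),      # Zero Shapley Value
    "absolute": abs,                    # Absolute Shapley Value
}

for name, transform in TRANSFORMS.items():
    result = shapley(PLAYERS, utility, transform)
    print(name, {p: round(v, 4) for p, v in result.items()})
```

In this toy table, tuple "c" has a positive marginal contribution only to the empty coalition and a negative one everywhere else, so the three definitions assign it different values, while "a" and "b", whose marginal contributions are all positive, are unaffected. Only the original definition sums to the accuracy of the full dataset. Exact enumeration visits every subset and is only practical for a handful of tuples.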

Updated: 2020-03-24
