Dealer: End-to-End Data Marketplace with Model-based Pricing
arXiv - CS - Databases, Pub Date: 2020-03-29, DOI: arxiv-2003.13103
Jinfei Liu

Data-driven machine learning (ML) has seen great success across a variety of application domains. Because ML model training relies crucially on large amounts of data, there is a growing demand for high-quality data to be collected for ML model training. From the data owners' perspective, however, contributing data is risky. To incentivize data contribution, their data should ideally be used only under their preset restrictions, and they should be paid for their contribution. In this paper, we take a formal data market perspective and propose the first enD-to-end data marketplace with model-based pricing (Dealer), which addresses the question: how can the broker assign value to data owners based on their contribution to the models, so as to incentivize more data contribution, and determine prices for a series of models for various model buyers, so as to maximize revenue with an arbitrage-free guarantee? For the former, we introduce a Shapley value-based mechanism that quantifies each data owner's value towards all the models trained on the contributed data. For the latter, we design a pricing mechanism based on the models' privacy parameters to maximize revenue. More importantly, we study how the data owners' data usage restrictions affect market design, which is a striking difference between our approach and existing methods. Furthermore, we present a concrete realization, DP-Dealer, which provably satisfies the desired formal properties. Extensive experiments show that DP-Dealer is efficient and effective.
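To make the Shapley value idea mentioned in the abstract concrete, the following is a minimal sketch of the textbook exact Shapley computation over a small set of data owners. It is not the mechanism from the paper, whose details the abstract does not give. The function name `shapley_values`, the owner labels, and the `toy_utility` function (a made-up stand-in for "model quality obtained from a coalition's data") are all assumptions introduced for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(owners, utility):
    """Exact Shapley value of each owner under a coalition utility function.

    `utility` maps a frozenset of owners to a number. The cost is exponential
    in the number of owners, so this is only practical for small examples.
    """
    n = len(owners)
    values = {owner: 0.0 for owner in owners}
    for owner in owners:
        others = [o for o in owners if o != owner]
        for k in range(n):
            # Weight of a coalition of size k that does not contain `owner`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal gain from adding `owner` to this coalition.
                values[owner] += weight * (utility(s | {owner}) - utility(s))
    return values


def toy_utility(coalition):
    """Hypothetical utility: diminishing returns in total record count."""
    sizes = {"A": 100, "B": 300, "C": 600}  # assumed record counts per owner
    total = sum(sizes[o] for o in coalition)
    return total / (total + 500)


if __name__ == "__main__":
    for owner, value in shapley_values(["A", "B", "C"], toy_utility).items():
        print(owner, round(value, 4))
```

In this sketch the values sum to the utility of the full coalition (the efficiency property of the Shapley value), which is what would let a broker split a fixed compensation budget among owners in proportion to their computed contribution.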

Updated: 2020-03-31