Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure Dataset Release, arXiv - CS - Machine Learning

arXiv - CS - Machine Learning, Pub Date: 2021-02-16, DOI: arxiv-2103.02683
Liam Fowl, Ping-yeh Chiang, Micah Goldblum, Jonas Geiping, Arpit Bansal, Wojtek Czaja, Tom Goldstein

Large organizations such as social media companies continually release data, for example user images. At the same time, these organizations leverage their massive corpora of released data to train proprietary models that give them an edge over their competitors. These two behaviors can conflict, because an organization wants to prevent competitors from using its released data to replicate the performance of its proprietary models. We solve this problem by developing a data poisoning method by which publicly released data can be minimally modified to prevent others from training models on it. Moreover, our method can be used in an online fashion, so that companies can protect their data in real time as they release it. We demonstrate the success of our approach on ImageNet classification and on facial recognition.
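A minimal sketch of what "minimally modified" release data can look like, assuming an ℓ∞ perturbation budget and a fixed linear surrogate classifier. It uses projected sign-gradient ascent (PGD-style error maximization), a generic technique swapped in for illustration; it is not the authors' algorithm. The surrogate weights `w` and `b`, the budget `eps = 8/255`, the step size, and the helper names `poison_batch` and `bce` are all hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def bce(x: np.ndarray, y: np.ndarray, w: np.ndarray, b: float) -> float:
    """Mean binary cross-entropy of the (hypothetical) surrogate on inputs x."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def poison_batch(x, y, w, b, eps=8 / 255, step=2 / 255, n_steps=10):
    """Return x + delta with ||delta||_inf <= eps that raises the surrogate's loss.

    Illustrative error-maximizing perturbation only (hypothetical helper),
    not the method from the paper.
    """
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        p = sigmoid((x + delta) @ w + b)            # predicted P(y = 1)
        grad = (p - y)[:, None] * w[None, :]        # d(BCE)/d(input) for a linear model
        delta = delta + step * np.sign(grad)        # ascend the loss
        delta = np.clip(delta, -eps, eps)           # project onto the eps-ball
        delta = np.clip(x + delta, 0.0, 1.0) - x    # keep pixel values in [0, 1]
    return x + delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=(4, 16))         # toy "images" flattened to 16 pixels
    y = np.array([0.0, 1.0, 1.0, 0.0])
    w = rng.normal(size=16)                          # hypothetical surrogate weights
    x_poisoned = poison_batch(x, y, w, 0.0)
    print(bce(x, y, w, 0.0), bce(x_poisoned, y, w, 0.0))
```

Because each batch is perturbed independently against a fixed surrogate, a routine of this shape could in principle be applied batch by batch as data are released, which is one way to read the "online" setting described above.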

Updated: 2021-03-05