Learning to Limit Data Collection via Scaling Laws: Data Minimization Compliance in Practice
arXiv - CS - Information Retrieval. Pub Date: 2021-07-16, DOI: arxiv-2107.08096
Divya Shanmugam, Samira Shabanian, Fernando Diaz, Michèle Finck, Asia Biega

Data minimization is a legal obligation defined in the European Union's General Data Protection Regulation (GDPR) as the responsibility to process an adequate, relevant, and limited amount of personal data in relation to a processing purpose. However, unlike fairness or transparency, the principle has not seen wide adoption for machine learning systems due to a lack of computational interpretation. In this paper, we build on literature in machine learning and law to propose the first learning framework for limiting data collection based on an interpretation that ties the data collection purpose to system performance. We formalize a data minimization criterion based on performance curve derivatives and provide an effective and interpretable piecewise power law technique that models distinct stages of an algorithm's performance throughout data collection. Results from our empirical investigation offer deeper insights into the relevant considerations when designing a data minimization framework, including the choice of feature acquisition algorithm, initialization conditions, as well as impacts on individuals that hint at tensions between data minimization and fairness.
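
To make the idea of a criterion based on performance curve derivatives concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single power law, error = a * n**(-b), rather than the piecewise power law the paper proposes, and it stops collecting data once the estimated marginal improvement per additional data point falls below a chosen threshold. The function names, the synthetic data, and the threshold value are all hypothetical.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of error = a * n**(-b) in log-log space.

    Assumption: a single power law describes the whole curve
    (the paper itself uses a piecewise power law, not shown here).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def marginal_gain(a, b, n):
    """Magnitude of d(error)/dn for error = a * n**(-b)."""
    return a * b * n ** (-b - 1)


def stopping_size(a, b, threshold):
    """Smallest integer n whose marginal gain is at most the threshold."""
    return math.ceil((a * b / threshold) ** (1.0 / (b + 1)))


# Synthetic learning curve (hypothetical data): error = 2 * n**(-0.5).
sizes = [100, 200, 400, 800, 1600]
errors = [2.0 * n ** -0.5 for n in sizes]

a, b = fit_power_law(sizes, errors)
n_stop = stopping_size(a, b, threshold=1e-5)
print(f"a={a:.3f}, b={b:.3f}, stop collecting at n={n_stop}")
```

The threshold plays the role of the minimization criterion: a smaller value permits more data collection before the curve is judged to have flattened. How to choose it for a given processing purpose is a question the sketch does not address.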

Updated: 2021-07-20