Practical approaches to big data privacy over time
International Data Privacy Law (IF 2.500), Pub Date: 2018-02-01, DOI: 10.1093/idpl/ipx027
Micah Altman, Alexandra Wood, David R O’Brien, Urs Gasser

Increasingly, governments and businesses are collecting, analyzing, and sharing detailed information about individuals over long periods of time. Vast quantities of data from new sources, combined with novel methods for large-scale data analysis, promise to deepen our understanding of human characteristics, behavior, and relationships and to advance science, public policy, and innovation. At the same time, collecting and using fine-grained personal data over time poses significant risks to individuals, groups, and society at large. In this article, we examine a range of long-term data collections conducted by social science researchers in order to identify the characteristics of these programs that drive their distinctive sets of risks and benefits. We also examine the practices social scientists have established to protect the privacy of data subjects in light of the challenges that long-term studies present. We argue that many uses of big data, across academic, government, and industry settings, share characteristics with traditional long-term research studies. Finally, we discuss lessons that can be drawn from longstanding data management practices in research and potentially applied to newly emerging data sources and uses.

1. Corporations and governments are collecting data more frequently, and collecting, storing, and using it for longer periods

Commercial and government actors are collecting, storing, analyzing, and sharing ever greater quantities of personal information about individuals over progressively longer periods of time. Advances in technology, such as the proliferation of GPS receivers and highly accurate sensors embedded in consumer devices, are creating new sources of data that offer data at more frequent intervals and at finer levels of detail.
New data storage methods, such as cloud storage, are more efficient and less costly than previous technologies and are contributing to large amounts of data being retained for longer periods of time. Powerful analytical capabilities, including emerging machine learning techniques, are enabling

1 The authors describe contributions to this essay using a standard taxonomy. See Liz Allen, Jo Scott, Amy Brand, Marjorie Hlava & Micah Altman, Publishing: Credit Where Credit Is Due, 508 NATURE 312, 312–13 (2014). Altman provided the core formulation of the essay’s goals and aims, and Wood led the writing of the original manuscript. All authors contributed to conceptualization through additional ideas and through commentary, review, editing, and revision. This material is based upon work supported by the National Science Foundation under Grant No. 1237235 and the Alfred P. Sloan Foundation. The manuscript was prepared for the Identifiability: Policy and Practical Solutions for Anonymization and Pseudonymization workshop, hosted by the Brussels Privacy Hub of the Vrije Universiteit Brussel and the Future of Privacy Forum in Brussels, Belgium, on November 8, 2016. The authors wish to thank their colleagues on the Privacy Tools for Sharing Research Data project at Harvard University for articulating ideas that underlie many of the conclusions drawn in this essay.
2 MIT Libraries, Massachusetts Institute of Technology, escience@mit.edu
3 Berkman Klein Center for Internet & Society, Harvard University, awood@cyber.harvard.edu
4 Berkman Klein Center for Internet & Society, Harvard University, dobrien@cyber.harvard.edu
5 Berkman Klein Center for Internet & Society, Harvard University, ugasser@cyber.harvard.edu
6 See PRESIDENT’S COUNCIL OF ADVISORS ON SCIENCE AND TECHNOLOGY, BIG DATA AND PRIVACY: A TECHNOLOGICAL PERSPECTIVE, Report to the President (May 2014), https://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_big_data_and_privacy_-_may_2014.pdf.

Updated: 2018-02-01