Scalable Data Classification for Security and Privacy,arXiv - CS - Computers and Society


Scalable Data Classification for Security and Privacy
arXiv - CS - Computers and Society, Pub Date: 2020-06-25, DOI: arxiv-2006.14109
Paulo Tanaka, Sameet Sapra, Nikolay Laptev

Content-based data classification is an open challenge. Traditional Data Loss Prevention (DLP)-like systems solve this problem by fingerprinting the data in question and monitoring endpoints for the fingerprinted data. With a large number of constantly changing data assets in Facebook, this approach is neither scalable nor effective in discovering what data is where. This paper describes an end-to-end system built to detect sensitive semantic types within Facebook at scale and to enforce data retention and access controls automatically. The approach described here is our first end-to-end privacy system that attempts to solve this problem by incorporating data signals, machine learning, and traditional fingerprinting techniques to map out and classify all data within Facebook. The described system is in production, achieving an average F2 score of 0.9+ across various privacy classes while handling a large number of data assets across dozens of data stores.
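
For readers unfamiliar with the metric quoted above, the F2 score is the F-beta score with beta = 2, which weights recall more heavily than precision. The sketch below shows how such a score could be computed and averaged across classes. The class names and the precision/recall pairs are hypothetical placeholders, not figures from the paper, and the unweighted (macro) average is an assumption, since the abstract does not say how the average is taken.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    With beta = 2 (the F2 score), recall counts more than precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: avoid division by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical per-class (precision, recall) pairs; NOT results from the paper.
per_class = {
    "class_a": (0.80, 0.95),
    "class_b": (0.90, 0.92),
}

# F2 per class, then an unweighted (macro) average across classes (assumed).
scores = {name: f_beta(p, r) for name, (p, r) in per_class.items()}
average_f2 = sum(scores.values()) / len(scores)
```

Because recall dominates the score, a classifier that misses sensitive data is penalized more than one that over-flags it, which is a plausible reason to prefer F2 in a privacy setting.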


Updated: 2020-07-08
