AI-enabled Automation for Completeness Checking of Privacy Policies, arXiv - CS - Software Engineering

AI-enabled Automation for Completeness Checking of Privacy Policies
arXiv - CS - Software Engineering, Pub Date: 2021-06-10, DOI: arxiv-2106.05688
Orlando Amaral, Sallam Abualhaija, Damiano Torre, Mehrdad Sabetzadeh, Lionel C. Briand

Technological advances in information sharing have raised concerns about data protection. Privacy policies contain privacy-related requirements about how the personal data of individuals will be handled by an organization or a software system (e.g., a web service or an app). In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR). A prerequisite for GDPR compliance checking is to verify whether the content of a privacy policy is complete according to the provisions of GDPR. Incomplete privacy policies might result in large fines for the violating organization as well as incomplete privacy-related software specifications. Manual completeness checking is both time-consuming and error-prone. In this paper, we propose AI-based automation for the completeness checking of privacy policies. Through systematic qualitative methods, we first build two artifacts to characterize the privacy-related provisions of GDPR, namely a conceptual model and a set of completeness criteria. Then, we develop an automated solution on top of these artifacts by leveraging a combination of natural language processing and supervised machine learning. Specifically, we identify the GDPR-relevant information content in privacy policies and subsequently check it against the completeness criteria. To evaluate our approach, we collected 234 real privacy policies from the fund industry. Over a set of 48 unseen privacy policies, our approach correctly detected 300 of the 334 violations of completeness criteria, while producing 23 false positives. The approach thus has a precision of 92.9% and a recall of 89.8%. Compared to a baseline that applies keyword search only, our approach results in an improvement of 24.5% in precision and 38% in recall.
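
The reported precision and recall follow directly from the counts given in the abstract. The short sketch below reproduces that arithmetic; the variable names are illustrative and not taken from the paper, and the number of missed violations is derived here as 334 minus 300 rather than stated in the abstract.

```python
# Counts reported in the abstract for the 48 unseen privacy policies.
true_positives = 300     # violations detected correctly
false_positives = 23     # reported violations that were not actual violations
total_violations = 334   # all actual violations in the evaluation set

# Derived (not stated in the abstract): violations the approach missed.
false_negatives = total_violations - true_positives

# Precision: share of reported violations that are real.
precision = true_positives / (true_positives + false_positives)

# Recall: share of real violations that were reported.
recall = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.1%}")  # 92.9%
print(f"recall    = {recall:.1%}")     # 89.8%
```

Both values match the figures quoted in the abstract, which confirms that the 23 false positives enter only the precision and the 34 missed violations only the recall.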

Updated: 2021-06-11
