Privacy-preserving data mining for open government data from heterogeneous sources,Government Information Quarterly



Privacy-preserving data mining for open government data from heterogeneous sources
Government Information Quarterly (IF 7.8), Pub Date: 2020-11-10, DOI: 10.1016/j.giq.2020.101544
Jae-Seong Lee, Seung-Pyo Jun

Open data is a global movement with the potential to generate significant social and economic benefits. Policies on open government data (OGD) encourage the development of new and innovative services that government agencies themselves may lack. The International Open Data Charter describes the importance of data mining: governments that have signed the charter should focus on (i) data mining, (ii) linkage, and (iii) in-depth analysis, that is, on distributing open data that is freely accessible and machine-readable so that it can be analyzed in detail. In practice, however, mining OGD for in-depth analysis runs into a series of difficulties. First, most OGD are published without identifiers in order to prevent privacy disclosure. Second, because the data are siloed, sharing and collection methods vary across heterogeneous OGD, and administrative or institutional barriers must be overcome. These issues create a demand for a novel technical solution that applies micro-aggregation and distance-based record linkage. This study therefore proposes a method that integrates two or more de-identified OGD datasets into one dataset to enable OGD data mining. The method also lets users adjust the privacy threshold level to balance privacy disclosure risk against data utility. Its effectiveness is evaluated on several metrics through extensive experimentation. The study emphasizes the importance of research on the efficient use of already-published OGD, which has been relatively neglected, and it broadens the scope of privacy-preserving data mining by proposing a method that can mine heterogeneous data even in the absence of identifiers.
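The abstract names micro-aggregation and distance-based record linkage but gives no algorithmic detail. The following is a minimal sketch of the two general techniques, not the authors' method. The function names, the sum-based ordering heuristic, the toy records, and the use of the group size k as a stand-in for an adjustable privacy threshold are all assumptions made for illustration.

```python
import math


def microaggregate(records, k):
    """Replace each numeric record with the centroid of a group of >= k records.

    Assumption: records are ordered by the sum of their attributes, a simple
    heuristic chosen for brevity, not the grouping used in the paper.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    order = sorted(range(len(records)), key=lambda i: sum(records[i]))
    n_groups = len(records) // k
    dims = len(records[0])
    masked = [None] * len(records)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so every group has >= k members.
        end = len(records) if g == n_groups - 1 else start + k
        members = order[start:end]
        centroid = tuple(
            sum(records[i][d] for i in members) / len(members)
            for d in range(dims)
        )
        for i in members:
            masked[i] = centroid
    return masked


def link_nearest(left, right):
    """For each record in left, return the index of the closest record in right
    by Euclidean distance over the shared numeric attributes."""
    return [
        min(range(len(right)), key=lambda j: math.dist(rec, right[j]))
        for rec in left
    ]


# Hypothetical toy data: two de-identified sources sharing two numeric attributes.
source_a = [(21, 30.0), (23, 32.0), (45, 60.0), (47, 58.0)]
source_b = [(22, 31.5), (46, 59.5)]

masked_a = microaggregate(source_a, k=2)  # larger k: more privacy, less detail
links = link_nearest(masked_a, source_b)
print(masked_a)
print(links)
```

In this sketch a larger k merges more records into each centroid, which lowers disclosure risk but blurs the attributes that the linkage step relies on; that is the general privacy versus utility trade-off the abstract refers to.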


Updated: 2020-11-10
