Learning New Words from Keystroke Data with Local Differential Privacy, IEEE Transactions on Knowledge and Data Engineering

Learning New Words from Keystroke Data with Local Differential Privacy
IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2020-03-01, DOI: 10.1109/tkde.2018.2885749
Sungwook Kim, Hyejin Shin, Chunghun Baek, Soohyung Kim, Junbum Shin

Keystroke data collected from smart devices contains a variety of sensitive information about users, so collecting and analyzing such data raises serious privacy concerns. Google and Apple have recently applied local differential privacy (LDP) to address privacy issues in learning new words from users' keystroke data. However, these solutions require multiple LDP reports for a single word, which results in inefficient use of the privacy budget and high computational cost. In this paper, we develop a novel algorithm for learning new words under LDP. Unlike the existing solutions, the proposed method generates only one LDP report for a single word. This lets the method spend the full privacy budget on that one report, and it therefore provides better utility than the existing methods at the same privacy level. In our algorithm, each user appends a hash value to a new word and sends only one LDP report of an $n$-gram selected at random from the string formed by packing the new word together with its hash value. The server then decodes the frequent $n$-grams at each position of the string and discovers candidate words by exploring graph-theoretic links between $n$-grams and checking the integrity of candidates against their hash values. The frequencies of the discovered new words are estimated from the distribution estimates of the $n$-grams by robust regression. We show theoretically that our algorithm can recover popular new words even though the server does not know the domain of the raw data. In addition, we demonstrate both theoretically and empirically that our algorithm achieves higher accuracy than the existing solutions.
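The client-side step described in the abstract (pack the word with a hash, pick one $n$-gram at random, send a single LDP report) can be illustrated with a minimal sketch. The abstract does not name the LDP mechanism, the hash function, the alphabet, or the string lengths, so everything concrete here is an assumption made for illustration: `ALPHABET`, `N`, `WORD_LEN`, `HASH_LEN`, a SHA-256-based `word_hash`, and k-ary randomized response standing in for the paper's actual perturbation. Sending the $n$-gram position in the clear is likewise assumed. The server-side graph search and robust regression are not shown; only the hash-based integrity check is.

```python
import hashlib
import math
import random
import string

# All constants below are illustrative assumptions, not values from the paper.
ALPHABET = string.ascii_lowercase + "_"  # "_" doubles as the padding character
N = 2          # n-gram length
WORD_LEN = 6   # words are padded or truncated to this length
HASH_LEN = 2   # number of hash characters appended to the word


def word_hash(padded: str) -> str:
    """Map a padded word to HASH_LEN alphabet characters (hypothetical hash)."""
    digest = hashlib.sha256(padded.encode("utf-8")).digest()
    return "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:HASH_LEN])


def pack(word: str) -> str:
    """Pad the word to WORD_LEN and append its hash value."""
    padded = word.ljust(WORD_LEN, "_")[:WORD_LEN]
    return padded + word_hash(padded)


def split_ngrams(packed: str) -> list[str]:
    """Cut the packed string into consecutive, non-overlapping n-grams."""
    return [packed[i:i + N] for i in range(0, len(packed), N)]


def ngram_to_index(gram: str) -> int:
    """Encode an n-gram as an integer in [0, len(ALPHABET)**N)."""
    idx = 0
    for ch in gram:
        idx = idx * len(ALPHABET) + ALPHABET.index(ch)
    return idx


def index_to_ngram(idx: int) -> str:
    """Inverse of ngram_to_index."""
    chars = []
    for _ in range(N):
        idx, r = divmod(idx, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def k_rr(index: int, k: int, epsilon: float, rng: random.Random) -> int:
    """k-ary randomized response: keep the true value with probability
    e^eps / (e^eps + k - 1), otherwise report one of the other k - 1 values
    uniformly at random. Used here as a stand-in LDP mechanism."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return index
    other = rng.randrange(k - 1)
    return other if other < index else other + 1


def client_report(word: str, epsilon: float, rng: random.Random) -> tuple[int, str]:
    """Produce the single report for one word: (position, perturbed n-gram).
    The whole budget epsilon is spent on this one n-gram."""
    grams = split_ngrams(pack(word))
    pos = rng.randrange(len(grams))
    k = len(ALPHABET) ** N
    noisy = k_rr(ngram_to_index(grams[pos]), k, epsilon, rng)
    return pos, index_to_ngram(noisy)


def is_consistent(candidate: str) -> bool:
    """Server-side integrity check: does the hash suffix match the word part?"""
    return word_hash(candidate[:WORD_LEN]) == candidate[WORD_LEN:]


if __name__ == "__main__":
    rng = random.Random(0)
    print(client_report("selfie", epsilon=2.0, rng=rng))
    print(is_consistent(pack("selfie")))
```

In this sketch a server would collect the `(position, n-gram)` pairs from many users, estimate the frequent $n$-grams at each position, chain them into candidate strings, and keep only the candidates for which `is_consistent` returns `True`.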

Updated: 2020-03-01
