QuPiD Attack: Machine Learning-Based Privacy Quantification Mechanism for PIR Protocols in Health-Related Web Search
Scientific Programming Pub Date : 2020-07-14 , DOI: 10.1155/2020/8868686
Rafiullah Khan 1, 2 , Arshad Ahmad 3 , Alhuseen Omar Alsayed 4 , Muhammad Binsawad 5 , Muhammad Arshad Islam 6 , Mohib Ullah 1, 2

With the advancement of ICT, web search engines (WSEs) have become a preferred source for finding health-related information published on the Internet. Google alone receives more than one billion health-related queries daily. However, in order to provide the results most relevant to the user, WSEs maintain user profiles. These profiles may contain private and sensitive information such as the user's health condition and disease status. Health-related queries therefore carry privacy-sensitive information, and because the identity of the user is exposed, that information may be misused by the WSE and by third parties. One well-known solution for preserving privacy is to issue queries via a peer-to-peer private information retrieval (PIR) protocol, such as useless user profile (UUP), thereby hiding the user's identity from the WSE. This paper investigates the level of protection offered by UUP. For this purpose, we present the QuPiD (query profile distance) attack: a machine learning-based attack that evaluates the effectiveness of UUP in privacy protection. The QuPiD attack determines the distance between the user's profile (web search history) and an upcoming query using our proposed novel feature vector. For the sake of comparison, the experiments were conducted using ten classification algorithms belonging to the tree-based, rule-based, lazy learner, metaheuristic, and Bayesian families. Furthermore, two subsets of an America Online (AOL) dataset (a noisy dataset and a clean dataset) were used for experimentation. The results show that, for the clean dataset, the proposed QuPiD attack associates more than 70% of queries with the correct user at a precision of over 72%, while for the noisy dataset it associates more than 40% of queries with the correct user at 70% precision.
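
The general idea of measuring how close an incoming query is to a user's search history can be illustrated with a small sketch. This is not the authors' feature vector or any of their ten classifiers, which the abstract does not specify. It assumes a plain bag-of-words profile and cosine similarity as a stand-in, and the user names and queries are hypothetical toy data, not taken from the AOL dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a toy example.
    return text.lower().split()


def build_profile(queries):
    # A profile here is just the term counts over a user's past queries.
    profile = Counter()
    for q in queries:
        profile.update(tokenize(q))
    return profile


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute_query(query, profiles):
    # Score the query against every known profile and pick the closest one.
    query_vec = Counter(tokenize(query))
    scores = {user: cosine(query_vec, p) for user, p in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical search histories, for illustration only.
histories = {
    "user_a": ["diabetes diet plan", "insulin dosage chart", "low sugar recipes"],
    "user_b": ["cheap flights to rome", "rome hotel deals", "italy train schedule"],
}
profiles = {user: build_profile(qs) for user, qs in histories.items()}

guess, scores = attribute_query("insulin pump side effects", profiles)
print(guess, scores)
```

In this toy setting, a query forwarded anonymously by a peer would be attributed to whichever stored profile it most resembles. The paper's attack instead feeds its own feature vector to trained classifiers, so the sketch shows only the intuition behind linking a query to a profile.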

Updated: 2020-07-14