
Privacy protection of user profiles in online search via semantic randomization

  • Regular Paper
  • Published:
Knowledge and Information Systems

Abstract

Querying a search engine is one of the most frequent activities performed by Internet users. As queries are submitted, the server collects and aggregates them to build detailed user profiles. While user profiles are used to offer personalized search services, they may also be employed for behavioral targeting or, even worse, be transferred to third parties. Proactive protection of users' privacy against search engines has been tackled by submitting fake queries that aim at distorting the users' real profiles. However, most approaches submit either random queries (which do not allow controlling the profile distortion) or queries constructed by deterministic algorithms (which may be detected by search engines aware of the strategy). In this paper, we propose a semantically grounded method to generate fake queries that (i) is driven by the privacy requirements of the user, (ii) submits the smallest number of fake queries needed to fulfill those requirements and (iii) creates queries in a non-deterministic way. Unlike related work, we accurately analyze and exploit the semantics underlying user queries and their influence on the resulting profile. As a result, our approach offers more control, because users can tailor how their profile should be protected, and greater efficiency, because the desired protection is achieved with fewer fake queries. The experimental results on real query logs illustrate the benefits of our approach.
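The abstract describes the method only at a high level. To make its three properties (requirement-driven, few fake queries, non-deterministic) concrete, the following Python sketch shows one hypothetical way they could fit together; it is a toy simplification, not the authors' algorithm. It assumes that a profile is a histogram over a handful of made-up semantic categories (`VOCABULARY`), that the user's privacy requirement is a target distribution plus a tolerance `epsilon` measured with total variation distance, and that fake query terms come from a hard-coded list instead of an ontology. `plan_fake_queries` and every other name here are illustrative only.

```python
import random
from collections import Counter

# Hypothetical toy taxonomy: each semantic category maps to candidate fake
# query terms. A real system would derive these from an ontology.
VOCABULARY = {
    "health": ["flu symptoms", "knee surgery recovery", "vitamin d dosage"],
    "sports": ["marathon training plan", "tennis racket reviews"],
    "travel": ["cheap flights lisbon", "train pass europe"],
    "finance": ["index fund fees", "mortgage calculator"],
}


def distribution(counts):
    """Normalise category counts into a probability distribution."""
    total = sum(counts.values())
    return {c: counts.get(c, 0) / total for c in VOCABULARY}


def tv_distance(p, q):
    """Total variation distance between two distributions over VOCABULARY."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in VOCABULARY)


def plan_fake_queries(real_counts, target, epsilon, rng, max_fakes=1000):
    """Add fake queries until the observable profile is within `epsilon`
    (total variation) of the user's `target` distribution.

    Each step samples a category at random, weighted by how far it falls
    short of its target share, so the fake sequence is non-deterministic.
    """
    counts = Counter(real_counts)
    fakes = []
    while tv_distance(distribution(counts), target) > epsilon:
        if len(fakes) >= max_fakes:
            raise RuntimeError("target not reachable within max_fakes")
        total = sum(counts.values()) + 1  # profile size after this fake query
        deficits = {c: max(target[c] * total - counts[c], 0.0) for c in VOCABULARY}
        categories = [c for c in VOCABULARY if deficits[c] > 0]
        weights = [deficits[c] for c in categories]
        category = rng.choices(categories, weights=weights, k=1)[0]
        counts[category] += 1
        fakes.append(rng.choice(VOCABULARY[category]))
    return fakes, counts


# Usage: a health-heavy profile that the user wants to look uniform.
rng = random.Random(7)
real = {"health": 8, "sports": 1, "travel": 1, "finance": 0}
target = {c: 0.25 for c in VOCABULARY}
fakes, final = plan_fake_queries(real, target, epsilon=0.1, rng=rng)
print(len(fakes), dict(final))
```

The loop stops as soon as the tolerance is met, which keeps the number of fake queries small, although this greedy rule is not guaranteed to be minimal. Sampling categories in proportion to their deficit is one simple way to avoid a fixed, easily recognizable query pattern.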


Figs. 1–11



Acknowledgements

This work was partly supported by the European Commission (Projects H2020-871042 SoBigData++ and H2020-101006879 MobiDataLab), the Spanish Government (Projects RTI2018-095094-B-C21 CONSENT and TIN2016-80250-R Sec-MCloud), the Norwegian Research Council (Project 308904 CLEANUP) and the Government of Catalonia (2017 SGR 705 and ICREA Acadèmia Prize to David Sánchez). The opinions expressed in this paper are those of the authors and do not necessarily reflect the views of UNESCO.

Author information

Corresponding author

Correspondence to Mercedes Rodriguez-Garcia.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rodriguez-Garcia, M., Batet, M., Sánchez, D. et al. Privacy protection of user profiles in online search via semantic randomization. Knowl Inf Syst 63, 2455–2477 (2021). https://doi.org/10.1007/s10115-021-01597-x

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-021-01597-x

Keywords
