A Real-Time Query Log Protection Method for Web Search Engines

  • Identification data

    Identifier: imarina:6390195
    Authors:
    Pamies-Estrems D; Castella-Roca J; Garcia-Alfaro J
    Abstract:
    © 2013 IEEE. Web search engines (e.g., Google, Bing, Qwant, and DuckDuckGo) may process a huge number of search queries per second. According to Internet Live Stats, Google handles more than two hundred million queries per hour, i.e., about two trillion queries per year. For monetization purposes, the queries can be stored and complemented with additional data; the result is referred to as query logs. Together, this information can be correlated to build accurate user profiles. Before releasing query logs to third parties (e.g., for profit), web search engines must properly protect the personal information they contain. Current regulations impose strict controls and require provable anonymization (e.g., in terms of statistical disclosure control) of any personally identifiable information. In this paper, we tackle this challenge. We propose a real-time anonymization solution that protects streams of unstructured data at the server side. Our approach is based on a probabilistic $k$-anonymity technique. It allows probabilistic processing of the personally identifiable attributes contained in the query logs, with provable privacy properties. Our solution addresses the limitations of traditional $k$-anonymity approaches with respect to unstructured data and real-time processing. We present the implementation of our solution and report experimental evaluation results. The evaluation considers privacy, utility, and scalability. Results validate the feasibility of our proposal.
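The abstract does not spell out the algorithm, so the following is only a minimal sketch. It assumes the usual reading of probabilistic k-anonymity, where an observer can link a released query to its true author with probability at most 1/k. One streaming way to meet that condition works as follows. Queries are buffered per user. As soon as k distinct users are pending, one query is taken from each of them, and the k user IDs are randomly permuted within that group. The names `LogEntry` and `k_anonymize_stream` are hypothetical. This is not claimed to be the authors' method.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class LogEntry:
    user_id: str  # hypothetical pseudonymous user identifier (e.g., a cookie)
    query: str    # raw search query text


def k_anonymize_stream(
    stream: Iterable[LogEntry], k: int, seed: Optional[int] = None
) -> Iterator[LogEntry]:
    """Hypothetical sketch, not the paper's algorithm.

    Queries are buffered per user. Whenever k distinct users have at least
    one pending query, the oldest pending query of each of the first k users
    is taken, the k user IDs are randomly permuted among those k queries,
    and the group is released. Queries still buffered when the stream ends
    are withheld rather than released unprotected.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    # dict preserves insertion order, so the oldest pending users come first
    pending: Dict[str, Deque[str]] = {}

    for entry in stream:
        pending.setdefault(entry.user_id, deque()).append(entry.query)
        while len(pending) >= k:
            users: List[str] = list(pending)[:k]
            queries: List[str] = []
            for user in users:
                queries.append(pending[user].popleft())
                if not pending[user]:
                    del pending[user]
            shuffled = users[:]
            rng.shuffle(shuffled)
            # Each released query carries a uniformly random one of the k IDs.
            # Seeing only the output, an observer links it to its true author
            # with probability 1/k.
            for user, query in zip(shuffled, queries):
                yield LogEntry(user, query)
```

Consider a stream in which users `a`, `b`, and `c` each issue two queries, with `k=3`. The sketch releases two groups of three, and each group keeps its three queries and three IDs but scrambles the pairing between them. Two design choices matter here.

- Leftover queries are dropped at the end of the stream. This trades some utility for privacy. A real deployment would also need a timeout policy for slow users.
- The sketch protects only the query-to-user link. It does nothing about identifying text inside the queries themselves.

Because `k_anonymize_stream` is a generator, the `ValueError` for `k < 2` surfaces on first iteration, not at call time.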
  • Other details:

    Authors as listed in the article: Pamies-Estrems D; Castella-Roca J; Garcia-Alfaro J
    Department: Enginyeria Informàtica i Matemàtiques (Computer Engineering and Mathematics)
    URV author(s): Castellà Roca, Jordi
    Keywords: Web search engines; Social networks; Single-database; Query logs; Private information-retrieval; Privacy; Data streams; Anonymization
    Subject areas: Telecommunications; Materials science (miscellaneous); Materials science (all); General materials science; General engineering; General computer science; Engineering, electrical & electronic; Engineering (miscellaneous); Engineering (all); Engineering IV (Engenharias IV); Engineering III (Engenharias III); Electrical and electronic engineering; Computer science, information systems; Computer science (miscellaneous); Computer science (all); Computer science (Ciência da computação)
    License: https://creativecommons.org/licenses/by/3.0/es/
    ISSN: 2169-3536
    Author's email address: jordi.castella@urv.cat
    Author identifier: 0000-0002-0037-9888
    Record creation date: 2023-02-19
    Journal volume: 8
    Deposited article version: info:eu-repo/semantics/publishedVersion
    Original source link: https://ieeexplore.ieee.org/document/9085377
    Reference according to the original source: IEEE Access, 8, 87393-87413
    APA citation: Pamies-Estrems, D., Castella-Roca, J., & Garcia-Alfaro, J. (2020). A real-time query log protection method for web search engines. IEEE Access, 8, 87393-87413. https://doi.org/10.1109/ACCESS.2020.2992012
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article DOI: 10.1109/ACCESS.2020.2992012
    Institution: Universitat Rovira i Virgili
    Journal publication year: 2020
    Publication type: Journal Publications