Doctoral theses: Departament d'Enginyeria Informàtica i Matemàtiques

Utility-Preserving Anonymization of Textual Documents

  • Identification data

    Identifier:  TDX:3182
    Authors:  Hassan, Fadi Abdulfattah Mohammed
    Abstract:
    Every day, people post a significant amount of data on the Internet, such as tweets, reviews, photos, and videos. Organizations collecting these types of data use them to extract information in order to improve their services or for commercial purposes. Yet, if the collected data contain sensitive personal information, they cannot be shared with third parties or released publicly without consent or adequate protection of the data subjects. Privacy-preserving mechanisms provide ways to sanitize data so that identities and/or confidential attributes are not disclosed. A great variety of mechanisms have been proposed to anonymize structured databases with numerical and categorical attributes; however, automatically protecting unstructured textual data has received much less attention. In general, textual data anonymization requires first detecting the pieces of text that may disclose sensitive information and then masking those pieces via suppression or generalization. In this work, we leverage several technologies to anonymize textual documents. We first improve state-of-the-art techniques based on sequence labeling. After that, we extend them to make them more aligned with the notion of privacy risk and the privacy requirements. Finally, we propose a complete framework based on word embedding models that captures a broader notion of data protection and provides flexible protection driven by privacy requirements. We also leverage ontologies to preserve the utility of the masked text, that is, its semantics and readability. Extensive experimental results show that our methods outperform the state of the art by providing more robust anonymization while reasonably preserving the utility of the protected outcomes.
  • Others:

    Publisher: Universitat Rovira i Virgili
    Date: 2021-06-21, 2021-06-30T09:16:29Z
    Identifier: http://hdl.handle.net/10803/672012
    Department/Institute: Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili.
    Language: eng
    Author: Hassan, Fadi Abdulfattah Mohammed
    Directors: Sánchez Ruenes, David; Domingo Ferrer, Josep
    Source: TDX (Tesis Doctorals en Xarxa)
    Format: application/pdf, 122 p.
  • Keywords:

    Textual data
    Artificial intelligence
    Data privacy
    Engineering and architecture