Utility-preserving sanitization of semantically correlated terms in textual documents

  • Identifying data

    Identifier: PC:812
    Authors:
    Viejo, A.; Batet, A.; Sánchez, D.
    DOI:
    10.1016/j.ins.2014.03.103
  • Other:

    Authors as listed in the article: Viejo, A.; Batet, A.; Sánchez, D.
    Department: Enginyeria Informàtica i Matemàtiques (Computer Engineering and Mathematics)
    Abstract: Traditionally, redaction has been the method of choice for mitigating the privacy issues raised by the declassification of textual documents containing sensitive data. This process removes sensitive words from documents prior to their release, with the undesired side effect of severely reducing the utility of the content. Document sanitization is a recent alternative to redaction that avoids utility issues by generalizing sensitive terms instead of eliminating them. Some (semi-)automatic redaction/sanitization schemes can be found in the literature; however, they usually neglect semantic correlations between the terms of the document, even though these correlations may disclose sanitized/redacted sensitive terms. To tackle this issue, this paper proposes a theoretical framework grounded in information theory, which offers a general model capable of measuring the disclosure risk caused by semantically correlated terms, regardless of whether they are proposed for removal or generalization. The new method focuses specifically on generating sanitized documents that retain as much utility (i.e., semantics) as possible while fulfilling the privacy requirements. The implementation of the method has been evaluated in a practical setting, showing that the new approach improves the output’s utility in comparison to previous work, while retaining a similar level of accuracy.
    Usage license: https://creativecommons.org/licenses/by/3.0/es/
    Keyword (other languages): Semantic knowledge
    ISSN: 0020-0255
    Last page: 93
    Journal volume: 279
    Deposited article version: info:eu-repo/semantics/submittedVersion
    Original source link: http://www.sciencedirect.com/science/article/pii/S0020025514004009
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article DOI: 10.1016/j.ins.2014.03.103
    Entity: Universitat Rovira i Virgili.
    Journal publication year: 2014
    First page: 77