Toward sensitive document release with privacy guarantees

  • Identifying data

    Identifier: PC:2536
    Authors:
    David Sánchez; Montserrat Batet
    Summary:
    DOI: 10.1016/j.engappai.2016.12.013
    URL: http://www.sciencedirect.com/science/article/pii/S0952197616302408
    URV affiliation: Yes
    Included in the annual report: Yes
  • Other:

    Author as listed in the article: David Sánchez; Montserrat Batet
    Department: Enginyeria Informàtica i Matemàtiques
    URV author(s): SÁNCHEZ RUENES, DAVID; Montserrat Batet
    Keywords: Ontologies; Privacy; Semantics
    Abstract: Privacy has become a serious concern for modern Information Societies. The sensitive nature of much of the data that are exchanged daily or released to untrusted parties requires that responsible organizations undertake appropriate privacy protection measures. Nowadays, many of these data are texts (e.g., emails, messages posted in social media, healthcare outcomes, etc.) that, because of their unstructured and semantic nature, constitute a challenge for automatic data protection methods. In fact, textual documents are usually protected manually, in a process known as document redaction or sanitization. To do so, human experts identify sensitive terms (i.e., terms that may reveal identities and/or confidential information) and protect them accordingly (e.g., via removal or, preferably, generalization). To relieve experts from this burdensome task, in a previous work we introduced the theoretical basis of C-sanitization, an inherently semantic privacy model that provides the basis for the development of automatic document redaction/sanitization algorithms and offers clear, a priori privacy guarantees on data protection. Despite its potential benefits, C-sanitization still presents some limitations when applied in practice (mainly regarding flexibility, efficiency and accuracy). In this paper, we propose a new, more flexible model, named (C, g(C))-sanitization, which enables an intuitive configuration of the trade-off between the desired level of protection (i.e., controlled information disclosure) and the preservation of the utility of the protected data (i.e., amount of semantics to be preserved). Moreover, we present a set of technical solutions and algorithms that provide an efficient and scalable implementation of the model and improve its practical accuracy, as we illustrate through empirical experiments.
    Research group: Seguretat i Privadesa
    Subject areas: Computer engineering
    License of use: https://creativecommons.org/licenses/by/3.0/es/
    ISSN: 0952-1976
    Author identifier: 0000-0001-7275-7887; 0000-0001-8174-7592
    Record creation date: 2017-01-18
    Last page: 24
    Journal volume: 59
    Deposited article version: info:eu-repo/semantics/submittedVersion
    Link to the original source: https://www.sciencedirect.com/science/article/abs/pii/S0952197616302408?via%3Dihub
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article DOI: 10.1016/j.engappai.2016.12.013
    Institution: Universitat Rovira i Virgili
    Journal publication year: 2017
    First page: 23
    Publication type: Article
  • Keywords:

    Data protection
    Computer security
    Ontologies
    Privacy
    Semantics
    Computer engineering