
Toward sensitive document release with privacy guarantees

  • Identifying data

    Identifier:  imarina:5130658
    Authors:  Sanchez, David; Batet, Montserrat
  • Other:

    Author according to the article: Sanchez, David; Batet, Montserrat
    Department: Computer Engineering and Mathematics
    URV author(s): Batet Sanromà, Montserrat / Sánchez Ruenes, David
    Keywords: Document redaction; Ontologies; Privacy; Sanitization; Semantics
    Abstract: Privacy has become a serious concern for modern Information Societies. The sensitive nature of much of the data that are exchanged daily or released to untrusted parties requires that responsible organizations undertake appropriate privacy protection measures. Nowadays, many of these data are texts (e.g., emails, messages posted in social media, healthcare outcomes, etc.) that, because of their unstructured and semantic nature, constitute a challenge for automatic data protection methods. In fact, textual documents are usually protected manually, in a process known as document redaction or sanitization. To do so, human experts identify sensitive terms (i.e., terms that may reveal identities and/or confidential information) and protect them accordingly (e.g., via removal or, preferably, generalization). To relieve experts from this burdensome task, in a previous work we introduced the theoretical basis of C-sanitization, an inherently semantic privacy model that provides the basis for the development of automatic document redaction/sanitization algorithms and offers clear, a priori privacy guarantees on data protection. Despite its potential benefits, C-sanitization still presents some limitations when applied in practice (mainly regarding flexibility, efficiency and accuracy). In this paper, we propose a new, more flexible model, named (C, g(C))-sanitization, which enables an intuitive configuration of the trade-off between the desired level of protection (i.e., controlled information disclosure) and the preservation of the utility of the protected data (i.e., amount of semantics to be preserved). Moreover, we present a set of technical solutions and algorithms that provide an efficient and scalable implementation of the model and improve its practical accuracy, as we also illustrate through empirical experiments.
    Subject areas: Public and business administration, accounting and tourism; Artificial intelligence; Automation & control systems; Biotechnology; Computer science; Food science; Agricultural sciences I; Computer science, artificial intelligence; Control and systems engineering; Electrical and electronic engineering; Engineering I; Engineering II; Engineering III; Engineering IV; Engineering; Engineering, electrical & electronic; Engineering, multidisciplinary; Interdisciplinary; Linguistics and literature; Mathematics / probability and statistics; Materials; Medicine I; Robotics & automatic control
    License of use: https://creativecommons.org/licenses/by/3.0/es/
    Author's email address: david.sanchez@urv.cat; montserrat.batet@urv.cat
    Record creation date: 2024-10-12
    Version of the deposited article: info:eu-repo/semantics/acceptedVersion
    Link to the original source: https://www.sciencedirect.com/science/article/pii/S0952197616302408
    Article reference according to the original source: Engineering Applications of Artificial Intelligence, 59, 23-34
    Item reference according to APA style: Sanchez, David; Batet, Montserrat (2017). Toward sensitive document release with privacy guarantees. Engineering Applications of Artificial Intelligence, 59, 23-34. DOI: 10.1016/j.engappai.2016.12.013
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article DOI: 10.1016/j.engappai.2016.12.013
    Institution: Universitat Rovira i Virgili
    Journal publication year: 2017
    Publication type: Journal Publications