
Minimizing the disclosure risk of semantic correlations in document sanitization

  • Identifying data

    Identifier:  imarina:6387458
    Authors:  Sanchez, David; Batet, Montserrat; Viejo, Alexandre
    Abstract:
    Text sanitization is crucial to enable privacy-preserving declassification of confidential documents. Moreover, given the advent of new information-sharing technologies that enable the daily publication of thousands of textual documents, automatic and semi-automatic sanitization methods are needed. Even though several such methods have been proposed, most of them detect and sanitize sensitive terms (e.g., people's names, addresses, diseases) independently, neglecting the importance of semantic correlations. From the attacker's perspective, semantic correlations can be exploited to disclose a sanitized term from the presence of one or several non-sanitized words. To tackle this problem, this paper presents a general-purpose method that takes the output of a standard sanitization mechanism and then analyses, detects and proposes for sanitization those semantically correlated terms that represent a plausible disclosure risk for the already sanitized ones. Our method relies on an information-theoretic formulation of disclosure risk that is able to adapt its behavior to the criterion of the initial sanitizer. The evaluation, carried out over a collection of real documents, shows that semantic correlations represent a real privacy threat in previously sanitized documents, and that our method is able to detect them effectively. As a result, the disclosure risk of the sanitized output is significantly reduced with respect to standard sanitization mechanisms. © 2013 Elsevier Inc. All rights reserved.
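
    The abstract does not give the formulas, so the following is only a minimal sketch of how an information-theoretic check for semantic correlations might look. It assumes pointwise mutual information (PMI) as the correlation measure and a threshold set to a fraction of the information content (IC) of the least informative sanitized term. The measure, the 0.5 fraction, the function names and all probabilities are hypothetical illustrations and are not taken from the paper.

```python
import math

def information_content(p):
    """IC(t) = -log2 p(t): rarer terms carry more information (in bits)."""
    return -math.log2(p)

def pmi(p_joint, p_a, p_b):
    """Pointwise mutual information between two terms, in bits."""
    return math.log2(p_joint / (p_a * p_b))

def risky_terms(sanitized, remaining, prob, joint_prob, threshold):
    """Return (remaining_term, sanitized_term, pmi) triples with PMI >= threshold."""
    flagged = []
    for q in remaining:
        for s in sanitized:
            p_sq = joint_prob.get(frozenset((s, q)), 0.0)
            if p_sq == 0.0:
                continue  # terms never co-occur: no evidence of correlation
            score = pmi(p_sq, prob[s], prob[q])
            if score >= threshold:
                flagged.append((q, s, score))
    return flagged

# Hypothetical term and co-occurrence probabilities (made up for illustration).
prob = {"HIV": 0.001, "antiretroviral": 0.0005, "hospital": 0.02, "patient": 0.05}
joint_prob = {
    frozenset(("HIV", "antiretroviral")): 0.0004,
    frozenset(("HIV", "hospital")): 0.0001,
    frozenset(("HIV", "patient")): 0.0002,
}

sanitized = ["HIV"]  # output of the initial sanitizer
remaining = ["antiretroviral", "hospital", "patient"]  # terms left in clear text

# Assumption: the threshold adapts to the initial sanitizer by taking half the
# IC of its least informative sanitized term. The 0.5 factor is arbitrary.
threshold = 0.5 * min(information_content(prob[s]) for s in sanitized)

flagged = risky_terms(sanitized, remaining, prob, joint_prob, threshold)
for q, s, score in flagged:
    print(f"'{q}' may disclose '{s}' (PMI = {score:.2f} bits)")
```

    With these made-up numbers only "antiretroviral" exceeds the threshold, while the generic terms "hospital" and "patient" stay in clear text. Tying the threshold to what the initial sanitizer already removed is one simple way to mirror the adaptive behavior the abstract describes.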
  • Other:

    Original source link: https://www.sciencedirect.com/science/article/abs/pii/S0020025513004799?via%3Dihub
    Item reference in APA style: Sanchez, David; Batet, Montserrat; Viejo, Alexandre (2013). Minimizing the disclosure risk of semantic correlations in document sanitization. Information Sciences, 249, 110-123. DOI: 10.1016/j.ins.2013.06.042
    Article reference according to the original source: Information Sciences. 249 110-123
    Article DOI: 10.1016/j.ins.2013.06.042
    Journal publication year: 2013
    Institution: Universitat Rovira i Virgili
    Deposited article version: info:eu-repo/semantics/acceptedVersion
    Record creation date: 2025-02-08
    URV author(s): Batet Sanromà, Montserrat / SANCHEZ CERVELLÓ, DOMINGO JOSÉ / Sánchez Ruenes, David / Viejo Galicia, Luis Alexandre
    Department: Enginyeria Informàtica i Matemàtiques
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Publication type: Journal Publications
    ISSN: 00200255
    Author as listed in the article: Sanchez, David; Batet, Montserrat; Viejo, Alexandre
    Access to the license of use: https://creativecommons.org/licenses/by/3.0/es/
    Subject areas: Public and business administration, accounting and tourism, Artificial intelligence, Astronomy / physics, Biodiversity, Computer science, Agricultural sciences I, Environmental sciences, Biological sciences I, Social sciences, Computer science applications, Computer science, information systems, Communication and information, Control and systems engineering, Engineering I, Engineering III, Engineering IV, Teaching, Information systems and management, Interdisciplinary, Mathematics / probability and statistics, Medicine II, Software, Theoretical computer science
    Author email addresses: alexandre.viejo@urv.cat, david.sanchez@urv.cat, montserrat.batet@urv.cat
  • Keywords:

    Document sanitization
    Information theory
    Privacy
    Semantic correlation
    Artificial Intelligence
    Computer Science Applications
    Computer Science
    Information Systems
    Control and Systems Engineering
    Information Systems and Management
    Software
    Theoretical Computer Science
    Public and business administration, accounting and tourism
    Astronomy / physics
    Biodiversity
    Computer science
    Agricultural sciences I
    Environmental sciences
    Biological sciences I
    Social sciences
    Communication and information
    Engineering I
    Engineering III
    Engineering IV
    Teaching
    Interdisciplinary
    Mathematics / probability and statistics
    Medicine II
  • Documents:
