Minimizing the disclosure risk of semantic correlations in document sanitization

  • Identification data

    Identifier: imarina:6387458
    Authors: Sánchez, D; Batet, M; Viejo, A
    Abstract:
    Text sanitization is crucial to enable the privacy-preserving declassification of confidential documents. Moreover, given the advent of new information-sharing technologies that enable the daily publication of thousands of textual documents, automatic and semi-automatic sanitization methods are needed. Even though several such methods have been proposed, most of them detect and sanitize sensitive terms (e.g., people's names, addresses, diseases) independently, neglecting the importance of semantic correlations. From the attacker's perspective, semantic correlations can be exploited to disclose a sanitized term from the presence of one or several non-sanitized words. To tackle this problem, this paper presents a general-purpose method that takes the output of a standard sanitization mechanism and then analyses, detects and proposes for sanitization those semantically correlated terms that represent a plausible disclosure risk for the already sanitized ones. Our method relies on an information-theoretic formulation of disclosure risk that adapts its behavior to the criterion of the initial sanitizer. The evaluation, carried out on a collection of real documents, shows that semantic correlations represent a real privacy threat in previously sanitized documents and that our method detects them effectively. As a result, the disclosure risk of the sanitized output is significantly reduced with respect to standard sanitization mechanisms. © 2013 Elsevier Inc. All rights reserved.
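The abstract describes flagging non-sanitized terms that are semantically correlated with an already sanitized term, using an information-theoretic measure of disclosure risk. The following is a minimal sketch of that general idea under stated assumptions; it is not the authors' formulation. It assumes pointwise mutual information (PMI), estimated from hypothetical corpus hit counts, as the correlation measure, and a fixed, hypothetical `threshold`, whereas the paper adapts its criterion to the initial sanitizer. The function names `pmi` and `correlated_terms`, the example terms and all counts are illustrative only.

```python
import math


def pmi(hits_s: int, hits_c: int, hits_sc: int, total: int) -> float:
    """Pointwise mutual information log2(p(s,c) / (p(s) * p(c))).

    Probabilities are estimated as hit counts divided by `total`, an assumed
    corpus size (for example, a number of indexed documents).
    """
    p_s = hits_s / total
    p_c = hits_c / total
    p_sc = hits_sc / total
    return math.log2(p_sc / (p_s * p_c))


def correlated_terms(hits_s, candidates, total, threshold):
    """Return candidate terms whose PMI with the sanitized term s reaches `threshold`.

    `candidates` maps each non-sanitized term c to a pair (hits_c, hits_sc),
    where hits_sc counts co-occurrences of s and c. The fixed `threshold` is a
    hypothetical stand-in for a criterion derived from the initial sanitizer.
    """
    flagged = []
    for term, (hits_c, hits_sc) in candidates.items():
        if hits_sc == 0:
            continue  # no observed co-occurrence: PMI is undefined (minus infinity)
        if pmi(hits_s, hits_c, hits_sc, total) >= threshold:
            flagged.append(term)
    return flagged


if __name__ == "__main__":
    # Made-up counts for a sanitized term such as "lymphoma" in a 1,000,000-document corpus.
    candidates = {
        "chemotherapy": (2_000, 500),  # rare term that often co-occurs with s
        "the": (500_000, 500),         # ubiquitous term, co-occurs at chance level
        "oncologist": (3_000, 0),      # never co-occurs in this toy data
    }
    print(correlated_terms(1_000, candidates, 1_000_000, threshold=3.0))
    # prints ['chemotherapy']
```

With these toy counts, "chemotherapy" co-occurs with the sanitized term about 250 times more often than chance would predict (PMI ≈ 7.97 bits), so it would be proposed for sanitization, while "the" has a PMI of 0 and is kept.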
  • Other:

    Original source link: https://www.sciencedirect.com/science/article/abs/pii/S0020025513004799?via%3Dihub
    APA citation: Sánchez, D; Batet, M; Viejo, A (2013). Minimizing the disclosure risk of semantic correlations in document sanitization. Information Sciences, 249, 110-123. DOI: 10.1016/j.ins.2013.06.042
    Article reference (original source): Information Sciences. 249 110-123
    Article DOI: 10.1016/j.ins.2013.06.042
    Journal publication date: 2013-11-10
    Institution: Universitat Rovira i Virgili
    Deposited article version: info:eu-repo/semantics/acceptedVersion
    Record creation date: 2026-05-09
    URV author(s): Batet Sanromà, Montserrat / SANCHEZ CERVELLÓ, DOMINGO JOSÉ / Sánchez Ruenes, David / Viejo Galicia, Luis Alexandre
    Department: Enginyeria Informàtica i Matemàtiques (Computer Engineering and Mathematics)
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Publication type: Journal Publications
    ISSN: 0020-0255
    Authors as listed in the article: Sánchez, D; Batet, M; Viejo, A
    License of use: https://creativecommons.org/licenses/by/3.0/es/
    Subject areas: Theoretical computer science, Software, Information systems and management, Control and systems engineering, Computer science, information systems, Computer science applications, Social sciences, Computer science, Astronomy / physics, Artificial intelligence
    Author email addresses: montserrat.batet@urv.cat, david.sanchez@urv.cat, alexandre.viejo@urv.cat
  • Keywords:

    Semantic correlation
    Privacy
    Information theory
    Document sanitization
    Artificial Intelligence
    Computer Science Applications
    Computer Science
    Information Systems
    Control and Systems Engineering
    Information Systems and Management
    Software
    Theoretical Computer Science
    Social Sciences
    Astronomy / Physics