Automatic general-purpose sanitization of textual documents

Sánchez, D; Batet, M; Viejo, A

doi:10.1109/TIFS.2013.2239641

Identifying data

Identifier: imarina:6387373

Handle: https://hdl.handle.net/20.500.11797/imarina6387373

Authors: Sánchez, D; Batet, M; Viejo, A

Abstract:
The advent of new information sharing technologies has led society to a scenario where thousands of textual documents are published every day. The existence of confidential information in many of these documents motivates the use of measures to hide sensitive data before publication, which is precisely the goal of document sanitization. Even though methods to assist the sanitization process have been proposed, most of them focus on the detection of specific types of sensitive entities for concrete domains, lacking generality and requiring user supervision. Moreover, to hide sensitive terms, most approaches opt to remove them, a measure that hampers the utility of the sanitized document. This paper presents a general-purpose sanitization method that, based on information theory and exploiting knowledge bases, detects and hides sensitive textual information while preserving its meaning. Our proposal works in an automatic and unsupervised way and can be applied to heterogeneous documents, which makes it especially suitable for environments with massive and heterogeneous information-sharing needs. Evaluation results show that our method outperforms strategies based on trained classifiers regarding detection recall, whereas it better retains the document's utility compared to term-suppression methods. © 2005-2012 IEEE.
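The abstract names the ingredients (information theory for detection, a knowledge base for meaning-preserving hiding) but not the procedure. The following is only a minimal sketch of how such a pipeline could look, not the authors' algorithm. It assumes that a term's informativeness is measured as its information content, -log2 p(term), that a term counts as sensitive when this value exceeds a threshold, and that a sensitive term is replaced by its nearest generalization that falls at or below the threshold. The probability table, the taxonomy, the threshold values, and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical occurrence probabilities (illustrative values, not from the paper).
TERM_PROBABILITY = {
    "disease": 0.02,
    "infection": 0.005,
    "hepatitis": 0.0002,
}

# Hypothetical toy knowledge base: each term maps to a more general term.
HYPERNYM = {
    "hepatitis": "infection",
    "infection": "disease",
}


def information_content(term: str) -> float:
    """Information content in bits: rarer terms are more informative."""
    return -math.log2(TERM_PROBABILITY[term])


def sanitize_term(term: str, threshold: float) -> str:
    """Return the term itself, or its closest generalization, whose
    information content does not exceed the threshold. If no such
    generalization exists in the toy taxonomy, suppress the term."""
    current = term
    while current is not None and current in TERM_PROBABILITY:
        if information_content(current) <= threshold:
            return current
        current = HYPERNYM.get(current)  # climb one level up the taxonomy
    return "[REDACTED]"


def sanitize(text: str, threshold: float) -> str:
    """Sanitize whitespace-separated text. Words missing from the
    probability table are left untouched (an assumption of this sketch)."""
    return " ".join(
        sanitize_term(word, threshold) if word in TERM_PROBABILITY else word
        for word in text.split()
    )


if __name__ == "__main__":
    print(sanitize("patient has hepatitis", threshold=8.0))
```

Under these assumed values, lowering the threshold pushes the replacement further up the taxonomy, and a threshold below every ancestor's information content falls back to suppression. This is the trade-off the abstract contrasts with plain term removal: a generalization hides the specific term while keeping part of its meaning.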
Other:

Original source link: https://ieeexplore.ieee.org/document/6410029
Item reference according to APA style: Sánchez, D; Batet, M; Viejo, A (2013). Automatic general-purpose sanitization of textual documents. IEEE Transactions on Information Forensics and Security, 8(6), 853-862. DOI: 10.1109/TIFS.2013.2239641
Article reference according to the original source: IEEE Transactions on Information Forensics and Security. 8 (6): 853-862
Article DOI: 10.1109/TIFS.2013.2239641
Journal publication year: 2013-06-03
Entity: Universitat Rovira i Virgili
Deposited article version: info:eu-repo/semantics/acceptedVersion
Record creation date: 2026-05-09
URV author(s): Batet Sanromà, Montserrat / SANCHEZ CERVELLÓ, DOMINGO JOSÉ / Sánchez Ruenes, David / Viejo Galicia, Luis Alexandre
Department: Enginyeria Informàtica i Matemàtiques
License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
Publication type: Journal Publications
ISSN: 15566013
Author according to the article: Sánchez, D; Batet, M; Viejo, A
Access to the usage license: https://creativecommons.org/licenses/by/3.0/es/
Subject areas: Safety, risk, reliability and quality, Engineering, electrical & electronic, Engenharias iv, Engenharias iii, Computer science, theory & methods, Computer networks and communications, Ciência da computação
Author's email address: montserrat.batet@urv.cat, david.sanchez@urv.cat, alexandre.viejo@urv.cat

Keywords:

Privacy
Information theory
Document sanitization
Data publishing
Computer Networks and Communications
Computer Science
Theory & Methods
Engineering
Electrical & Electronic
Safety
Risk
Reliability and Quality
Engenharias iv
Engenharias iii
Ciência da computação
Documents:

Main document