Utility-preserving privacy protection of textual healthcare documents

Sánchez, D.; Batet, M.; Viejo, A.

doi:10.1016/j.jbi.2014.06.008

Identification data

Identifier: PC:1144

Handle: https://hdl.handle.net/20.500.11797/PC1144

Authors: Sánchez, D.; Batet, M.; Viejo, A.

Abstract:
The adoption of information technologies by medical organisations makes it possible to compile large amounts of healthcare data, which often need to be released to third parties for research or business purposes. Much of this data is sensitive, because it may include patient-related documents such as electronic healthcare records. To protect the privacy of individuals, several pieces of legislation on healthcare data management have been defined, which state the kind of information that should be protected. Traditionally, to comply with current legislation, a manual redaction process is applied to patient-related documents in order to remove or black out sensitive terms. This process is costly and time-consuming and has the undesired side effect of severely reducing the utility of the released content. Automatic methods available in the literature usually propose ad hoc solutions that are limited to protecting specific types of structured information (e.g. e-mail addresses, social security numbers, etc.); as a result, they are hardly applicable to the sensitive entities stated in current regulations that do not present those structural regularities (e.g. diseases, symptoms, treatments, etc.). To tackle these limitations, in this paper we propose an automatic sanitisation method for textual medical documents (e.g. electronic healthcare records) that is able to protect, regardless of their structure, sensitive entities (e.g. diseases) and also those semantically related terms (e.g. symptoms) that may disclose the former. Contrary to redaction schemes based on term removal, our approach improves the utility of the protected output by replacing sensitive terms with appropriate generalisations retrieved from several medical and general-purpose knowledge bases. Experiments conducted on highly sensitive documents and in accordance with current regulations on healthcare data privacy show promising results in terms of the practical privacy and utility of the protected output.
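
The contrast the abstract draws between term removal and generalisation can be illustrated with a minimal sketch. This is not the authors' implementation: the taxonomy, the list of sensitive terms, and the function names `generalise` and `sanitise` are hypothetical stand-ins, and the abstract does not say how sensitive or semantically related terms are detected. A real system would presumably query medical and general-purpose knowledge bases rather than a hard-coded dictionary.

```python
import re

# Hypothetical toy taxonomy: each term maps to its more general parent.
# Assumption for illustration only; it stands in for real knowledge bases.
TOY_TAXONOMY = {
    "hiv": "viral infection",
    "viral infection": "infectious disease",
    "infectious disease": "disease",
    "night sweats": "symptom",
}

# Hypothetical sensitive entity plus a related term that could disclose it.
SENSITIVE_TERMS = ["HIV", "night sweats"]


def generalise(term, levels=1):
    """Climb up to `levels` steps in the toy taxonomy, stopping at the top."""
    current = term.lower()
    for _ in range(levels):
        parent = TOY_TAXONOMY.get(current)
        if parent is None:
            break
        current = parent
    return current


def sanitise(text, terms=SENSITIVE_TERMS, levels=1):
    """Replace each listed term with a generalisation instead of deleting it."""
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        replacement = generalise(term, levels)
        # A function replacement avoids backslash interpretation in the string.
        text = pattern.sub(lambda _match, r=replacement: r, text)
    return text


if __name__ == "__main__":
    print(sanitise("Patient diagnosed with HIV, reports night sweats."))
```

Raising `levels` trades utility for privacy in this sketch: the output keeps less specific information the higher each term is pushed up the taxonomy.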
Others:

Link to the original source: http://www.sciencedirect.com/science/article/pii/S1532046414001464
Article's DOI: 10.1016/j.jbi.2014.06.008
Journal publication year: 2014
Entity: Universitat Rovira i Virgili.
Paper version: info:eu-repo/semantics/submittedVersion
First page: 189
Department: Química Física i Inorgànica
Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
Last page: 198
ISSN: 1532-0464
Author, as it appears in the article: Sánchez, D., Batet, M., Viejo, A.
Licence for use: https://creativecommons.org/licenses/by/3.0/es/
Journal volume: 52

