
Utility-preserving privacy protection of textual healthcare documents

  • Identifying data

    Identifier:  imarina:6388007
    Authors:  Sanchez, David; Batet, Montserrat; Viejo, Alexandre
  • Other:

    Author as listed in the article: Sanchez, David; Batet, Montserrat; Viejo, Alexandre
    Department: Enginyeria Informàtica i Matemàtiques
    URV author(s): Batet Sanromà, Montserrat / SANCHEZ CERVELLÓ, DOMINGO JOSÉ / Sánchez Ruenes, David / Viejo Galicia, Luis Alexandre
    Keywords: data sanitisation; document redaction; healthcare data; information theory; privacy-protection; Computer security; Confidentiality; Electronic health records; Humans; Natural language processing; Semantics
    Abstract: © 2014 Elsevier Inc. The adoption of information technologies by medical organisations makes it possible to compile large amounts of healthcare data, which often need to be released to third parties for research or business purposes. Much of this data is sensitive, because it may include patient-related documents such as electronic healthcare records. To protect the privacy of individuals, several pieces of legislation on healthcare data management have been defined, stating the kind of information that should be protected. Traditionally, to comply with current legislation, a manual redaction process is applied to patient-related documents in order to remove or black out sensitive terms. This process is costly and time-consuming, and it has the undesired side effect of severely reducing the utility of the released content. Automatic methods available in the literature usually propose ad-hoc solutions that are limited to protecting specific types of structured information (e.g. e-mail addresses, social security numbers, etc.); as a result, they are hardly applicable to the sensitive entities stated in current regulations that do not present those structural regularities (e.g. diseases, symptoms, treatments, etc.). To tackle these limitations, in this paper we propose an automatic sanitisation method for textual medical documents (e.g. electronic healthcare records) that is able to protect, regardless of their structure, sensitive entities (e.g. diseases) and also semantically related terms (e.g. symptoms) that may disclose them. In contrast to redaction schemes based on term removal, our approach improves the utility of the protected output by replacing sensitive terms with appropriate generalisations retrieved from several medical and general-purpose knowledge bases. Experiments conducted on highly sensitive documents and in accordance with current regulations on healthcare data privacy show promising results in terms of the practical privacy and utility of the protected output.
    Subject areas: Ciência da computação; Ciências biológicas i; Computer science applications; Computer science, interdisciplinary applications; Engenharias iv; Ensino; Health informatics; Interdisciplinar; Mathematical & computational biology; Medical informatics; Saúde coletiva
    Usage licence: https://creativecommons.org/licenses/by/3.0/es/
    Author e-mail address: alexandre.viejo@urv.cat; david.sanchez@urv.cat; montserrat.batet@urv.cat
    ISSN: 15320464
    Record creation date: 2025-02-08
    Deposited article version: info:eu-repo/semantics/acceptedVersion
    Original source link: https://www.sciencedirect.com/science/article/pii/S1532046414001464?via%3Dihub
    Article reference according to the original source: Journal Of Biomedical Informatics. 52 189-198
    Item reference in APA style: Sanchez, David; Batet, Montserrat; Viejo, Alexandre (2014). Utility-preserving privacy protection of textual healthcare documents. Journal Of Biomedical Informatics, 52, 189-198. DOI: 10.1016/j.jbi.2014.06.008
    Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article DOI: 10.1016/j.jbi.2014.06.008
    Institution: Universitat Rovira i Virgili
    Journal publication year: 2014
    Publication type: Journal Publications
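
    The contrast the abstract draws between redaction by term removal and sanitisation by generalisation can be illustrated with a minimal sketch. This is not the authors' method: the term list, the generalisations and the function names below are hypothetical, and a hand-written dictionary stands in for the medical and general-purpose knowledge bases that the paper relies on.

```python
import re

# Hypothetical toy knowledge base: sensitive term -> more general concept.
# The paper retrieves generalisations from real knowledge bases; this
# hand-written dictionary is only an assumption made for illustration.
TOY_GENERALISATIONS = {
    "hiv": "a viral infection",
    "antiretroviral therapy": "a drug treatment",
    "pneumocystis pneumonia": "a lung disease",
}


def _pattern(terms):
    # Try longer terms first so multi-word terms are matched as a whole.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def redact(text, terms):
    """Redaction by removal: every listed term is blacked out."""
    return _pattern(terms).sub("[REDACTED]", text)


def generalise(text, generalisations):
    """Sanitisation by generalisation: each term becomes a broader concept."""
    return _pattern(generalisations).sub(
        lambda m: generalisations[m.group(0).lower()], text
    )


record = "Patient diagnosed with HIV, started antiretroviral therapy."
print(redact(record, TOY_GENERALISATIONS))
print(generalise(record, TOY_GENERALISATIONS))
```

    In this toy example the generalised record still says that the patient has an infection and receives a drug treatment, whereas the redacted record keeps none of that information. How sensitive and semantically related terms are actually detected, and how a suitable generalisation is chosen, is described in the paper itself and is not reproduced here.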