
Privacy protection of textual attributes through a semantic-based masking method

  • Identifying data

    Identifier: imarina:9298248
    Authors:
    Martinez, Sergio; Sanchez, David; Valls, Aida; Batet, Montserrat
  • Other:

    Author as given in the article: Martinez, Sergio; Sanchez, David; Valls, Aida; Batet, Montserrat
    Department: Enginyeria Informàtica i Matemàtiques
    URV author(s): Batet Sanromà, Montserrat / Martinez Lluis, Sergio / Sánchez Ruenes, David / Valls Mateu, Aïda
    Keywords: Semantic similarity; Privacy protection; Ontologies; Fusion of textual data; Anonymity
    Abstract: Using microdata provided by statistical agencies has many benefits from the data mining point of view. However, such data often involve sensitive information that can be directly or indirectly related to individuals. An appropriate anonymisation process is needed to minimise the risk of disclosure. Several masking methods have been developed to deal with continuous-scale numerical data or bounded textual values but approaches to tackling the anonymisation of textual values are scarce and shallow. Because of the importance of textual data in the Information Society, in this paper we present a new masking method for anonymising unbounded textual values based on the fusion of records with similar values to form groups of indistinguishable individuals. Since, from the data exploitation point of view, the utility of textual information is closely related to the preservation of its meaning, our method relies on the structured knowledge representation given by ontologies. This domain knowledge is used to guide the masking process towards the merging that best preserves the semantics of the original data. Because textual data typically consist of large and heterogeneous value sets, our method provides a computationally efficient algorithm by relying on several heuristics rather than exhaustive searches. The method is evaluated with real data in a concrete data mining application that involves solving a clustering problem. We also compare the method with more classical approaches that focus on optimising the value distribution of the dataset. Results show that a semantically grounded anonymisation best preserves the utility of data in both the theoretical and the practical setting, and reduces the probability of record linkage. At the same time, it achieves good scalability with regard to the size of input data. (C) 2011 Elsevier B.V. All rights reserved.
    Subject areas: Software; Signal processing; Information systems; Hardware and architecture; Engenharias IV; Engenharias III; Computer science, theory & methods; Computer science, artificial intelligence; Ciência da computação
    Licence of use: https://creativecommons.org/licenses/by/3.0/es/
    Author's email address: montserrat.batet@urv.cat; david.sanchez@urv.cat; aida.valls@urv.cat; sergio.martinezl@urv.cat
    Author identifier: 0000-0001-8174-7592; 0000-0001-7275-7887; 0000-0003-3616-7809; 0000-0002-3941-5348
    Record creation date: 2024-10-12
    Deposited article version: info:eu-repo/semantics/acceptedVersion
    Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article reference according to the original source: Information Fusion. 13 (4): 304-314
    Item reference in APA style: Martinez, Sergio; Sanchez, David; Valls, Aida; Batet, Montserrat (2012). Privacy protection of textual attributes through a semantic-based masking method. Information Fusion, 13(4), 304-314. DOI: 10.1016/j.inffus.2011.03.004
    Entity: Universitat Rovira i Virgili
    Journal publication year: 2012
    Publication type: Journal Publications
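
The abstract in this record describes fusing records with similar textual values into groups of indistinguishable individuals, with an ontology guiding the merge. The sketch below illustrates that general idea only. Everything in it is an assumption made for illustration: the toy taxonomy `PARENT`, the Wu-Palmer-style `similarity` measure, the greedy grouping in `mask`, and the choice to replace each group by its least common ancestor. The record does not give the paper's actual heuristics or fusion operator, so this should not be read as the authors' algorithm.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "flu": "infection",
    "cold": "infection",
    "infection": "disease",
    "asthma": "respiratory_disease",
    "bronchitis": "respiratory_disease",
    "respiratory_disease": "disease",
    "disease": None,
}


def ancestors(term):
    """Path from `term` up to the root, with `term` itself first."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def lca(a, b):
    """Most specific term that subsumes both `a` and `b`."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError("terms do not share a root")


def depth(term):
    """Number of nodes from `term` to the root (the root has depth 1)."""
    return len(ancestors(term))


def similarity(a, b):
    """Wu-Palmer-style similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    return 2 * depth(lca(a, b)) / (depth(a) + depth(b))


def generalise(terms):
    """Most specific term that subsumes every term in the list."""
    result = terms[0]
    for term in terms[1:]:
        result = lca(result, term)
    return result


def mask(values, k):
    """Greedily group each seed with its k-1 most similar values, then
    replace every value in a group by the group's common generalisation.
    Fewer than k leftover values are merged into the last group."""
    remaining = list(range(len(values)))
    groups = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Most similar values first (an assumed heuristic, not the paper's).
        remaining.sort(key=lambda i: -similarity(values[seed], values[i]))
        groups.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    if remaining:
        if groups:
            groups[-1].extend(remaining)
        else:
            groups.append(remaining)
    masked = list(values)
    for group in groups:
        label = generalise([values[i] for i in group])
        for i in group:
            masked[i] = label
    return masked


print(mask(["flu", "asthma", "cold", "bronchitis"], 2))
# ['infection', 'respiratory_disease', 'infection', 'respiratory_disease']
```

In this toy run each masked value is shared by at least two records, and semantically close values ("flu" and "cold") end up in the same group, so the replacement label stays more specific than the root term "disease".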