Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion

  • Identification data

    Identifier: imarina:9282651
    Authors:
    Oganian A; Domingo-Ferrer J
    Abstract:
    Before releasing databases which contain sensitive information about individuals, data publishers must apply Statistical Disclosure Limitation (SDL) methods to them, in order to avoid disclosure of sensitive information on any identifiable data subject. SDL methods often consist of masking or synthesizing the original data records in such a way as to minimize the risk of disclosure of the sensitive information while providing data users with accurate information about the population of interest. In this paper we propose a new scheme for disclosure limitation, based on the idea of local synthesis of data. Our approach is predicated on model-based clustering. The proposed method satisfies the requirements of k-anonymity; in particular we use a variant of the k-anonymity privacy model, namely probabilistic k-anonymity, by incorporating constraints on cluster cardinality. Regarding data utility, for continuous attributes, we exactly preserve means and covariances of the original data, while approximately preserving higher-order moments and analyses on subdomains (defined by clusters and cluster combinations). For both continuous and categorical data, our experiments with medical data sets show that, from the point of view of data utility, local synthesis compares very favorably with other methods of disclosure limitation including the sequential regression approach for synthetic data generation. © 2017, University of Skovde. All rights reserved.
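    The abstract states two technical properties: each record is synthesized locally within a cluster that respects a cardinality constraint, and the means and covariances of continuous attributes are preserved exactly. The sketch below shows one way to obtain both properties. It is a minimal sketch and rests on several assumptions. Cluster labels are taken as given; the paper derives them by model-based clustering with the EM algorithm, which is not reproduced here. Each cluster is replaced by Gaussian draws that are linearly transformed to match that cluster's sample mean and covariance. The functions `synthesize_cluster` and `local_synthesis` and the minimum-size check are hypothetical illustrations, not the authors' algorithm. The size check only mirrors the abstract's mention of cluster-cardinality constraints and does not by itself establish probabilistic k-anonymity.

```python
import numpy as np


def synthesize_cluster(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return len(x) synthetic rows with exactly the sample mean and covariance of x.

    Assumption (illustrative only): x has more rows than columns and a
    full-rank covariance matrix, so both Cholesky factorizations exist.
    """
    n, p = x.shape
    mu = x.mean(axis=0)
    # atleast_2d keeps the single-attribute case as a 1x1 matrix.
    target_cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))

    # Draw Gaussian noise and centre it so its sample mean is exactly zero.
    z = rng.standard_normal((n, p))
    z -= z.mean(axis=0)

    # Whiten: after this step the sample covariance of z is exactly the identity.
    l_z = np.linalg.cholesky(np.atleast_2d(np.cov(z, rowvar=False, bias=True)))
    z = np.linalg.solve(l_z, z.T).T

    # Colour with the target covariance and shift to the target mean.
    l_x = np.linalg.cholesky(target_cov)
    return mu + z @ l_x.T


def local_synthesis(x: np.ndarray, labels: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Replace every cluster by synthetic rows; reject clusters with fewer than k records."""
    rng = np.random.default_rng(seed)
    out = np.empty(x.shape, dtype=float)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if idx.size < k:  # hypothetical cardinality constraint
            raise ValueError(f"cluster {c} has {idx.size} records, fewer than k={k}")
        out[idx] = synthesize_cluster(x[idx], rng)
    return out


# Usage on toy data; the labels stand in for model-based cluster assignments.
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=(60, 3)) * [1.0, 2.0, 0.5] + [10.0, -3.0, 0.0]
labels = np.repeat([0, 1, 2], 20)
y = local_synthesis(x, labels, k=5, seed=42)
print(np.abs(y.mean(axis=0) - x.mean(axis=0)).max())
print(np.abs(np.cov(y, rowvar=False) - np.cov(x, rowvar=False)).max())
```

    Both printed differences should be at floating-point round-off level. Preservation per cluster carries over to the whole data set for two reasons. The overall mean is the size-weighted average of the cluster means. The total scatter matrix is the sum of the within-cluster scatter matrices plus the between-cluster term n_c (m_c - m)(m_c - m)^T, which depends only on cluster sizes and means. None of these quantities changes under the per-cluster synthesis above.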
  • Other:

    Authors as listed in the article: Oganian A; Domingo-Ferrer J
    Department: Enginyeria Informàtica i Matemàtiques
    URV author(s): Domingo Ferrer, Josep / OGANIAN, ANNA
    Keywords: Synthetic data generations; Synthetic data; Statistical disclosure limitations; Statistical disclosure limitation (sdl); Sensitive informations; Probabilistic k-anonymity; Privacy; Population statistics; Mixture model; Maximum principle; K-anonymity; Expectation-maximization algorithms; Expectation-maximization (em) algorithm; Disclosure limitations; Data privacy; utility; synthetic data; risk; probabilistic k-anonymity; mixture model; microaggregation; expectation-maximization (em) algorithm
    Subject areas: Statistics and probability; Software; Computer science, theory & methods; Computer science
    Licence of use: https://creativecommons.org/licenses/by/3.0/es/
    Author's email address: josep.domingo@urv.cat
    Author identifier: 0000-0001-7213-4962
    Record creation date: 2023-12-16
    Version of the deposited article: info:eu-repo/semantics/publishedVersion
    Article reference according to the original source: Transactions On Data Privacy. 10 (1): 61-81
    APA citation: Oganian, A., & Domingo-Ferrer, J. (2017). Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion. Transactions on Data Privacy, 10(1), 61-81.
    Licence document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Institution: Universitat Rovira i Virgili
    Year of journal publication: 2017
    Publication type: Journal Publications