Domingo-Ferrer, Josep; Muralidhar, Krishnamurty; Martinez, Sergio (2025). Synthetic Data Generation via the Permutation Paradigm With Optional k-Anonymity. IEEE Transactions on Dependable and Secure Computing, 22(3), 3155-3165. DOI: 10.1109/tdsc.2024.3525149
Paper original source:
IEEE Transactions on Dependable and Secure Computing. 22 (3): 3155-3165
Abstract:
Most methods in the literature on synthetic microdata (individual records) generation are parametric, that is, they require knowing or estimating the joint or the conditional distribution of the original microdata. This may be a significant hurdle unless the original microdata are multivariate normal. We propose a rank-based approach to generating synthetic microdata based on the permutation paradigm. We present three different methods and we analyze the utility and the confidentiality they afford. The third method is actually an extension of the second method that adds k-anonymity protection against reidentification to the confidentiality against attribute disclosure offered by the first two methods. Our algorithms only require the identification of the marginal distributions of attributes and yield synthetic attributes that replicate the relationships between the original attributes exclusively based on ranks. This proposal is especially attractive for non-normal or multi-type microdata.
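The rank-based idea in the abstract can be illustrated with a minimal sketch. It is not one of the paper's three methods, whose details the abstract does not give. It only shows the general principle of drawing values from an assumed marginal distribution and arranging them so that each synthetic value takes the rank of the corresponding original value. The helper names (`ranks`, `rank_matched_column`), the toy attributes, and the choice of exponential and normal marginals are all assumptions made for illustration.

```python
import random
import statistics

def ranks(values):
    """Return the 0-based rank of each value (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def rank_matched_column(original_column, sampler, rng):
    """Draw one value per record from an assumed marginal, then reorder the
    draws so that each synthetic value has the rank of the original value."""
    draws = sorted(sampler(rng) for _ in original_column)
    return [draws[r] for r in ranks(original_column)]

rng = random.Random(0)

# Toy "original" microdata: two related, non-normal attributes (hypothetical).
income = [rng.expovariate(1 / 30000) for _ in range(200)]
spend = [0.3 * v + rng.gauss(0, 2000) for v in income]

# Assumed marginals, fitted by simple moment matching (an illustration only).
mean_income = statistics.fmean(income)
mean_spend = statistics.fmean(spend)
sd_spend = statistics.pstdev(spend)

synthetic_income = rank_matched_column(
    income, lambda r: r.expovariate(1 / mean_income), rng
)
synthetic_spend = rank_matched_column(
    spend, lambda r: r.gauss(mean_spend, sd_spend), rng
)
```

Because every synthetic column carries the same ranks as its original column, rank-based relationships between attributes (for example, Spearman correlation) are reproduced in this sketch, while the values themselves are fresh draws from the assumed marginals.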
Title:
Synthetic Data Generation via the Permutation Paradigm With Optional k-Anonymity