A semantic framework for noise addition with nominal data

  • Identifying data

    Identifier: imarina:5130932
    Authors:
    Rodriguez-Garcia, Mercedes; Batet, Montserrat; Sanchez, David
  • Other:

    Author as listed in the article: Rodriguez-Garcia, Mercedes; Batet, Montserrat; Sanchez, David
    Department: Enginyeria Informàtica i Matemàtiques (Computer Engineering and Mathematics)
    URV author(s): Batet Sanromà, Montserrat / Sánchez Ruenes, David
    Keywords: Semantics; Ontologies; Nominal data; Noise addition; Medical ontologies
    Abstract: Noise addition is a data distortion technique widely used in data-intensive applications. For example, in machine learning tasks it helps to reduce overfitting, whereas in data privacy protection it adds uncertainty to personally identifiable information. Yet, due to its mathematical operating principle, noise addition is mainly intended for continuous numerical data. In fact, despite the large amount of nominal data currently being compiled and used in data analysis, only a few alternative techniques have been proposed to distort nominal data in a way similar to what standard noise addition does for numerical data. Furthermore, all these alternative methods rely on the distribution of the data rather than on the semantics of nominal values, which negatively affects the utility of the distorted outcomes. To tackle this issue, in this paper we present a semantically grounded alternative to numerical noise suitable for nominal data, which we name semantic noise. By means of semantic noise, and by exploiting structured knowledge sources such as ontologies, we are able to distort nominal data while better preserving their semantics and, thus, their analytical utility. To that end, we provide semantically and mathematically coherent versions of the statistical operators required in the noise addition process, which include the difference, the mean, the variance and the covariance. Then, we propose semantic noise addition algorithms that cope with the finite, discrete and non-ordinal nature of nominal data. The proposed algorithms cover both uncorrelated noise addition, which is suited to independent attributes, and correlated noise addition, which can cope with multivariate datasets with dependent attributes. Empirical results show that our proposals offer general and configurable mechanisms to distort nominal data while preserving data semantics better than baseline methods based only on the distribution of the data.
    Subject areas: Software; Mathematics / probability and statistics; Management information systems; Interdisciplinary; Information systems and management; Information and documentation; Engineering IV; Engineering III; Economics; Computer science, artificial intelligence; Social sciences; Biological sciences I; Computer science; Astronomy / physics; Artificial intelligence; Public and business administration, accounting and tourism
    License of use: https://creativecommons.org/licenses/by/3.0/es/
    Author's email address: montserrat.batet@urv.cat david.sanchez@urv.cat
    Author identifier: 0000-0001-8174-7592 0000-0001-7275-7887
    Record creation date: 2024-10-12
    Deposited article version: info:eu-repo/semantics/acceptedVersion
    Original source link: https://www.sciencedirect.com/science/article/pii/S0950705117300473
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article reference according to the original source: Knowledge-Based Systems. 122 103-118
    Item reference in APA style: Rodriguez-Garcia, Mercedes; Batet, Montserrat; Sanchez, David (2017). A semantic framework for noise addition with nominal data. Knowledge-Based Systems, 122, 103-118. DOI: 10.1016/j.knosys.2017.01.032
    Article DOI: 10.1016/j.knosys.2017.01.032
    Institution: Universitat Rovira i Virgili
    Journal publication year: 2017
    Publication type: Journal Publications
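
    Illustrative note: the abstract in this record mentions semantic counterparts of the difference, the mean and the variance for nominal values. The sketch below is only a rough illustration of that idea and is not the authors' formulation. It assumes a hypothetical toy taxonomy (`PARENT`), a plain edge-counting distance through the lowest common ancestor, a mean defined as the taxonomy concept that minimises the sum of squared distances, and a variance defined as the average squared distance to that mean. All names and definitions in it are assumptions made for illustration. The noise addition algorithms themselves are not shown.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "disease": None,
    "infection": "disease",
    "cancer": "disease",
    "flu": "infection",
    "cold": "infection",
    "melanoma": "cancer",
    "leukemia": "cancer",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def semantic_distance(a, b):
    """Assumed measure: number of edges between a and b via their lowest common ancestor."""
    path_a, path_b = ancestors(a), ancestors(b)
    steps_in_b = {c: i for i, c in enumerate(path_b)}
    for steps_in_a, c in enumerate(path_a):
        if c in steps_in_b:
            return steps_in_a + steps_in_b[c]
    raise ValueError("concepts share no common ancestor")


def semantic_mean(values):
    """Assumed definition: the taxonomy concept minimising the sum of squared distances.

    Candidates are visited in alphabetical order so that ties are broken deterministically.
    """
    return min(
        sorted(PARENT),
        key=lambda c: sum(semantic_distance(c, v) ** 2 for v in values),
    )


def semantic_variance(values):
    """Assumed definition: average squared distance of the values to their semantic mean."""
    centre = semantic_mean(values)
    return sum(semantic_distance(centre, v) ** 2 for v in values) / len(values)
```

    With this toy taxonomy, `semantic_mean(["flu", "cold", "flu"])` yields the more general concept `"infection"`, which is not one of the input values. A purely distribution-based choice such as the mode would return `"flu"` instead.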