Explaining Image Misclassification in Deep Learning via Adversarial Examples

  • Identifying data

    Identifier: imarina:9229338
    Authors:
    Haffar, Rami; Jebreel, Najeeb Moharram; Domingo-Ferrer, Josep; Sanchez, David
    Abstract:
    With the increasing use of convolutional neural networks (CNNs) for computer vision and other artificial intelligence tasks, the need arises to interpret their predictions. In this work, we tackle the problem of explaining CNN misclassification of images. We propose to construct adversarial examples that allow identifying the regions of the input images that had the largest impact on the CNN's wrong predictions. More specifically, for each image that was incorrectly classified by the CNN, we implemented an inverted adversarial attack that consists of modifying the input image as little as possible so that it becomes correctly classified. The changes made to the image to fix classification errors explain the causes of misclassification and allow adjusting the model and the data set to obtain more accurate models. We present two methods. The first one employs the gradients from the CNN itself to create the adversarial examples and is meant for model developers. End users, however, only have access to the CNN model as a black box. Our second method is intended for them and employs a surrogate model to estimate the gradients of the original CNN model, which are then used to create the adversarial examples. In our experiments, the first method achieved a 99.67% success rate at finding the misclassification explanations and needed on average 1.96 queries per misclassified image to build the corresponding adversarial example. The second method achieved a 73.08% success rate at finding the explanations, with 8.73 queries per image on average.
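The inverted attack described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a small linear softmax classifier stands in for the CNN, the update is a plain normalized gradient step, and all names (`predict`, `loss_gradient`, `inverted_attack`, `step`, `max_steps`) and the toy weights are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    """Class probabilities of a linear softmax classifier (stand-in for a CNN)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def loss_gradient(W, b, x, label):
    """Gradient of the cross-entropy loss for `label` with respect to the input x."""
    p = predict(W, b, x)
    # d loss / d logit_k = p_k - 1[k == label]; the chain rule through W x + b follows.
    d = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    return [sum(d[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]

def inverted_attack(W, b, x, true_label, step=0.05, max_steps=50):
    """Nudge a misclassified input toward its true label with small gradient steps.

    Returns (x_adv, delta, n_steps, success); delta = x_adv - x is the explanation.
    """
    x_adv = list(x)
    for n_steps in range(max_steps + 1):
        probs = predict(W, b, x_adv)
        if probs.index(max(probs)) == true_label:
            return x_adv, [a - o for a, o in zip(x_adv, x)], n_steps, True
        if n_steps == max_steps:
            break
        g = loss_gradient(W, b, x_adv, true_label)
        scale = max(abs(v) for v in g) or 1.0
        # Descend the loss of the TRUE label (the inverse of a usual attack),
        # keeping every feature inside the valid range [0, 1].
        x_adv = [min(1.0, max(0.0, xi - step * gi / scale)) for xi, gi in zip(x_adv, g)]
    return x_adv, [a - o for a, o in zip(x_adv, x)], max_steps, False

# Toy data (assumed): two classes, four "pixels"; x is predicted as class 1
# although its true label is 0.
W = [[1.0, -1.0, 0.5, 0.0], [-1.0, 1.0, 0.0, 0.5]]
b = [0.0, 0.0]
x = [0.2, 0.6, 0.5, 0.5]
x_adv, delta, n_steps, ok = inverted_attack(W, b, x, true_label=0)
# Features ranked by how much they had to change, largest first.
ranking = sorted(range(len(delta)), key=lambda i: -abs(delta[i]))
```

In this toy case the largest entries of `delta` fall on features 0 and 1, which is the analogue of highlighting the image regions that drove the wrong prediction. For the black-box setting the abstract mentions, `loss_gradient` would be evaluated on a surrogate model while `predict` queries the original one.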
  • Other:

    Author as listed in the article: Haffar, Rami; Jebreel, Najeeb Moharram; Domingo-Ferrer, Josep; Sanchez, David
    Department: Enginyeria Informàtica i Matemàtiques (Computer Engineering and Mathematics)
    URV author(s): Domingo Ferrer, Josep / Haffar, Rami / Sánchez Ruenes, David
    Keywords: Image classification; Explainability; Deep learning; Convolutional neural networks; Adversarial examples
    Subject areas: Theoretical computer science; Collective health; Chemistry; Psychology; Urban and regional planning / demography; Dentistry; Veterinary medicine; Medicine III; Medicine II; Medicine I; Materials; Mathematics / probability and statistics; Linguistics and literature; Interdisciplinary; Geography; Geosciences; General or multidisciplinary; General computer science; Pharmacy; Teaching; Engineering IV; Engineering III; Engineering II; Engineering I; Physical education; Education; Law; Communication and information; Computer science, theory & methods; Computer science, artificial intelligence; Computer science (miscellaneous); Computer science (all); Applied social sciences I; Biological sciences III; Biological sciences II; Biological sciences I; Environmental sciences; Agricultural sciences I; Computer science; Biotechnology; Biodiversity; Astronomy / physics; Arts; Architecture, urbanism and design; Architecture and urbanism; Administration, accounting and tourism; Public and business administration, accounting and tourism
    Author's e-mail address: rami.haffar@urv.cat david.sanchez@urv.cat josep.domingo@urv.cat
    Author identifier: 0000-0001-7275-7887 0000-0001-7213-4962
    Record creation date: 2024-10-12
    Version of the deposited article: info:eu-repo/semantics/submittedVersion
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article reference according to the original source: Lecture Notes In Computer Science. 12898 LNAI 323-334
    Item reference in APA style: Haffar, Rami; Jebreel, Najeeb Moharram; Domingo-Ferrer, Josep; Sanchez, David (2021). Explaining Image Misclassification in Deep Learning via Adversarial Examples. : Springer Science and Business Media Deutschland GmbH
    Entity: Universitat Rovira i Virgili
    Journal publication year: 2021
    Publication type: Proceedings Paper