Explaining Image Misclassification in Deep Learning via Adversarial Examples

Haffar, Rami; Jebreel, Najeeb Moharram; Domingo-Ferrer, Josep; Sanchez, David

Identification data

Identifier: imarina:9229338

Handle: https://hdl.handle.net/20.500.11797/imarina9229338

Authors:
Haffar, Rami; Jebreel, Najeeb Moharram; Domingo-Ferrer, Josep; Sanchez, David

Abstract:
With the increasing use of convolutional neural networks (CNNs) for computer vision and other artificial intelligence tasks, the need arises to interpret their predictions. In this work, we tackle the problem of explaining CNN misclassification of images. We propose to construct adversarial examples that allow identifying the regions of the input images that had the largest impact on the CNN's wrong predictions. More specifically, for each image that was incorrectly classified by the CNN, we implemented an inverted adversarial attack consisting of modifying the input image as little as possible so that it becomes correctly classified. The changes made to the image to fix classification errors explain the causes of misclassification and allow adjusting the model and the data set to obtain more accurate models. We present two methods. The first employs the gradients from the CNN itself to create the adversarial examples and is meant for model developers. End users, however, only have access to the CNN model as a black box. Our second method is intended for end users and employs a surrogate model to estimate the gradients of the original CNN model, which are then used to create the adversarial examples. In our experiments, the first method achieved a 99.67% success rate at finding the misclassification explanations and needed on average 1.96 queries per misclassified image to build the corresponding adversarial example. The second method achieved a 73.08% success rate at finding the explanations, with 8.73 queries per image on average.
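The inverted attack described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear softmax classifier in place of a CNN, a signed-gradient update, a fixed step size, and a query count that includes the initial prediction. All function names and parameters (`predict`, `loss_gradient`, `inverted_attack`, `step`, `max_queries`) are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    """One model query: class probabilities of a toy linear softmax classifier."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def argmax(p):
    return max(range(len(p)), key=p.__getitem__)

def sign(g):
    return (g > 0) - (g < 0)

def loss_gradient(W, probs, true_label):
    """Gradient of the cross-entropy loss for the true label w.r.t. the input."""
    # dL/dx_j = sum_k (p_k - y_k) * W[k][j], where y is the one-hot true label
    return [
        sum((probs[k] - (1.0 if k == true_label else 0.0)) * W[k][j]
            for k in range(len(W)))
        for j in range(len(W[0]))
    ]

def inverted_attack(W, b, x, true_label, step=0.05, max_queries=50):
    """Nudge a misclassified input toward its true label in small steps.

    Returns (corrected input, change applied, queries used), or
    (None, None, max_queries) if no correction is found in the budget.
    """
    x_adv = list(x)
    for queries in range(1, max_queries + 1):
        probs = predict(W, b, x_adv)  # counted as one query (assumption)
        if argmax(probs) == true_label:
            delta = [a - o for a, o in zip(x_adv, x)]
            return x_adv, delta, queries
        grad = loss_gradient(W, probs, true_label)
        # Step against the gradient sign to lower the true-label loss,
        # keeping every feature inside the valid [0, 1] range.
        x_adv = [min(1.0, max(0.0, v - step * sign(g)))
                 for v, g in zip(x_adv, grad)]
    return None, None, max_queries

# Toy example: class 0 prefers feature 0, class 1 prefers feature 1,
# and feature 2 is ignored by the model.
W = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0]]
b = [0.0, 0.0]
x = [0.6, 0.45, 0.5]  # predicted as class 0, true label is 1
x_fixed, delta, queries = inverted_attack(W, b, x, true_label=1)
print(queries, [abs(d) > 1e-12 for d in delta])
```

The nonzero entries of `delta` mark the input features that had to change for the prediction to become correct, which is the sense in which the perturbation acts as an explanation. In the toy example the third feature is left untouched because the model ignores it.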
Other:

Author as listed in the article: Haffar, Rami; Jebreel, Najeeb Moharram; Domingo-Ferrer, Josep; Sanchez, David
Department: Enginyeria Informàtica i Matemàtiques
URV author(s): Domingo Ferrer, Josep / Haffar, Rami / Sánchez Ruenes, David
Keywords: Image classification; Explainability; Deep learning; Convolutional neural networks; Adversarial examples
Subject areas: Theoretical computer science; Public health; Chemistry; Psychology; Urban and regional planning / demography; Dentistry; Veterinary medicine; Medicine III; Medicine II; Medicine I; Materials; Mathematics / probability and statistics; Linguistics and literature; Interdisciplinary; Geography; Geosciences; General or multidisciplinary; General computer science; Pharmacy; Teaching; Engineering IV; Engineering III; Engineering II; Engineering I; Physical education; Education; Law; Communication and information; Computer science, theory & methods; Computer science, artificial intelligence; Computer science (miscellaneous); Computer science (all); Applied social sciences I; Biological sciences III; Biological sciences II; Biological sciences I; Environmental sciences; Agricultural sciences I; Computer science; Biotechnology; Biodiversity; Astronomy / physics; Arts; Architecture, urbanism and design; Architecture and urbanism; Administration, accounting and tourism; Public and business administration, accounting and tourism
Author's email address: rami.haffar@urv.cat david.sanchez@urv.cat josep.domingo@urv.cat
Author identifier: 0000-0001-7275-7887 0000-0001-7213-4962
Record creation date: 2024-10-12
Version of the deposited article: info:eu-repo/semantics/submittedVersion
License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
Article reference according to the original source: Lecture Notes In Computer Science. 12898 LNAI 323-334
Item reference in APA style: Haffar, Rami; Jebreel, Najeeb Moharram; Domingo-Ferrer, Josep; Sanchez, David (2021). Explaining Image Misclassification in Deep Learning via Adversarial Examples. : Springer Science and Business Media Deutschland GmbH
Entity: Universitat Rovira i Virgili
Journal publication year: 2021
Publication type: Proceedings Paper

Keywords:

Computer Science (Miscellaneous); Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Theoretical Computer Science
Image classification
Explainability
Deep learning
Convolutional neural networks
Adversarial examples
Documents:

Main document