
Contributions to Explainability and Attack Detection in Deep Learning

  • Identification data

    Identifier:  TDX:4292
    Author:  Haffar, Rami
    Abstract:
    Artificial intelligence (AI) is used for various purposes that are critical to human life. However, most state-of-the-art AI algorithms, and in particular deep-learning (DL) models, are black boxes, meaning humans cannot understand how such models make decisions. To forestall an algorithm-based authoritarian society, decisions based on machine learning ought to inspire trust by being explainable. For AI explainability to be practical, it must be feasible to obtain explanations systematically and automatically. There are two main methodologies to generate explanations. Explanation methods that use the internal components of DL models (a.k.a. model-specific explanations) are more accurate and effective than those relying solely on the inputs and outputs (a.k.a. model-agnostic explanations). However, users of black-box models lack white-box access to the internal components of the providers' models. Nevertheless, the only way for users to trust predictions, and for these to align with ethical regulations, is for predictions to be accompanied by explanations locally and independently generated by the users (rather than by explanations offered by the model providers). Furthermore, such models can be vulnerable to various security and privacy attacks targeting their training. In this thesis, we leverage both model-specific and model-agnostic explainability techniques. First, we propose a model-agnostic explainability method that uses random decision forests as surrogates. The surrogate model can explain the predictions of black-box models in both centralized and decentralized settings; in addition, it uses those explanations to protect the models from attacks that might target them. We also propose a model-specific explainability method that uses the gradients of the model to generate adversarial examples that counterfactually explain why an input example is classified into a specific class. We further generalize this method so that external users can apply it by training a local surrogate model that mimics the black-box model's behavior and using the surrogate's gradients to generate the adversarial examples. Extensive experimental results show that our methods outperform state-of-the-art techniques by providing more representative explanations and better model protection at a low computational cost.
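
    As a rough illustration of the surrogate idea summarized in the abstract, the sketch below trains a random decision forest on a black-box model's own predictions and reads a decision path off one of its trees as a local explanation. It is a minimal sketch assuming scikit-learn; the names black_box_predict and X are hypothetical placeholders, and the thesis's actual algorithms are necessarily more elaborate than this.

        from sklearn.ensemble import RandomForestClassifier

        def train_surrogate(black_box_predict, X, n_trees=100, seed=0):
            """Fit a random forest that mimics the black box on the data X."""
            y_bb = black_box_predict(X)  # labels come from the black box, not ground truth
            surrogate = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
            surrogate.fit(X, y_bb)
            return surrogate

        def explain_instance(surrogate, x):
            """Return the decision path of the forest's first tree for sample x,
            as a list of (feature_index, comparison, threshold) conditions."""
            tree = surrogate.estimators_[0].tree_
            node, path = 0, []
            while tree.children_left[node] != -1:  # -1 marks a leaf node
                feat, thr = tree.feature[node], tree.threshold[node]
                if x[feat] <= thr:
                    path.append((feat, "<=", thr))
                    node = tree.children_left[node]
                else:
                    path.append((feat, ">", thr))
                    node = tree.children_right[node]
            return path

    Before trusting such decision paths as explanations, the surrogate's fidelity to the black box (agreement with its predictions on held-out inputs) would be checked.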
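    The gradient-based counterfactual method can likewise be sketched as a targeted adversarial-example loop: starting from an input, descend the loss toward a chosen target class until the model's decision flips, so that the accumulated perturbation shows what minimally had to change. This is a generic FGSM-style iteration written in PyTorch under assumed names (model, x, target); it is not the thesis's exact procedure. As the abstract notes, when white-box gradients are unavailable, the same loop can be run on the gradients of a locally trained surrogate.

        import torch
        import torch.nn.functional as F

        def counterfactual(model, x, target, step=0.01, max_iter=100):
            """Perturb a single input x (shape [1, ...]) until model predicts `target`.
            Returns the counterfactual example and the perturbation that produced it."""
            x_adv = x.clone().detach().requires_grad_(True)
            for _ in range(max_iter):
                logits = model(x_adv)
                if logits.argmax(dim=1).item() == target:
                    break  # the decision has flipped: x_adv is a counterfactual
                loss = F.cross_entropy(logits, torch.tensor([target], device=logits.device))
                model.zero_grad()
                loss.backward()
                with torch.no_grad():
                    x_adv -= step * x_adv.grad.sign()  # step toward the target class
                x_adv.grad.zero_()
            return x_adv.detach(), (x_adv - x).detach()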
  • Other:

    Publisher: Universitat Rovira i Virgili
    Date: 2024-01-30T01:00:00Z, 2023-11-21, 2024-01-29T12:14:21Z
    Identifier: http://hdl.handle.net/10803/689901
    Department/Institute: Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili.
    Language: eng
    Author: Haffar, Rami
    Supervisors: Sánchez Ruenes, David; Domingo Ferrer, Josep
    Source: TDX (Tesis Doctorals en Xarxa)
    Format: application/pdf, 179 p.
  • Keywords:

    Attack detection
    Artificial intelligence
    Explainability
    Detección de ataques
    Inteligencia artificial
    Explicabilidad
    Detecció d'atacs
    Intel·ligència artificial
    Explicabilitat
    Ciències