
Contributions to Explainability and Attack Detection in Deep Learning

  • Identification data

    Identifier:  TDX:4292
    Author:  Haffar, Rami
    Abstract:
    Artificial intelligence (AI) is used for various purposes that are critical to human life. However, most state-of-the-art AI algorithms, and in particular deep-learning (DL) models, are black boxes, meaning humans cannot understand how such models make decisions. To forestall an algorithm-based authoritarian society, decisions based on machine learning ought to inspire trust by being explainable. For AI explainability to be practical, it must be feasible to obtain explanations systematically and automatically. There are two main methodologies to generate explanations. Explanation methods that use the internal components of DL models (a.k.a. model-specific explanations) are more accurate and effective than those relying solely on the inputs and outputs (a.k.a. model-agnostic explanations). However, users of black-box models lack white-box access to the internal components of the providers' models. Nevertheless, the only way for users to trust predictions, and for these to align with ethical regulations, is for predictions to be accompanied by explanations locally and independently generated by the users (rather than by explanations offered by the model providers). Furthermore, such models can be vulnerable to various security and privacy attacks targeting their training. In this thesis, we leverage both model-specific and model-agnostic explainability techniques. First, we propose a model-agnostic explainability method that uses random decision forests as surrogates. The surrogate model can explain the predictions of black-box models in both centralized and decentralized settings; in addition, it uses those explanations to protect the models from attacks that might target them. We also propose a model-specific explainability method that uses the gradients of the model to generate adversarial examples that counterfactually explain why an input example is classified into a specific class. We further generalize this method so that external users can apply it by training a local surrogate model that mimics the black-box model's behavior and using the surrogate's gradients to generate the adversarial examples. Extensive experimental results show that our methods outperform state-of-the-art techniques by providing more representative explanations and better model protection at a low computational cost.
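
    As a rough illustration of the surrogate idea summarized in the abstract, the sketch below trains a random decision forest on a black-box model's own predictions and reads a decision path off one of its trees as a local explanation. It is a minimal sketch assuming scikit-learn; the names black_box_predict and X are hypothetical placeholders, and the thesis's actual algorithms are necessarily more elaborate than this.

        from sklearn.ensemble import RandomForestClassifier

        def train_surrogate(black_box_predict, X, n_trees=100, seed=0):
            """Fit a random forest that mimics the black box on the data X."""
            y_bb = black_box_predict(X)  # labels come from the black box, not ground truth
            surrogate = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
            surrogate.fit(X, y_bb)
            return surrogate

        def explain_instance(surrogate, x):
            """Return the decision path of the forest's first tree for sample x,
            as a list of (feature_index, comparison, threshold) conditions."""
            tree = surrogate.estimators_[0].tree_
            node, path = 0, []
            while tree.children_left[node] != -1:  # -1 marks a leaf node
                feat, thr = tree.feature[node], tree.threshold[node]
                if x[feat] <= thr:
                    path.append((feat, "<=", thr))
                    node = tree.children_left[node]
                else:
                    path.append((feat, ">", thr))
                    node = tree.children_right[node]
            return path

    Before trusting such decision paths as explanations, the surrogate's fidelity to the black box (agreement with its predictions on held-out inputs) would be checked.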
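    The gradient-based counterfactual method can likewise be sketched as a targeted adversarial-example loop: starting from an input, descend the loss toward a chosen target class until the model's decision flips, so that the accumulated perturbation shows what minimally had to change. This is a generic FGSM-style iteration written in PyTorch under assumed names (model, x, target); it is not the thesis's exact procedure. As the abstract notes, when white-box gradients are unavailable, the same loop can be run on the gradients of a locally trained surrogate.

        import torch
        import torch.nn.functional as F

        def counterfactual(model, x, target, step=0.01, max_iter=100):
            """Perturb a single input x (shape [1, ...]) until model predicts `target`.
            Returns the counterfactual example and the perturbation that produced it."""
            x_adv = x.clone().detach().requires_grad_(True)
            for _ in range(max_iter):
                logits = model(x_adv)
                if logits.argmax(dim=1).item() == target:
                    break  # the decision has flipped: x_adv is a counterfactual
                loss = F.cross_entropy(logits, torch.tensor([target], device=logits.device))
                model.zero_grad()
                loss.backward()
                with torch.no_grad():
                    x_adv -= step * x_adv.grad.sign()  # step toward the target class
                x_adv.grad.zero_()
            return x_adv.detach(), (x_adv - x).detach()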
  • Other:

    Publisher: Universitat Rovira i Virgili
    Date: 2024-01-30T01:00:00Z, 2023-11-21, 2024-01-29T12:14:21Z
    Identifier: http://hdl.handle.net/10803/689901
    Department/Institute: Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili.
    Language: eng
    Author: Haffar, Rami
    Supervisors: Sánchez Ruenes, David; Domingo Ferrer, Josep
    Source: TDX (Tesis Doctorals en Xarxa)
    Format: application/pdf, 179 p.
  • Keywords:

    Attack detection
    Artificial intelligence
    Explainability
    Detección de ataques
    Inteligencia artificial
    Explicabilidad
    Detecció d'atacs
    Intel·ligència artificial
    Explicabilitat
    Ciències