Digital forgetting in large language models: a survey of unlearning methods

Blanco-Justicia, A; Jebreel, N; Manzanares-Salor, B; Sánchez, D; Domingo-Ferrer, J; Collell, G; Tan, KE

doi:10.1007/s10462-024-11078-6

Identifying data

Identifier: imarina:9435540

Handle: https://hdl.handle.net/20.500.11797/imarina9435540

Authors: Blanco-Justicia, A; Jebreel, N; Manzanares-Salor, B; Sánchez, D; Domingo-Ferrer, J; Collell, G; Tan, KE

Abstract:
Large language models (LLMs) have become the state of the art in natural language processing. The massive adoption of generative LLMs and the capabilities they have shown have prompted public concerns regarding their impact on the labor market, privacy, the use of copyrighted work, and how these models align with human ethics and the rule of law. As a response, new regulations are being pushed, which require developers and service providers to evaluate, monitor, and forestall or at least mitigate the risks posed by their models. One mitigation strategy is digital forgetting: given a model with undesirable knowledge or behavior, the goal is to obtain a new model where the detected issues are no longer present. Digital forgetting is usually enforced via machine unlearning techniques, which modify trained machine learning models for them to behave as models trained on a subset of the original training data. In this work, we describe the motivations and desirable properties of digital forgetting when applied to LLMs, and we survey recent works on machine unlearning. Specifically, we propose a taxonomy of unlearning methods based on the reach and depth of the modifications done on the models, we discuss and compare the effectiveness of machine unlearning methods for LLMs proposed so far, and we survey their evaluation. Finally, we describe open problems of machine unlearning applied to LLMs and we put forward recommendations for developers and practitioners.
Other:

Link to the original source: https://link.springer.com/article/10.1007/s10462-024-11078-6
Item reference in APA style: Blanco-Justicia, A; Jebreel, N; Manzanares-Salor, B; Sánchez, D; Domingo-Ferrer, J; Collell, G; Tan, KE (2025). Digital forgetting in large language models: a survey of unlearning methods. Artificial Intelligence Review, 58(90), 90-. DOI: 10.1007/s10462-024-11078-6
Article reference according to the original source: Artificial Intelligence Review. 58 (90): 90-
Article DOI: 10.1007/s10462-024-11078-6
Journal publication date: 2025-01-13
Institution: Universitat Rovira i Virgili
Deposited article version: info:eu-repo/semantics/publishedVersion
Record creation date: 2026-05-09
URV author(s): Blanco Justicia, Alberto / Domingo Ferrer, Josep / Jebreel, Najeeb Moharram Salim / Manzanares Salor, Benet / Sánchez Ruenes, David
Department: Enginyeria Informàtica i Matemàtiques
License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
Publication type: Journal Publications
Author as given in the article: Blanco-Justicia, A; Jebreel, N; Manzanares-Salor, B; Sánchez, D; Domingo-Ferrer, J; Collell, G; Tan, KE
Usage license: https://creativecommons.org/licenses/by/3.0/es/
Subject areas: Psychology; Linguistics and language; Linguistics; Language and linguistics; Philology, linguistics and sociolinguistics; Computer science, artificial intelligence; Social sciences; Humanities; Computer science; Artificial intelligence
Author email addresses: najeeb.jebreel@urv.cat, alberto.blanco@urv.cat, benet.manzanares@urv.cat, david.sanchez@urv.cat, josep.domingo@urv.cat

Keywords:

Trustworthy AI
Privacy
Machine unlearning
Large language models
Copyright
Artificial intelligence
Computer science
Language and linguistics
Linguistics and language
Psychology
Linguistics
Philology, linguistics and sociolinguistics
Social sciences
Humanities
