A comparative analysis, enhancement and evaluation of text anonymization with pre-trained Large Language Models

  • Identifying data

    Identifier:  imarina:9465606
    Authors:  Manzanares-Salor, Benet; Sanchez, David
    Abstract:
    Large Language Models (LLMs) have gained prominence for their remarkable proficiency across various natural language processing tasks. Recent studies have suggested their potential to outperform current text anonymization methods, although an objective evaluation is needed to validate these claims. To address this issue, this work introduces a comprehensive evaluation framework that automatically assesses both privacy protection and utility preservation without relying on manually curated ground-truth data. Moreover, we conduct an in-depth analysis of the LLM-based text anonymization methods proposed so far. Building on the strengths and limitations we found, we propose a novel method to enhance anonymization quality. We also report extensive experimental comparisons between LLM-based approaches and a variety of previous techniques, including those based on named entity recognition (NER) and those more oriented towards privacy-preserving data publishing (PPDP). The results show that LLM-based approaches effectively outperform traditional methods in terms of privacy and utility. Furthermore, we benchmark against manual anonymization, which performed poorly, thus highlighting the limitations of using manual anonymizations as evaluation ground truth. Notably, our LLM-based method stood out by achieving the best privacy protection and the best privacy-utility trade-off.
  • Other:

    Link to the original source: https://www.sciencedirect.com/science/article/pii/S0957417425030908?via%3Dihub
    Item reference in APA style: Manzanares-Salor, Benet; Sanchez, David (2026). A comparative analysis, enhancement and evaluation of text anonymization with pre-trained Large Language Models. Expert Systems With Applications, 297, 129474. DOI: 10.1016/j.eswa.2025.129474
    Article reference according to the original source: Expert Systems With Applications, 297, 129474
    Article DOI: 10.1016/j.eswa.2025.129474
    Journal publication year: 2026
    Institution: Universitat Rovira i Virgili
    Deposited article version: info:eu-repo/semantics/publishedVersion
    Record creation date: 2025-09-27
    URV author(s): Sánchez Ruenes, David
    Department: Enginyeria Informàtica i Matemàtiques
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Publication type: Journal Publications
    Author as stated in the article: Manzanares-Salor, Benet; Sanchez, David
    License of use: https://creativecommons.org/licenses/by/3.0/es/
    Subject areas: Public and business administration, accounting and tourism; Administration, accounting and tourism; Architecture, urbanism and design; Artificial intelligence; Astronomy / physics; Biodiversity; Biotechnology; Computer science; Agricultural sciences I; Environmental sciences; Biological sciences I; Biological sciences II; Biological sciences III; Applied social sciences I; Social sciences; Computer science applications; Computer science, artificial intelligence; Law; Economics; Education; Nursing; Engineering I; Engineering II; Engineering III; Engineering IV; Engineering (all); Engineering (miscellaneous); Engineering, electrical & electronic; Pharmacy; General engineering; Geosciences; Interdisciplinary; Mathematics / probability and statistics; Materials; Medicine I; Medicine II; Medicine III; Operations research & management science; Chemistry
    Author's email address: david.sanchez@urv.cat
  • Keywords:

    Evaluation
    Information-content
    Large language models
    Privacy
    Record linkage
    Semantic similarity
    Text anonymization
    Utility