A comparative analysis, enhancement and evaluation of text anonymization with pre-trained Large Language Models

  • Identification data

    Identifier: imarina:9465606
    Authors: Manzanares-Salor, Benet; Sanchez, David
    Abstract:
    Large Language Models (LLMs) have gained prominence for their remarkable proficiency across various natural language processing tasks. Recent studies have suggested that they could outperform current text anonymization methods, although an objective evaluation is needed to validate these claims. To address this issue, this work introduces a comprehensive evaluation framework that automatically assesses both privacy protection and utility preservation without relying on manually curated ground-truth data. Moreover, we conduct an in-depth analysis of the LLM-based text anonymization methods proposed so far. Building on the strengths and limitations we identified, we propose a novel method to enhance anonymization quality. We also report extensive experimental comparisons between LLM-based approaches and a variety of previous techniques, including those based on named entity recognition (NER) and those more oriented towards privacy-preserving data publishing (PPDP). The results show that LLM-based approaches outperform traditional methods in terms of both privacy and utility. Furthermore, we benchmark against manual anonymization, which performed poorly, thus highlighting the limitations of using it as an evaluation ground truth. Notably, our LLM-based method stood out by achieving both the best privacy protection and the best privacy-utility trade-off.
  • Other:

    Original source link: https://www.sciencedirect.com/science/article/pii/S0957417425030908?via%3Dihub
    APA reference of the item: Manzanares-Salor, Benet; Sanchez, David (2026). A comparative analysis, enhancement and evaluation of text anonymization with pre-trained Large Language Models. Expert Systems With Applications, 297, 129474. DOI: 10.1016/j.eswa.2025.129474
    Article reference according to the original source: Expert Systems With Applications, 297, 129474
    Article DOI: 10.1016/j.eswa.2025.129474
    Journal publication year: 2026
    Institution: Universitat Rovira i Virgili
    Deposited article version: info:eu-repo/semantics/publishedVersion
    Record creation date: 2025-09-27
    URV author(s): Sánchez Ruenes, David
    Department: Enginyeria Informàtica i Matemàtiques (Computer Engineering and Mathematics)
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Publication type: Journal Publications
    Author(s) as listed in the article: Manzanares-Salor, Benet; Sanchez, David
    Usage license: https://creativecommons.org/licenses/by/3.0/es/
    Subject areas: Public and business administration, accounting and tourism; Administration, accounting and tourism; Architecture, urbanism and design; Artificial intelligence; Astronomy / physics; Biodiversity; Biotechnology; Computer science (CAPES area); Agricultural sciences I; Environmental sciences; Biological sciences I; Biological sciences II; Biological sciences III; Applied social sciences I; Social sciences; Computer science applications; Computer science, artificial intelligence; Law; Economics; Education; Nursing; Engineering I; Engineering II; Engineering III; Engineering IV; Engineering (all); Engineering (miscellaneous); Engineering, electrical & electronic; Pharmacy; General engineering; Geosciences; Interdisciplinary; Mathematics / probability and statistics; Materials; Medicine I; Medicine II; Medicine III; Operations research & management science; Chemistry
    Author's email address: david.sanchez@urv.cat
  • Keywords:

    Evaluation
    Information-content
    Large language models
    Privacy
    Record linkage
    Semantic similarity
    Text anonymization
    Utility