A comparative analysis, enhancement and evaluation of text anonymization with pre-trained Large Language Models

  • Identifying data

    Identifier: imarina:9465606
    Authors: Manzanares-Salor, Benet; Sanchez, David
    Abstract:
    Large Language Models (LLMs) have gained prominence for their remarkable proficiency across various natural language processing tasks. Recent studies have suggested their potential to outperform current text anonymization methods, although an objective evaluation is needed to validate these claims. To address this issue, this work introduces a comprehensive evaluation framework that automatically assesses both privacy protection and utility preservation without relying on manually curated ground-truth data. Moreover, we conduct an in-depth analysis of the LLM-based text anonymization methods proposed so far. Building on the strengths and limitations we found, we propose a novel method to enhance anonymization quality. We also report extensive experimental comparisons between LLM-based approaches and a variety of previous techniques, including those based on named entity recognition (NER) and those more oriented towards privacy-preserving data publishing (PPDP). The results show that LLM-based approaches outperform traditional methods in terms of both privacy and utility. Furthermore, we benchmark against manual anonymization, which performed poorly, thus highlighting the limitations of using it as evaluation ground truth. Notably, our LLM-based method stood out by achieving the best privacy protection and the best privacy-utility trade-off.
  • Other:

    Author according to the article: Manzanares-Salor, Benet; Sanchez, David
    Department: Enginyeria Informàtica i Matemàtiques
    URV author(s): Sánchez Ruenes, David
    Keywords: Evaluation; Information-content; Large language models; Privacy; Record linkage; Semantic similarity; Text anonymization; Utility
    Subject areas: Administração pública e de empresas, ciências contábeis e turismo; Administração, ciências contábeis e turismo; Arquitetura, urbanismo e design; Artificial intelligence; Astronomia / física; Biodiversidade; Biotecnología; Ciência da computação; Ciências agrárias i; Ciências ambientais; Ciências biológicas i; Ciências biológicas ii; Ciências biológicas iii; Ciências sociais aplicadas i; Ciencias sociales; Computer science applications; Computer science, artificial intelligence; Direito; Economia; Educação; Enfermagem; Engenharias i; Engenharias ii; Engenharias iii; Engenharias iv; Engineering (all); Engineering (miscellaneous); Engineering, electrical & electronic; Farmacia; General engineering; Geociências; Interdisciplinar; Matemática / probabilidade e estatística; Materiais; Medicina i; Medicina ii; Medicina iii; Operations research & management science; Química
    License of use: https://creativecommons.org/licenses/by/3.0/es/
    Author's e-mail address: david.sanchez@urv.cat
    Record creation date: 2025-09-27
    Deposited article version: info:eu-repo/semantics/publishedVersion
    Link to the original source: https://www.sciencedirect.com/science/article/pii/S0957417425030908?via%3Dihub
    Article reference according to the original source: Expert Systems With Applications. 297 129474-
    Item reference in APA style: Manzanares-Salor, Benet; Sanchez, David (2026). A comparative analysis, enhancement and evaluation of text anonymization with pre-trained Large Language Models. Expert Systems With Applications, 297(), 129474-. DOI: 10.1016/j.eswa.2025.129474
    License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
    Article DOI: 10.1016/j.eswa.2025.129474
    Entity: Universitat Rovira i Virgili
    Journal publication year: 2026
    Publication type: Journal Publications