Author according to the article: Manzanares-Salor, Benet; Sanchez, David
Department: Enginyeria Informàtica i Matemàtiques
URV author(s): Sánchez Ruenes, David
Keywords: Evaluation; Information-content; Large language models; Privacy; Record linkage; Semantic similarity; Text anonymization; Utility
Abstract: Large Language Models (LLMs) have gained prominence for their remarkable proficiency across various natural language processing tasks. Recent studies have suggested their potential to outperform current text anonymization methods, although an objective evaluation is needed to validate these claims. To address this issue, this work introduces a comprehensive evaluation framework that automatically assesses both privacy protection and utility preservation without relying on manually curated ground-truth data. Moreover, we conduct an in-depth analysis of the LLM-based text anonymization methods proposed so far. Building on the strengths and limitations we found, we propose a novel method to enhance anonymization quality. We also report extensive experimental comparisons between LLM-based approaches and a variety of previous techniques, including those based on named entity recognition (NER) and those more oriented towards privacy-preserving data publishing (PPDP). The results show that LLM-based approaches effectively outperform traditional methods in terms of privacy and utility. Furthermore, we benchmark against manual anonymization, which performed poorly, thus highlighting the limitations of using it as evaluation ground truth. Notably, our LLM-based method stood out by achieving the best privacy protection and the best privacy-utility trade-off.
Subject areas: Public and business administration, accounting and tourism; Administration, accounting and tourism; Architecture, urbanism and design; Artificial intelligence; Astronomy / physics; Biodiversity; Biotechnology; Computer science; Agricultural sciences I; Environmental sciences; Biological sciences I; Biological sciences II; Biological sciences III; Applied social sciences I; Social sciences; Computer science applications; Computer science, artificial intelligence; Law; Economics; Education; Nursing; Engineering I; Engineering II; Engineering III; Engineering IV; Engineering (all); Engineering (miscellaneous); Engineering, electrical & electronic; Pharmacy; General engineering; Geosciences; Interdisciplinary; Mathematics / probability and statistics; Materials; Medicine I; Medicine II; Medicine III; Operations research & management science; Chemistry
Access to the usage license: https://creativecommons.org/licenses/by/3.0/es/
Author's email address: david.sanchez@urv.cat
Record creation date: 2025-09-27
Version of the deposited article: info:eu-repo/semantics/publishedVersion
Original source link: https://www.sciencedirect.com/science/article/pii/S0957417425030908?via%3Dihub
Article reference according to the original source: Expert Systems With Applications. 297, 129474
Item reference according to APA style: Manzanares-Salor, Benet; Sanchez, David (2026). A comparative analysis, enhancement and evaluation of text anonymization with pre-trained Large Language Models. Expert Systems With Applications, 297, 129474. DOI: 10.1016/j.eswa.2025.129474
License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
Article DOI: 10.1016/j.eswa.2025.129474
Entity: Universitat Rovira i Virgili
Journal publication year: 2026
Publication type: Journal Publications