Evaluating the Language Abilities of Large Language Models vs. Humans: Three Caveats

Leivada, Evelina; Dentella, Vittoria; Guenther, Fritz

Identifying data

Identifier: imarina:9369657

Handle: https://hdl.handle.net/20.500.11797/imarina9369657

Authors:
Leivada, Evelina; Dentella, Vittoria; Guenther, Fritz

Abstract:
We identify and analyze three caveats that may arise when analyzing the linguistic abilities of Large Language Models. The problem of unlicensed generalizations refers to the danger of interpreting performance in one task as predictive of the models' overall capabilities, based on the assumption that because a specific task performance is indicative of certain underlying capabilities in humans, the same association holds for models. The human-like paradox refers to the problem of lacking human comparisons, while at the same time attributing human-like abilities to the models. Last, the problem of double standards refers to the use of tasks and methodologies that either cannot be applied to humans or they are evaluated differently in models vs. humans. While we recognize the impressive linguistic abilities of LLMs, we conclude that specific claims about the
Other:

Author as listed in the article: Leivada, Evelina; Dentella, Vittoria; Guenther, Fritz
Department: Estudis Anglesos i Alemanys (English and German Studies)
URV author(s): Dentella, Vittoria
Keywords: Probabilities; Large language models; Grammaticality; Artificial intelligence
Subject areas: Linguistics and language; Linguistics; Letters / linguistics; Language and linguistics; Language & linguistics; Interdisciplinary research in the social sciences; Interdisciplinary research in the humanities; Experimental and cognitive psychology; Social sciences; Humanities
License: https://creativecommons.org/licenses/by/3.0/es/
Author's email address: vittoria.dentella@estudiants.urv.cat
Author identifier: 0000-0001-6697-9184
Record creation date: 2025-02-18
Version of the deposited article: info:eu-repo/semantics/publishedVersion
Article reference according to the original source: Biolinguistics, 18, e14391
Item reference in APA style: Leivada, Evelina; Dentella, Vittoria; Guenther, Fritz (2024). Evaluating the Language Abilities of Large Language Models vs. Humans: Three Caveats. Biolinguistics, 18, e14391. DOI: 10.5964/bioling.14391
License document URL: https://repositori.urv.cat/ca/proteccio-de-dades/
Institution: Universitat Rovira i Virgili
Journal publication year: 2024
Publication type: Journal Publications

Keywords:

Experimental and Cognitive Psychology, Language & Linguistics, Linguistics and Language
Probabilities
Large language models
Grammaticality
Artificial intelligence
Linguistics and language
Linguistics
Letters / linguistics
Language and linguistics
Language & linguistics
Interdisciplinary research in the social sciences
Interdisciplinary research in the humanities
Experimental and cognitive psychology
Social sciences
Humanities
Documents:

Main document