Entity: Universitat Rovira i Virgili (URV)
Confidentiality: No
Education area(s): Enginyeria de la Seguretat Informàtica i Intel·ligència Artificial
APS: No
Title in different languages: From traditional to large language models: a novel NLP-based model for sentiment analysis in social media
Abstract: This thesis addresses the need for Catalan sentiment analysis tools by creating a new balanced corpus of 23,000 samples and developing a specialized classification model, the CSXSC. This fine-tuned, encoder-only RoBERTa model with 125M parameters was benchmarked against two decoder-only models with 7 billion parameters each, fine-tuned efficiently using QLoRA. The results demonstrate the superiority of the specialized model, which achieved a final test-set accuracy of 83.69% while being over 24 times more computationally efficient. This study concludes that, for this discriminative task, a smaller, architecturally appropriate model provides a more accurate and practical solution than larger, general-purpose LLMs.
Subject: Social networks
Academic year: 2024-2025
Language: en
Work's public defense date: 2025-06-12
Subject areas: Computer engineering
Student: Arias Cámara, Daniel
Department: Enginyeria Informàtica i Matemàtiques
Creation date in repository: 2026-03-13
TFM credits: 9
Keywords: Large Language Models, Sentiment Analysis, Social Media
Title in original language: From traditional to large language models: a novel NLP-based model for sentiment analysis in social media
Access Rights: info:eu-repo/semantics/openAccess
Project director: Pascual Fontanilles, Jordi