From traditional to large language models: a novel NLP-based model for sentiment analysis in social media

Identification data

Identifier: TFM:2338

Handle: https://hdl.handle.net/20.500.11797/TFM2338

Authors: Arias Cámara, Daniel

Abstract:
This thesis addresses the need for Catalan sentiment analysis tools by creating a new 23,000-sample balanced corpus and developing a specialized classification model, the CSXSC. This fine-tuned, 125M-parameter, encoder-only RoBERTa model was benchmarked against two 7-billion-parameter decoder-only models trained efficiently using QLoRA. The results demonstrate the superiority of the specialized model, which achieved a final test set accuracy of 83.69% while being over 24 times more computationally efficient. This study concludes that for this discriminative task, a smaller, architecturally appropriate model provides a more accurate and practical solution than larger, general-purpose LLMs.
Others:

Entity: Universitat Rovira i Virgili (URV)
Confidentiality: No
Education area(s): Enginyeria de la Seguretat Informàtica i Intel·ligència Artificial
APS: No
Subject: Social networks
Academic year: 2024-2025
Work's public defense date: 2025-06-12
Student: Arias Cámara, Daniel
Department: Enginyeria Informàtica i Matemàtiques
Creation date in repository: 2026-03-13
TFM credits: 9
Access Rights: info:eu-repo/semantics/openAccess
Project director: Pascual Fontanilles, Jordi

Keywords:

Large Language Models
Sentiment Analysis
Social Media
Computer engineering
Documents:

Thesis report