Can Small Large Language Models Estimate the Hedonic Valence of Words?

Title: Can Small Large Language Models Estimate the Hedonic Valence of Words?

Authors: Paulo Cesar Vaz de Barros, Gabriel Casulari Motta Ribeiro, Frederico Caetano Jandre

Abstract: Small Large Language Models (sLLMs) have gained recognition in recent years because they can generate high-quality responses at low cost without relying on cloud APIs, which may raise privacy concerns. This study conducts a comparative ablation analysis of the hedonic valence rating capabilities of three models: Llama3.2, Phi-4, and nomic-embed-text-v1.5. The investigation assigns valence ratings on a 9-point scale to 140 words sampled from an extensive human-rated dataset, a subset of which was used in a previous study. The chatbot models, Llama3.2 and Phi-4, were run via Ollama with prompts engineered to elicit emojis and numerical valence ratings. For nomic-embed-text-v1.5, a linear regression was fitted between the word embeddings and the human ratings. Statistical analysis revealed significant correlations between the models' outputs and the human ratings (p ≤ 0.001). Despite limitations, these results underscore the potential of sLLMs and embedding models for sentiment analysis. The study shows that sLLMs can approximate word valence and may support linguistic research.
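As a rough illustration of the evaluation the abstract describes, the sketch below extracts a rating on the 9-point scale from a chatbot reply and computes the Pearson correlation between model and human ratings. It is a minimal sketch, not the authors' pipeline: the helper names `parse_rating` and `pearson`, the example replies, and the human ratings are all hypothetical placeholders, and the abstract does not state which correlation coefficient was used.

```python
import math
import re
from typing import Optional, Sequence

SCALE_MIN, SCALE_MAX = 1, 9  # 9-point valence scale named in the abstract


def parse_rating(reply: str) -> Optional[int]:
    """Return the first integer within 1..9 found in a reply, or None."""
    for token in re.findall(r"\d+", reply):
        value = int(token)
        if SCALE_MIN <= value <= SCALE_MAX:
            return value
    return None


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Hypothetical model replies and human ratings (placeholders, not study data).
replies = {
    "sunshine": "😀 8",
    "funeral": "😢 2",
    "table": "😐 5",
    "holiday": "I would say 8",
}
human = {"sunshine": 8.0, "funeral": 1.5, "table": 5.0, "holiday": 7.5}

# Keep only the words for which a valid rating could be parsed.
parsed = {word: parse_rating(text) for word, text in replies.items()}
words = [word for word, rating in parsed.items() if rating is not None]
r = pearson([parsed[w] for w in words], [human[w] for w in words])
print(f"Pearson r over {len(words)} words: {r:.3f}")
```

Replies without a usable number are dropped before the correlation is computed, so in practice the number of retained words should be reported alongside the coefficient.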

Keywords: Small Large Language Models; Hedonic Valence Rating; Sentiment Analysis; Word Embeddings; Linear Regression; Chatbot Models; Model Evaluation and Performance; Natural Language Processing; Llama3.2; Phi-4; nomic-embed-text-v1.5; Machine Learning in Linguistics.

Pages: 6

DOI: 10.21528/CBIC2025-1175901

Paper PDF: CBIC_2025_paper1175901.pdf

BibTeX file:
CBIC_2025_1175901.bib