Title: Musical Genre Classification Using Mel-Spectrograms and Mel-Scalograms
Authors: Luiz Alberto Viana, Beatriz M. dos Santos, Antonio Carlos Lopes Fernandes Júnior, Eduardo Furtado de Simas Filho
Abstract: A musical genre is a category that groups songs with similar characteristics in terms of style, form, instrumentation, rhythm, harmony, lyrics, or cultural function. Musical genre classification is extensively studied in the field of Music Information Retrieval (MIR), and several deep learning techniques have been explored to address it. In this work, we study mel-scalograms as an alternative input representation for musical genre classification. We present a systematic comparison between mel-spectrograms and mel-scalograms across four CNN architectures: MobileNetV2, EfficientNetB0, ResNet50, and VGG16. The experiments were conducted on the GTZAN dataset, which is widely used in the literature, with a complete training pipeline that includes data augmentation, transfer learning, fine-tuning, and 5-fold cross-validation. Mel-spectrograms showed a slight advantage in average validation accuracy, while mel-scalograms showed a lower standard deviation across training runs. The best-performing models were EfficientNetB0 and ResNet50, which reached up to 86% accuracy. Our findings suggest that both representations are viable, with consistent results across network architectures and evaluation folds.
Keywords: Musical Genre Classification; Musical Information Retrieval; MIR; Wavelet; Scalogram; Mel-Scalogram; Mel-Spectrogram; Convolutional Neural Network; Data Augmentation; Transfer Learning; Fine-tuning.
Pages: 7
DOI: 10.21528/CBIC2025-1191224
Article PDF: CBIC_2025_paper1191224.pdf
BibTeX file: CBIC_2025_1191224.bib
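
Illustrative note: the abstract reports mean validation accuracy and its standard deviation under 5-fold cross-validation. The sketch below shows one way such a protocol could be set up using only the Python standard library. It is not the authors' code. The helper names (`stratified_k_folds`, `train_validation_splits`, `summarize`), the use of stratified folds, the fixed seed, and the choice of sample standard deviation are all assumptions, and the labels are synthetic placeholders that only mimic the usual GTZAN layout of 10 genres with 100 clips each.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def stratified_k_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class proportions.

    Hypothetical helper: the paper does not state whether folds were stratified.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_validation_splits(folds):
    """Yield (train_indices, validation_indices), holding out one fold at a time."""
    for held_out, validation in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, validation


def summarize(fold_accuracies):
    """Return the mean and sample standard deviation of per-fold accuracies."""
    return mean(fold_accuracies), stdev(fold_accuracies)


# Synthetic labels only (assumption): 10 classes x 100 clips each.
labels = [genre for genre in range(10) for _ in range(100)]
folds = stratified_k_folds(labels, k=5)
splits = list(train_validation_splits(folds))
```

In a full experiment, each `(train, validation)` pair would drive one training run per representation and architecture, and `summarize` would be applied to the five resulting validation accuracies to obtain the mean and spread that the abstract compares.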
