An Isolation Forest Approach for Robust Anomaly Detection in Industrial Machines Using Out-of-Distribution Acoustic Data

Authors: Cristofer Silva, João Lucas Lopes Tavares Campos, Leonardo Afonso Ferreira Bortoni, Pablo Andretta Jaskowiak & Diego Pinheiro

Abstract: Anomaly detection has become paramount in industrial environments and is essential for predictive maintenance as Industry 4.0 advances. It enables early fault detection, reduces potential financial losses, and mitigates safety risks. Deep learning methods have attracted growing attention for anomaly detection in industrial machines because they extract features automatically. This convenience, however, often leads us to overlook simpler and more explainable models. As a result, simpler and more explainable machine learning pipelines remain underexplored and underanalyzed compared with the default choice of deep learning, especially in out-of-distribution scenarios. We hypothesized that a simple, explainable model using handcrafted features can be at least non-inferior to deep learning models for anomaly detection in multivariate time series from industrial machinery. To test this hypothesis, we combined the Isolation Forest, a simple and explainable model, with Mel-Frequency Cepstral Coefficients (MFCCs) as handcrafted features and compared it with state-of-the-art deep learning models. We used the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) acoustic dataset, which provides sounds from valves, pumps, fans, and slide rails under normal and faulty conditions. The proposed Isolation Forest methodology is compared against state-of-the-art approaches already implemented in the MTSA framework, namely Hitachi, GANF, and RANSynCoders, in two distinct scenarios: in-distribution (ID) and out-of-distribution (OOD).
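The MFCC plus Isolation Forest pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: MFCCs are computed by hand with NumPy/SciPy (HTK-style mel scale, Hann window, 40 mel bands, 13 coefficients, all hypothetical settings), each clip is pooled into the mean and standard deviation of its coefficients (a hypothetical pooling choice), the detector is scikit-learn's `IsolationForest` trained only on normal clips (assumed here), and synthetic waveforms stand in for MIMII recordings.

```python
import numpy as np
from scipy.fft import dct
from sklearn.ensemble import IsolationForest

# Hypothetical front-end settings; not taken from the paper.
SR, N_FFT, HOP, N_MELS, N_MFCC = 16000, 1024, 512, 40, 13


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters mapping an rFFT power spectrum onto mel bands."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


FILTERBANK = mel_filterbank()


def mfcc(signal):
    """Return an (n_frames, N_MFCC) MFCC matrix for a 1-D waveform."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(signal) - N_FFT) // HOP
    frames = np.stack([signal[i * HOP : i * HOP + N_FFT] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / N_FFT  # power spectrum per frame
    log_mel = np.log(power @ FILTERBANK.T + 1e-10)  # log mel-band energies
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :N_MFCC]  # keep low-order cepstra


def clip_features(signal):
    # Hypothetical pooling: mean and std of each coefficient over the clip
    coeffs = mfcc(signal)
    return np.concatenate([coeffs.mean(axis=0), coeffs.std(axis=0)])


rng = np.random.default_rng(0)
t = np.arange(SR) / SR  # 1-second synthetic clips


def synthetic_clip(faulty=False):
    # Synthetic stand-in for a machine recording (not MIMII data):
    # a 120 Hz hum plus noise; "faulty" clips add a 3 kHz whine.
    x = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.standard_normal(t.size)
    if faulty:
        x = x + 0.5 * np.sin(2 * np.pi * 3000 * t)
    return x


# Training sees only normal clips (assumed setup).
X_train = np.stack([clip_features(synthetic_clip()) for _ in range(50)])
model = IsolationForest(n_estimators=100, random_state=0).fit(X_train)

# Test set: 10 normal clips followed by 10 faulty clips.
X_test = np.stack([clip_features(synthetic_clip(f)) for f in [False] * 10 + [True] * 10])
# score_samples is higher for normal points, so negate it: higher = more anomalous.
anomaly_score = -model.score_samples(X_test)
print("mean score, normal:", anomaly_score[:10].mean())
print("mean score, faulty:", anomaly_score[10:].mean())
```

In this toy setup the faulty clips should receive clearly higher mean anomaly scores. A real pipeline would more likely rely on a tested MFCC implementation such as `librosa.feature.mfcc`; the hand-rolled version only keeps the sketch free of audio-library dependencies and makes each step (framing, power spectrum, mel filtering, log, DCT) visible.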
In-distribution scenarios evaluate models on data whose distribution closely mirrors that of the training data. Out-of-distribution scenarios go further and evaluate models on data whose distribution deviates significantly from the training data, which brings the evaluation closer to real-world deployment of machine learning models. Our results support the hypothesis that an MFCC-based Isolation Forest is non-inferior to the deep learning models. This shows the effectiveness of a less complex method based on handcrafted features on industrial acoustic data and its ability to handle real-world scenarios.
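The ID/OOD distinction can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's evaluation protocol: feature vectors are drawn from Gaussians rather than extracted from MIMII audio, the distribution shift is simulated as a mean offset of the normal data, faults as a larger offset, and ROC-AUC (scikit-learn's `roc_auc_score`) is assumed as the metric because the abstract does not name one. The detector is scikit-learn's `IsolationForest`, not the MTSA framework's implementation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
d = 4  # hypothetical feature dimension (e.g., pooled MFCC statistics)


def normal_features(n, shift=0.0):
    # Normal operation; a nonzero `shift` simulates a distribution change (assumption)
    return rng.normal(loc=shift, scale=1.0, size=(n, d))


def faulty_features(n):
    # Faults simulated as a large offset from the source distribution (assumption)
    return rng.normal(loc=4.0, scale=1.0, size=(n, d))


def evaluate(model, normal, faulty):
    """ROC-AUC with faults as the positive class; higher score = more anomalous."""
    X = np.vstack([normal, faulty])
    y = np.r_[np.zeros(len(normal)), np.ones(len(faulty))]
    return roc_auc_score(y, -model.score_samples(X))


# Train only on normal data from the source distribution.
model = IsolationForest(n_estimators=200, random_state=0).fit(normal_features(500))

# ID: normal test data matches training; OOD: normal test data is shifted.
auc_id = evaluate(model, normal_features(200), faulty_features(200))
auc_ood = evaluate(model, normal_features(200, shift=2.0), faulty_features(200))
print(f"ID AUC: {auc_id:.3f}  OOD AUC: {auc_ood:.3f}")
```

Because the model only ever sees source-distribution normal data, shifted normal data already looks partly anomalous, so in this toy setup the OOD AUC is expected to fall below the ID AUC. This is the kind of degradation the OOD scenario is meant to expose.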

Keywords: anomaly detection; multivariate time series; acoustic data; MFCC; isolation forest; out-of-distribution.

Pages: 7

DOI: 10.21528/CBIC2025-1175744

Paper (PDF): CBIC_2025_paper1175744.pdf

BibTeX file:
CBIC_2025_1175744.bib