Object Detection in Urban Video Scenes Using Deep Learning Models Based on the YOLO Architecture

Title: Object Detection in Urban Video Scenes Using Deep Learning Models Based on the YOLO Architecture

Authors: Delmário dos Santos Gomes Galvão, Luciano dos Santos Gomes, Eduardo Furtado de Simas Filho & André G. S. Conceição

Abstract: In autonomous navigation systems, real-time environmental perception is essential for safe and efficient operation. Object detection enables agents to identify obstacles, interpret traffic, and adapt routes within very short time intervals. However, dynamic and unstructured environments introduce challenges such as illumination changes, occlusions, and object overlap, which compromise detection accuracy. To address these challenges, this paper presents a comparative analysis of two deep learning models derived from the YOLO architecture, YOLOv5n and YOLOv8n, applied to object detection in urban scenarios. A customized dataset derived from COCO was used, covering seven relevant classes (person, car, truck, bus, bicycle, dog, and motorcycle). Both models were trained under identical configurations (50 epochs, 640×640 resolution, and GPU-based processing) and evaluated on real-world video footage from Salvador, Brazil. The results indicate that YOLOv8n achieved higher accuracy, with an Average Precision (AP) of 0.713 compared to 0.699 for YOLOv5n. However, YOLOv5n delivered faster inference, operating at 7.81 FPS versus 7.04 FPS for YOLOv8n. These findings highlight the trade-off between accuracy and speed and can guide the selection of appropriate models for computer vision applications.
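The trade-off reported in the abstract can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the paper: it takes only the four AP and FPS figures quoted above and derives the implied per-frame latency and the relative gaps between the two models. The dictionary layout and the helper names `latency_ms` and `relative_gain` are illustrative assumptions.

```python
# Figures quoted in the abstract; the dictionary layout is illustrative only.
results = {
    "YOLOv5n": {"ap": 0.699, "fps": 7.81},
    "YOLOv8n": {"ap": 0.713, "fps": 7.04},
}


def latency_ms(fps: float) -> float:
    """Mean time per frame, in milliseconds, implied by a frames-per-second rate."""
    return 1000.0 / fps


def relative_gain(a: float, b: float) -> float:
    """Advantage of a over b, expressed as a percentage of b."""
    return 100.0 * (a - b) / b


# Accuracy advantage of YOLOv8n over YOLOv5n (relative AP difference).
ap_gain = relative_gain(results["YOLOv8n"]["ap"], results["YOLOv5n"]["ap"])
# Throughput advantage of YOLOv5n over YOLOv8n (relative FPS difference).
fps_gain = relative_gain(results["YOLOv5n"]["fps"], results["YOLOv8n"]["fps"])

for name, r in results.items():
    print(f"{name}: AP={r['ap']:.3f}, {r['fps']:.2f} FPS, "
          f"{latency_ms(r['fps']):.1f} ms/frame")
print(f"YOLOv8n relative AP gain: {ap_gain:.1f}%")
print(f"YOLOv5n relative FPS gain: {fps_gain:.1f}%")
```

Under these figures, YOLOv8n's accuracy advantage is roughly 2% in relative AP, while YOLOv5n's speed advantage is roughly 11% in throughput, or about 14 ms per frame. Which gap matters more depends on the latency budget of the target application.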

Keywords: YOLO; Object Detection; Computer Vision; Deep Learning; COCO Dataset.

Pages: 8

DOI: 10.21528/CBIC2025-1191832

Paper (PDF): CBIC_2025_paper1191832.pdf

BibTeX file:
CBIC_2025_1191832.bib