Title: Object Detection and Multimodal Interaction in the NAO Robot Using YOLOv8
Authors: Vitor Amadeu Souza, Hebert Azevedo Sá
Abstract: This article proposes integrating the YOLOv8 model into the NAO humanoid robot, developed by SoftBank Robotics, for real-time detection of everyday objects such as chairs, keyboards, monitors, and people. To address the NAO's computational limitations, a hybrid PC-NAO system was implemented in which an external computer runs YOLOv8 and efficiently transfers the results to the robot. This approach combines computer vision with the NAO's speech synthesis, yielding a multimodal system that detects objects and announces them verbally, enhancing human-robot interaction. The YOLOv8 model, pre-trained on the COCO dataset, was chosen for its efficiency and robustness under challenging conditions such as varying lighting and partial occlusion. Experimental tests validated the system's effectiveness, with average confidence scores of 0.65 for people and 0.62 for chairs; monitors scored only 0.42 due to reflections, indicating room for improvement. The speech-synthesis integration allowed the NAO to announce detected objects in real time, broadening its potential for applications such as robotic assistance and autonomous navigation. This work advances visual perception in humanoid robots by exploring the underexplored synergy between vision and speech, extending the NAO platform's capabilities in real-world settings. The proposed system shows promise for assistive technologies, such as aiding the visually impaired, and improves the robot's interactivity in dynamic environments. By overcoming hardware constraints and integrating multimodal features, this research paves the way for more intelligent and responsive humanoid robots, with implications for fields such as education and healthcare.
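
The hybrid PC-NAO pipeline described above can be illustrated with a minimal sketch, assuming the ultralytics YOLOv8 Python API and the libqi Python bindings (qi) for NAOqi. The robot address, camera source, and confidence threshold below are illustrative assumptions, not the authors' actual implementation:

```python
# Minimal sketch of the hybrid PC-NAO pipeline: YOLOv8 runs on an external
# PC, and detected labels are sent to the NAO for speech synthesis.
# Assumptions (not from the paper): robot address, camera index, and the
# 0.5 confidence threshold are placeholders.

import cv2
import qi                      # libqi Python bindings for NAOqi
from ultralytics import YOLO   # YOLOv8 API

NAO_URL = "tcp://192.168.1.10:9559"   # hypothetical robot address

# Connect to the robot's text-to-speech service.
session = qi.Session()
session.connect(NAO_URL)
tts = session.service("ALTextToSpeech")

# COCO-pretrained YOLOv8 weights, as described in the abstract.
model = YOLO("yolov8n.pt")

cap = cv2.VideoCapture(0)      # PC webcam as a stand-in for the NAO camera feed
ret, frame = cap.read()
cap.release()

if ret:
    result = model(frame)[0]   # run detection on a single frame
    for box in result.boxes:
        label = result.names[int(box.cls)]
        confidence = float(box.conf)
        if confidence > 0.5:   # assumed threshold
            # The robot verbally announces each detected object.
            tts.say("I see a {}".format(label))
```

In the paper's setup, detection runs on the PC and only lightweight results cross the network, which is the design choice that works around the NAO's limited onboard compute.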
Keywords: YOLOv8; NAO robot; object detection; computer vision; speech synthesis; human-robot interaction; assistive robotics; autonomous navigation; deep learning; multimodal systems.
Pages: 8
DOI: 10.21528/CBIC2025-1173480
Article PDF: CBIC_2025_paper1173480.pdf
BibTeX file: CBIC_2025_1173480.bib
