Analysis of the Information Plane for Deep Reinforcement Learning Using Proximal Policy Optimization

Title: Analysis of the Information Plane for Deep Reinforcement Learning Using Proximal Policy Optimization

Authors: Arthur Fernandes, Denis Gustavo Fantinato

Abstract: The Information Bottleneck (IB) principle has been explored in Deep Reinforcement Learning (DRL) as a way to control the flow of information within deep neural networks (DNNs). Applying the IB to DNNs has shown considerable improvements in generalization and in reducing overfitting, yielding greater robustness to environmental variations. In the supervised paradigm, the IB framework, together with its extension to the Information Plane, is also used to analyze and evaluate training efficiency, information compression, regression/classification performance, and the complexity of the DNN architecture, among other aspects. For DRL, however, analysis from the IB perspective is still incipient. In this work, we therefore analyze the use of the IB and the Information Plane in DRL, applying the Proximal Policy Optimization (PPO) algorithm in the CartPole environment. The results show that the learning process in DRL exhibits phases of information compression and expansion within the network's layers, mirroring certain observations from supervised learning. Interestingly, different layers exhibit distinct information flow patterns, with complex, poorly behaved trajectories in the Information Plane, suggesting a layer-specific adaptation to the task.
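Information Plane trajectories of the kind described in the abstract are typically obtained by estimating the mutual information between the network input and each layer's activations at successive training checkpoints. A minimal sketch of the binning estimator commonly used for this in Information Plane studies is shown below; the function name, bin count, and data layout are illustrative assumptions, not details taken from this paper:

```python
import numpy as np

def bin_mi(x_ids, t, n_bins=30):
    """Estimate I(T; X) with the equal-width binning estimator often
    used in Information Plane analyses (illustrative sketch).
    x_ids: integer label per sample identifying the input value X.
    t: array of shape (n_samples, n_units) with layer activations T."""
    # Discretize every activation into equal-width bins over its range
    edges = np.linspace(t.min(), t.max(), n_bins + 1)
    digitized = np.digitize(t, edges)
    # Treat each row's bin pattern as one discrete symbol for T
    _, t_ids = np.unique(digitized, axis=0, return_inverse=True)

    def entropy(ids):
        _, counts = np.unique(ids, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # I(T; X) = H(T) - H(T | X)
    h_t = entropy(t_ids)
    h_t_given_x = 0.0
    for x in np.unique(x_ids):
        mask = x_ids == x
        h_t_given_x += mask.mean() * entropy(t_ids[mask])
    return h_t - h_t_given_x
```

Evaluating `bin_mi` for each layer against both the input and the target (here, the PPO action or return signal) at every checkpoint yields one point per layer per checkpoint, tracing the compression/expansion phases in the Information Plane.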

Keywords: Deep Reinforcement Learning; Proximal Policy Optimization; Information Bottleneck; Information Plane.

Pages: 8

DOI: 10.21528/CBIC2025-1166383

Article PDF: CBIC_2025_paper1166383.pdf

BibTeX file:
CBIC_2025_1166383.bib