Exploration versus Exploitation in Reinforcement Learning: a Stochastic Control Approach

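Speaker: Zhang Jiaqi (张佳琪), Southern University of Science and Technology (SUSTech)
Time: 2024-01-18 13:00-14:00
Location: College of Science Building (理学院大楼), Room M5024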

Abstract: This talk presents the paper by Haoran Wang, Thaleia Zariphopoulou and Xun Yu Zhou. The paper considers reinforcement learning (RL) in continuous time and studies how to achieve the best trade-off between exploration of a black-box environment and exploitation of current knowledge. The authors propose an entropy-regularized reward function involving the differential entropy of the action distributions, and they motivate and devise an exploratory formulation for the feature dynamics that captures repetitive learning under exploration. The resulting optimization problem is a revitalization of classical relaxed stochastic control.