
Exploration versus Exploitation in Reinforcement Learning: a Stochastic Control Approach

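  • Speaker: Zhang Jiaqi (张佳琪), Southern University of Science and Technology (SUSTech)

  • Time: 2024-01-18, 13:00-14:00

  • Venue: Room M5024, College of Science Building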

Abstract: We study the paper by Haoran Wang, Thaleia Zariphopoulou and XunYu Zhou. The paper considers reinforcement learning (RL) in continuous time and addresses the problem of achieving the best trade-off between exploration of a black-box environment and exploitation of current knowledge. The authors propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and they motivate and devise an exploratory formulation of the feature dynamics that captures repetitive learning under exploration. The resulting optimization problem is a revitalization of classical relaxed stochastic control.
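To make the effect of the entropy term concrete, here is a toy sketch of the trade-off at a single fixed state. It is not the paper's continuous-time formulation. It assumes a hypothetical one-step reward r(u) = -a(u - m)^2 and restricts the action distribution to a Gaussian N(mu, var), and the names `regularized_reward`, `optimal_variance` and the temperature `lam` are illustrative. Under these assumptions the objective E[r(u)] + lam * H(N(mu, var)) is maximized at mu = m and var = lam / (2a). A larger weight on entropy therefore keeps the policy spread out (more exploration), and lam -> 0 collapses it toward the greedy action m (pure exploitation).

```python
import math


def differential_entropy_gaussian(var):
    """Differential entropy of N(mu, var): 0.5 * ln(2 * pi * e * var)."""
    return 0.5 * math.log(2 * math.pi * math.e * var)


def regularized_reward(mu, var, a, m, lam):
    """Expected reward under N(mu, var) plus lam times the differential entropy.

    Assumes the hypothetical reward r(u) = -a * (u - m)**2, for which
    E[r(u)] = -a * ((mu - m)**2 + var).
    """
    expected_reward = -a * ((mu - m) ** 2 + var)
    return expected_reward + lam * differential_entropy_gaussian(var)


def optimal_variance(a, lam):
    """Closed form: d/dvar [-a * var + (lam / 2) * ln(var)] = 0 gives var = lam / (2a)."""
    return lam / (2 * a)


def grid_search_variance(a, m, lam, grid):
    """Numerically pick the best variance on a grid, with the mean fixed at m
    (m maximizes the expected-reward term for every variance)."""
    return max(grid, key=lambda v: regularized_reward(m, v, a, m, lam))


if __name__ == "__main__":
    a, m = 2.0, 1.0
    grid = [k / 1000 for k in range(1, 5001)]  # candidate variances 0.001 .. 5.0
    for lam in (0.01, 0.5, 2.0):
        v_num = grid_search_variance(a, m, lam, grid)
        v_exact = optimal_variance(a, lam)
        print(f"lam={lam}: grid variance={v_num:.4f}, closed form={v_exact:.4f}")
```

The grid search is only a sanity check on the closed form; for small `lam` the grid resolution limits how closely it can match lam / (2a).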