Speaker: Yuanhua Ni (Nankai University)
Time: Jul 19, 2023, 16:00-17:00
Location: Tencent meeting ID: 174807010
This paper aims to build a probabilistic framework for Howard's policy iteration algorithm using the language of forward-backward stochastic differential equations (FBSDEs). Unlike conventional formulations based on partial differential equations, our FBSDE-based formulation can be easily implemented by optimizing criteria over sample data, and is therefore less sensitive to the state dimension. In particular, both on-policy and off-policy evaluation methods are discussed by constructing different FBSDEs. The backward-measurability-loss (BML) criterion is then proposed for solving these equations. By choosing specific weight functions in the proposed criterion, we can recover the popular Deep BSDE method or the martingale approach for BSDEs. Convergence results are established under both ideal and practical conditions, depending on whether the optimization criteria are decreased to zero. In the ideal case, we prove that the policy sequences produced by the proposed FBSDE-based algorithms and by standard policy iteration have the same performance, and thus the same convergence rate. In the practical case, the proposed algorithm is still shown to converge robustly under mild assumptions on the optimization errors.
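Speaker bio: Yuanhua Ni received a Ph.D. in science from the Institute of Systems Science, Chinese Academy of Sciences, and is currently an associate professor and doctoral supervisor at the College of Artificial Intelligence, Nankai University. Ni's research interests include operations research and optimization, optimal control and reinforcement learning, distributed optimization and swarm intelligence, and stochastic systems and network security control. Ni has led two General Program projects of the National Natural Science Foundation of China, one of which is ongoing, and received the 22nd Guan Zhaozhi Award (2016). Ni has produced extensive theoretical results and published more than 20 SCI-indexed papers, including 11 in IEEE Transactions on Automatic Control, Automatica, and SIAM Journal on Control and Optimization, one of which is a brief paper. Ni serves on the editorial boards of Systems & Control Letters and Journal of Systems Science and Mathematical Sciences (《系统科学与数学》), and is a member of the Stochastic Systems Group of the Technical Committee on Control Theory, Chinese Association of Automation, and of the Intelligent Aerospace Technical Committee, Chinese Association for Artificial Intelligence, among other roles.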