Research-Probability & Statistics Seminar

Nonstationary zero-sum Markov games with the probability criterion

Abstract

This talk concerns a two-agent non-stationary discrete-time stochastic game under the probability criterion, which focuses on the probability that the accumulated rewards of agent 1 (i.e., the costs of agent 2) exceed a prescribed threshold before the first passage into a target set. We first present two illustrative examples. The first shows that the probability criterion breaks the implication from a nonzero-sum Nash equilibrium to a zero-sum saddle point. The second demonstrates that the non-stationary game cannot be transformed into an equivalent stationary one via the standard state augmentation. Because of the non-stationarity, we introduce the notion of the n-th value of the game from time n onwards. Under a mild condition, we prove that the sequence of n-th values is the unique solution of the system of Shapley equations for the probability criterion. From this system, we establish the existence of the value and of a saddle point of the game, give an iterative algorithm for computing an approximate value and ε-saddle points of the game, and provide an explicit error bound. Finally, a numerical example from energy management illustrates the theoretical results and the effectiveness of the proposed algorithm.
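
For orientation, one plausible way to write down the probability criterion and the n-th value is sketched below. All symbols are assumptions introduced here for illustration and are not taken from the talk: x_k, a_k, b_k denote the state and the two agents' actions at time k, r_k a time-dependent reward, B the target set, \lambda the threshold, and \pi^1, \pi^2 the agents' policies. The strict inequality and the convention that agent 1 maximizes while agent 2 minimizes are likewise assumed.

```latex
% Illustrative notation only (assumed, not from the abstract).
% First passage time into the target set B, for the game started at time n:
\tau_B := \inf\{\, k \ge n : x_k \in B \,\}

% Probability that the rewards accumulated before hitting B exceed \lambda:
F_n^{\pi^1,\pi^2}(x,\lambda)
  := \mathbb{P}^{\pi^1,\pi^2}_{n,x}\!\left( \sum_{k=n}^{\tau_B - 1} r_k(x_k,a_k,b_k) > \lambda \right)

% The n-th value exists when the upper and lower values coincide:
V_n(x,\lambda)
  := \sup_{\pi^1}\inf_{\pi^2} F_n^{\pi^1,\pi^2}(x,\lambda)
   = \inf_{\pi^2}\sup_{\pi^1} F_n^{\pi^1,\pi^2}(x,\lambda)
```

In this reading, the dependence of r_k and of the transition law on k is what makes the game non-stationary, and the sequence (V_n) is the object that the system of Shapley equations characterizes.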