Location: Tencent Meeting, ID 437256016
Traditional centralized multi-agent reinforcement learning (MARL) algorithms are often impractical in complicated applications, due to non-interactivity between agents, the curse of dimensionality, and computational complexity. This has motivated several decentralized MARL algorithms. However, existing decentralized methods only handle the fully cooperative setting, in which massive amounts of information must be transmitted during training. Moreover, they use block coordinate gradient descent type algorithms to solve the cooperative MARL problem, composed of successive independent actor and critic steps. Although this separated optimization scheme simplifies the solution calculation, it also introduces serious bias. We propose a fully decentralized actor-critic MARL framework, which can flexibly incorporate most actor-critic methods and handle large-scale general cooperative multi-agent settings. Specifically, a primal-dual hybrid gradient descent type algorithm framework is designed to learn each agent separately, enabling decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which stabilizes multi-agent policy learning. Furthermore, the framework achieves scalability in large-scale environments and reduces information transmission through a parameter sharing mechanism and a novel modeling-other-agents method based on theory of mind and online supervised learning.
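To illustrate the contrast the abstract draws, the sketch below shows a *joint* actor-critic update on a toy two-state MDP: a single TD error computed from one sampled transition drives the policy and value updates in the same step, rather than alternating independent actor and critic phases. This is a minimal single-agent illustration of the joint-optimization idea, not the speaker's primal-dual algorithm; the MDP, learning rates, and update rules are all hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # policy logits (actor parameters)
V = np.zeros(n_states)                   # state values (critic parameters)
alpha, beta, gamma = 0.1, 0.2, 0.9       # actor lr, critic lr, discount

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

s = 0
for _ in range(2000):
    pi = softmax(theta[s])
    a = rng.choice(n_actions, p=pi)
    # Hypothetical dynamics: action 0 keeps the state, action 1 flips it;
    # reward 1 only for taking action 1 in state 0.
    s_next = s if a == 0 else 1 - s
    r = 1.0 if (s == 0 and a == 1) else 0.0
    # One TD error drives BOTH updates in the same step (joint scheme),
    # instead of a separate critic-evaluation phase followed by an actor phase.
    delta = r + gamma * V[s_next] - V[s]
    V[s] += beta * delta                   # critic update
    grad_log = -pi                         # grad of log pi(a|s) w.r.t. logits
    grad_log[a] += 1.0
    theta[s] += alpha * delta * grad_log   # actor update, same TD error
    s = s_next
```

After training, the learned policy prefers the rewarded action in state 0. A block-coordinate variant would instead freeze `theta` while fitting `V` to convergence and vice versa; the abstract argues this separation introduces bias that the joint scheme avoids.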